Why AIs Make Certain Kinds of Mistakes
There are so many who rave about AI in its many forms, like ChatGPT. Yes, the output of many of these AIs is very impressive. And yes, they hallucinate too (cook up facts). All that’s well known.
What’s less well known is that many AIs make mistakes with two simple questions. The first one: which is bigger, 9.2 or 9.11? The other one is just as simple: how many r’s does the word “strawberry” have?
Believe it or not, a lot of AIs get those two questions wrong! What is going on? As you know, AIs can (on many topics) explain their reasoning, so they were asked to explain how they arrive at the wrong answers.
On the 9.2 vs. 9.11 question, the AIs say there are multiple interpretational patterns by which to evaluate the question. The maths way is just one of them. The maths way leads to the obvious answer (9.2 is the same as 9.20, and so 9.2 is bigger). But there are other ways to read the question. One is to compare the parts after the decimal point as whole numbers: “2” and “11” respectively. Read that way, 11 is bigger than 2, and so 9.11 is bigger than 9.2. Which interpretation should the AI pick? That last step (which interpretation decides?) is where the AI error arises:
“It’s not that ‘I can’t do the math’ – it’s that the mathematical interpretation is one pattern among many, and not always the dominant one.”
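To make the two readings concrete, here is a minimal Python sketch. It is my own illustration, not how any AI actually works internally: the numeric comparison gives the right answer, while comparing the digits after the decimal point as whole numbers reproduces the mistake.

```python
# Two ways of "reading" the comparison between 9.2 and 9.11.
a, b = "9.2", "9.11"

# Reading 1: the maths way -- compare as decimal numbers.
print(float(a) > float(b))      # True: 9.20 > 9.11, so 9.2 is bigger

# Reading 2: compare the parts after the decimal point as whole numbers.
frac_a = int(a.split(".")[1])   # 2
frac_b = int(b.split(".")[1])   # 11
print(frac_a > frac_b)          # False: 11 > 2, so this reading says 9.11 is "bigger"
```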
This is almost certainly why most kids get it wrong in the beginning, at least until they get enough practice. And even then, only in the context of maths classes and tests! Ask them again in real life, with no context, and many will again get it wrong.
On to the r’s in “strawberry” question. This error, as the AI explains its “thought” process, is rooted in the fact that AIs don’t break words into letters – why would they? Meanings lie in the entire word or sentence, not in individual letters. So the word “strawberry” is treated as a “unified semantic unit”, “not naturally decomposed into individual letters”. Therefore, when asked the number of r’s, the AI’s architecture doesn’t have any direct way to know the answer. Instead, it turns to what it “thinks” are the best or most relevant indirect ways within its architecture. And it turns out none of those indirect ways lead to the right answer!
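For contrast, here is a tiny Python sketch (again my own illustration) of why the question is trivial for ordinary code, which does see individual letters, alongside a rough idea of how a model might see the word as chunks instead. The token split shown is hypothetical, just to convey the idea; real tokenizers differ by model.

```python
# Ordinary code sees the word letter by letter, so counting r's is trivial.
word = "strawberry"
print(word.count("r"))   # 3

# A language model, by contrast, typically sees subword chunks, not letters.
# Hypothetical split, for illustration only.
tokens = ["str", "aw", "berry"]
print(tokens)            # the model "sees" chunks like these, not s-t-r-a-w-...
```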
When I saw the reasons, I realized the AIs are becoming like us humans: they are thinking like us and making the kinds of mistakes humans could make with those questions. Should we be worried? How often do they make these kinds of mistakes? Would we even notice? On the other hand, if they make errors like this, are they really good enough to take over so many jobs?