Why AIs Make Certain Kinds of Mistakes

Plenty of people rave about AI in its many forms, like ChatGPT. Yes, the output of many of these AIs is very impressive. And yes, they hallucinate too (cook up facts). All of that is well known.


What’s less well known is that many AIs make mistakes on two very simple questions. The first: which is bigger, 9.2 or 9.11? The second is just as simple: how many r’s does the word “strawberry” have?


Believe it or not, a lot of AIs get those two questions wrong! What is going on? As you know, AIs can (on many topics) explain their reasoning. So they were asked to explain how they arrive at the wrong answers.


On the 9.2 vs 9.11 question, the AIs say there are multiple interpretational patterns for the question, and the mathematical one is just one of them. The mathematical reading leads to the obvious answer (9.2 is the same as 9.20, so 9.2 is bigger). But there are other ways to read the question. One is to compare the parts after the decimal point as whole numbers, “2” and “11”, the way software versions or book sections are numbered. Read that way, 11 is bigger than 2, and so 9.11 is “bigger” than 9.2. Which interpretation should the AI pick? That last step (which interpretation decides?) is where the error arises:

“It’s not that ‘I can’t do the math’; it’s that the mathematical interpretation is one pattern among many, and not always the dominant one.”

This is almost certainly why many kids get it wrong at first, at least until they’ve had enough practice. And even then, they get it right only in the context of maths classes and tests! Ask them again in real life, with no context, and many will be wrong again.
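To make the two readings concrete, here is a small Python sketch of my own (nothing the AIs themselves run; the function names are purely illustrative): one function compares the strings as decimal numbers, the other compares the dot-separated parts the way version numbers are compared.

```python
# Illustrative sketch of the two readings of "9.2 vs 9.11".

def compare_as_numbers(a: str, b: str) -> str:
    """Mathematical reading: treat both strings as decimal numbers."""
    return a if float(a) > float(b) else b

def compare_as_versions(a: str, b: str) -> str:
    """Version/section reading: compare each dot-separated part as a whole number."""
    pa = [int(x) for x in a.split(".")]
    pb = [int(x) for x in b.split(".")]
    return a if pa > pb else b

print(compare_as_numbers("9.2", "9.11"))   # 9.2   (since 9.20 > 9.11)
print(compare_as_versions("9.2", "9.11"))  # 9.11  (since the part 11 > the part 2)
```

Both answers are “correct” under their own reading; the mistake is picking the wrong reading for the question as asked.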


On to the r’s in “strawberry” question. This error, as the AI explains its “thought” process, is rooted in the fact that AIs don’t break words into letters. Why would they? Meaning lies in the whole word or sentence, not in individual letters. So the word “strawberry” is treated as a “unified semantic unit”, “not naturally decomposed into individual letters”. Therefore, when asked how many r’s it has, the AI’s architecture has no direct way to know the answer. Instead, it falls back on what it “thinks” are the best or most relevant indirect routes within its architecture, and it turns out none of those indirect routes lead to the right answer!
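For a rough picture of that token-versus-letter gap, here is a small Python sketch. It assumes the open-source tiktoken tokenizer library, and the cl100k_base encoding is just one example; it is not necessarily what any particular chatbot uses.

```python
# Rough illustration: a language model "sees" token IDs, not letters.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one example encoding, for illustration
tokens = enc.encode("strawberry")

print(tokens)                              # a short list of integer token IDs
print([enc.decode([t]) for t in tokens])   # the text chunks those IDs stand for

# Ordinary code, by contrast, works character by character, so counting is trivial:
print("strawberry".count("r"))             # 3
```

The point of the sketch: the word reaches the model as a couple of opaque chunks, so “how many r’s” has to be answered indirectly rather than by simply looking at the letters.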


When I saw the reasons, I realized the AIs are becoming like us humans: thinking like us and making the kinds of mistakes humans could make with those questions. Should we be worried? How often do they make these kinds of mistakes? Would we even notice? On the other hand, if they make errors like this, are they really good enough to take over so many jobs?
