Deep Learning’s Zero and Infinity Errors

 Neural Networks hallucinate a mistaken reality

Our Cats

“meanings just ain’t in the head” — Hilary Putnam

You are driving your Tesla Model S down a desert route in the United States. There is no other car or pedestrian in sight. You switch on the autopilot as your focus shifts between the road and your partner seated next to you. At the same time, a SpaceX rocket is safely on its landing trajectory back to Earth. All is good!

Suddenly a tiny meteorite, heading for Earth at just the right trajectory, hits the rocket. It knocks it off course and damages its thrusters. Only gravity and wind resistance control the rocket now. It crash-lands a few feet in front of your car. Tesla’s autopilot doesn’t swerve, and it doesn’t stop. It goes straight through, killing you and your partner and destroying the vehicle and its A.I. computer.

What happened? The car hallucinated that the rocket was a plastic bag. No such case, nor anything like it, appeared among the edge cases the object-detection neural network was trained on.

Donnie Darko and the airplane engine

Donnie Darko-ish? Maybe. But the more mundane cases, in which the network hallucinates that a car, or a person for that matter, is not out there, are plentiful. They are rare in probabilistic terms, but flip the coin often enough and they will occur often enough to cause deaths (and lawsuits). This author was once watching the screen of a Tesla driven by an Uber driver and noticed that, while stopped at a red light, it failed to detect one of the cars passing by. Tesla’s large display renders every vehicle the network detects as a 3D model; it shows people too. Yet it missed that car, and that was during a 15-minute trip. Fortunately, a person was behind the steering wheel. (Of course, the issue is not restricted to Tesla per se; it exists in any self-driving car that relies only on cameras and deep learning for image recognition.)

The invention of zero

To understand the hallucination problem in deep learning, we will make an analogy to the world of mathematics. The most significant innovation in mathematics happened around the third century C.E., when the number 0 was invented in India. Another, less impactful, leap occurred in 1655 C.E., when John Wallis introduced the symbol ∞ to represent infinity. These two symbols are cornerstones of most mathematics, and calculus would be impossible without them.

While intuitively we can understand that infinity is a kind of symbol and not a number — there is no such thing as a largest number — it is harder to see that with zero. After all, nothing is zero. But zero is a metaphor for “lack of.” Not having any cats means you have zero cats, and not having any money means you have zero money. This innovation enabled the Arabic numeral, decimal, and binary systems, all by being able to symbolize “lack.”

This type of symbolic language got lost in the age of deep learning. Vectors of numbers are held to be the realm of thought, as the name of the famous Toronto institute co-founded by AI pioneer Geoff Hinton suggests. And indeed, there is decent evidence that the human brain stores something like vectors of numbers across its neurons, analogous to the weights in a neural network.

Yet this pattern-recognition mechanism — what the psychologist Daniel Kahneman would call System 1 thinking — is not the distinguishing feature of the wise man. Instead, it is the ability to override automatic pattern recognition and reason about the world that distinguishes people from the rest of the animals.

Yet even at pattern recognition, deep learning has some severe flaws. Namely, it will hallucinate results and answer questions when the right thing to say is “I don’t know.” A person shown some strange impressionist painting and asked to identify what it depicts will usually answer “I don’t know.” Not so a neural network, which generally answers because it must.

System 1 and System 2 thinking

I identify here two types of errors, which I will call Type-Zero and Type-Infinity, and I will show why they need to be overcome before deep learning can be used in critical applications such as self-driving cars.

The first type is an error where the neural network holds the correct answer in its weights but doesn’t always return it. For instance, I ask OpenAI’s GPT-3 (a sophisticated natural-language-processing neural network), “What model of car did Napoleon Bonaparte own?” In one case, it says, “A Citroën.” In another, it gives a right-ish answer: “A car did not exist during Napoleon’s time!” I will refer to this second answer as a Type-Zero answer.
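
For readers who want to try this themselves, the sketch below shows roughly how such a query can be issued with the legacy openai Python library (the pre-1.0 Completion interface). The model name, temperature, and token limit here are assumptions for illustration, not a record of the exact settings behind the answers quoted above.

```python
# A sketch of putting the trick question to GPT-3 via the legacy openai
# Python library (pre-1.0 "Completion" interface). The model name,
# temperature, and token limit are illustrative assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Completion.create(
    model="text-davinci-002",
    prompt="What model of car did Napoleon Bonaparte own?",
    max_tokens=64,
    temperature=0.7,  # with sampling, the answer can differ from run to run
)

print(response["choices"][0]["text"].strip())
```

With a nonzero temperature, repeated calls can return different completions, which is how the same question can yield “A Citroën” one time and the right-ish refusal another.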

An answer of “I don’t understand” would also have been adequate. But the neural network has no way of telling what should pass as a genuine question and what is a trick question it need not answer.

Cats everywhere

To understand this better, we consider a cat/not-a-cat image classifier. Training such a neural network model today requires many cat images, as well as a large number of pictures of things that are not cats: dogs, people, houses, laptops, and so on. The hope is that the neural network will learn the distribution of everything else in one category and of cats in the other. Presented with the image of a yet-to-be-imagined alien, it should say that this is not a cat.

But usually, that’s not what happens. If you present the cat/not-a-cat classifier with an image far from the distribution it was trained on, it will hallucinate an answer. It could say cat, it could say not-a-cat, or it could split the probability evenly between the two.
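
To make the forced choice concrete, here is a minimal sketch in PyTorch. The tiny untrained network and the random “alien” input are illustrative assumptions only; the structural point is that a softmax over two classes always distributes a full unit of probability between them.

```python
# A minimal sketch of the forced-choice problem (illustrative only: the
# network is tiny, untrained, and fed random noise standing in for an
# out-of-distribution "alien" image).
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 64),  # toy 32x32 RGB input
    nn.ReLU(),
    nn.Linear(64, 2),            # two logits: cat, not-a-cat
)

alien = torch.randn(1, 3, 32, 32)  # nothing like the training distribution

probs = torch.softmax(classifier(alien), dim=1)
print(probs)  # e.g. tensor([[0.48, 0.52]]): the probabilities always sum to 1,
              # split between the only two answers the network can give
```

There is no slot in the output layer for “none of the above”; the architecture itself cannot express it.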

Type-Zero here refers to a dedicated output of the network under which anything it does not recognize would fall. Training the network on the infinite set of everything it could ever see is impossible. We need a new way for networks to understand when a question should result in a zero output.
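
One common partial workaround, sketched below, is to abstain whenever the softmax confidence falls below a threshold. The function name and the 0.9 cutoff are assumptions for illustration, and this is not the symbolic zero argued for here: out-of-distribution inputs can still receive high softmax confidence, which is precisely the hallucination problem.

```python
# A sketch of confidence-thresholded abstention: answer only when the
# network's top softmax probability clears a cutoff, otherwise say
# "I don't know". The threshold value is an illustrative assumption.
import torch

def classify_or_abstain(logits: torch.Tensor, threshold: float = 0.9) -> str:
    probs = torch.softmax(logits, dim=-1)
    confidence, index = probs.max(dim=-1)
    if confidence.item() < threshold:
        return "I don't know"  # the would-be "zero" output
    return ["cat", "not-a-cat"][index.item()]

print(classify_or_abstain(torch.tensor([0.3, 0.4])))   # low confidence -> "I don't know"
print(classify_or_abstain(torch.tensor([6.0, -2.0])))  # confident -> "cat"
```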

The same applies to language models. When faced with a trick question, the network should answer with “I don’t know” and leave it to its trainers to reinforce the correct answer. For instance, a language model could be trained on texts that present the chronological order of inventions and people’s lives.
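
As a concrete picture of what “leaving it to the trainers” could look like, the sketch below writes a handful of prompt/completion pairs in the JSON Lines format accepted by OpenAI’s original fine-tuning API. The specific questions, completions, and file name are hypothetical, and a real dataset would need far more examples.

```python
# A sketch of fine-tuning data that reinforces chronology-aware refusals.
# The examples and file name are hypothetical; the JSONL prompt/completion
# layout follows OpenAI's original fine-tuning format.
import json

examples = [
    {"prompt": "What model of car did Napoleon Bonaparte own?",
     "completion": " Cars did not exist during Napoleon's lifetime."},
    {"prompt": "Which smartphone did Julius Caesar prefer?",
     "completion": " Smartphones did not exist during Caesar's lifetime."},
    {"prompt": "What did Ada Lovelace think of the transistor?",
     "completion": " I don't know; the transistor was invented after her death."},
]

with open("chronology_refusals.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```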

We now move to Type-Infinity errors. Say I feed GPT-3 a gibberish question: “How can Elon Gates generate the volume of mass?” Here I get a gibberish-type answer, as follows:

There is no definitive answer to this question as it depends on several factors, including the specific goals and objectives of Gates’ business ventures. However, some potential methods for generating mass could include creating new products or services that appeal to a wide range of consumers, investing in marketing and advertising to reach a larger audience, or partnering with other businesses to expand one’s reach.

Here also, an answer of “I don’t understand” would have sufficed. Again, to simplify, consider a cat-vs-dog image classifier. The network can only output two possibilities: cat or dog. Present it with a bird, or anything else, and the network will hallucinate. The difference here is that the expected input is restricted to a set of two kinds of things: images of cats and images of dogs, just as the expected input of GPT-3 is the English language. Anything outside this set comes from the infinite generative power of the objects out there; in the case of GPT-3, from the generative power of human language. The proper answer lies outside the network’s possible responses. The set of inputs that should generate the output “I don’t understand” is infinite. But so is the set of inputs that would generate a correct answer, due to the unlimited generative power of human language!

While for Type-Zero errors we could enumerate inputs and reinforce the correct answers they should map to, for Type-Infinity errors we can’t feed the network all the examples of gibberish out there. A way to think of it is that in one case, we have a mapping from the integers to another countably infinite set, while in the second, we are dealing with the set of real numbers.

Note that when it comes to natural language processing, this only applies to sentences made up of English words. GPT-3 doesn’t recognize nonword sentences, e.g., “xxsdfj dafd kjasdfd”, because there are only so many words in the English language, and GPT-3’s pretraining learns word embeddings.

Pattern recognition and symbolic reasoning

And so we come full circle: zero and infinity are what neural networks need to be able to symbolize before they can be used in critical applications in the world. Pattern recognition alone wouldn’t have stopped the car in the opening of this piece. We will need a default zero state and a default infinity state, upon which the network responds with “Stop the car!” or “Danger!”

Postscript

Special thanks to Anamaria Monda for reviewing a draft of this piece, as well as for the picture of the cats!

GPT-3 contributed the following alternate ending to the article:

“The network will need to hallucinate less and resort to symbolic reasoning more. The way to think of it is that we have a function that maps from the integers to the reals. The integers are what we have trained the network on, and the reals are everything else. The network will need to hallucinate less and resort to symbolic reasoning more. The way to think of it is that we have a function that maps from the integers to the reals. The integers are what we have trained the network on, and the reals are everything else. The network will need to learn to separate the two and respond accordingly. We need to find a way for the network to understand when it should answer “I don’t know.” The current state of the art falls short in this regard.”

In bold is a mathematical impossibility: no function can map the integers onto the reals, so the integers cannot be “what we have trained the network on” while the reals are “everything else.” It seems that the network was tricked by the following sentence in the essay:

“A way to think of it is that in one case, we have a mapping from the integers to another countably infinite set, while in the second, we are dealing with the set of real numbers.”

This opens up a can of worms about whether language is countable or uncountable.
