Generative AI and Neural Net Fundamentals
How Does Generative AI Work?
There are two answers to the question of how generative AI models work. Mechanically, we know how they work in detail because humans designed their various neural network implementations to do exactly what they do, iterating those designs over decades to make them better and better. AI developers know exactly how the neurons are connected; they engineered each model’s training process. Yet, in practice, no one knows exactly how generative AI models do what they do. That’s the embarrassing truth.
“We don’t know how they do the actual creative task because what goes on inside the neural network layers is way too complex for us to decipher, at least today,”
said Dean Thompson, a former chief technology officer of multiple AI startups that have been acquired over the years by companies including LinkedIn and Yelp, where he remains a senior software engineer working on large language models (LLMs). Generative AI’s ability to produce new, original content appears to be an emergent property of what is known: the models’ structure and training. So while there is plenty to explain about what we know, what a model such as GPT-3.5 is actually doing internally (what it’s thinking, if you will) has yet to be figured out. Some AI researchers are confident that this will become known in the next five to 10 years; others are unsure it will ever be fully understood.
Here’s an overview of what we do know about how generative AI works:
Start with the brain. A good place to start in understanding generative AI models is with the human brain, according to Jeff Hawkins’ 2004 book, “On Intelligence.” Hawkins, a computer scientist, brain scientist, and entrepreneur, presented his work in a 2005 session at PC Forum, which was an annual conference of leading technology executives led by tech investor Esther Dyson. Hawkins hypothesized that, at the neuron level, the brain works by continuously predicting what’s going to happen next and then learning from the differences between its predictions and subsequent reality. To improve its predictive ability, the brain builds an internal representation of the world. In his theory, human intelligence emerges from that process. Whether influenced by Hawkins or not, generative AI works exactly this way. And, startlingly, it acts as if it is intelligent.
Build an artificial neural network. All generative AI models begin with an artificial neural network encoded in software. Thompson says a good visual metaphor for a neural network is to imagine the familiar spreadsheet, but in three dimensions because the artificial neurons are stacked in layers, similar to how real neurons are stacked in the brain. AI researchers even call each neuron a “cell,” Thompson notes, and each cell contains a formula relating it to other cells in the network—mimicking the way that the connections between brain neurons have different strengths.
Each layer may have tens, hundreds, or thousands of artificial neurons, but the number of neurons is not what AI researchers focus on. Instead, they measure models by the number of connections between neurons. The strengths of these connections vary based on their cell equations’ coefficients, which are more generally called “weights” or “parameters.” These connection-defining coefficients are what’s being referred to when you read, for example, that the GPT-3 model has 175 billion parameters. The latest version, GPT-4, is rumored to have trillions of parameters, though that is unconfirmed. There are a handful of neural network architectures with differing characteristics that lend themselves to producing content in a particular modality; the transformer architecture appears to be best for large language models, for example.
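To make the idea of parameters concrete, here is a minimal sketch, in Python, of a toy fully connected network. The layer sizes and weights are invented purely for illustration; real generative models are vastly larger and use more specialized architectures, but counting one coefficient per connection (plus a bias per cell) is roughly what produces headline figures such as GPT-3’s 175 billion parameters.

```python
# Illustrative sketch only: counting the "parameters" of a tiny fully
# connected network. All sizes and values are invented for this example.
import random

layer_sizes = [8, 16, 16, 4]   # artificial neurons ("cells") per layer, bottom to top

# Every connection from a cell in one layer to a cell in the next layer gets
# one coefficient (a "weight"), plus one bias term per receiving cell.
weights = [
    [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
biases = [[0.0] * n_out for n_out in layer_sizes[1:]]

n_params = sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(f"This toy network has {n_params} parameters")   # (8*16+16) + (16*16+16) + (16*4+4) = 484
```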
Teach the newborn neural network model. Large language models are given enormous volumes of text to process and are tasked with making simple predictions, such as the next word in a sequence or the correct order of a set of sentences. In practice, though, neural network models work in units called tokens, not words. “A common word may have its own token, uncommon words would certainly be made up of multiple tokens, and some tokens may just be a single space followed by ‘th’ because that sequence of three characters is so common,”
said Thompson. To make each prediction, the model feeds a token into the bottom layer of a particular stack of artificial neurons; that layer processes it and passes its output to the next layer, which processes and passes on its output, and so on until the final output emerges from the top of the stack. Stack sizes can vary significantly, but they’re generally on the order of tens of layers, not thousands or millions.
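The two steps Thompson describes, splitting text into tokens and pushing a token up through a stack of layers, can be sketched in miniature. Everything below is invented for illustration: the vocabulary is tiny, the weights are random rather than learned, and a real model’s layers are far more sophisticated than a plain grid of coefficients.

```python
# Illustrative sketch only: a toy subword tokenizer plus a toy "stack of
# layers" that turns one token into a next-token prediction.
import math
import random

random.seed(0)

# A tiny, made-up subword vocabulary. Note the " th" entry: as in Thompson's
# example, a very common character sequence can be a token of its own.
vocab = ["the", " th", "cat", " sat", " on", " mat",
         "c", "a", "t", " ", "s", "o", "n", "m", "e", "h"]

def tokenize(text):
    """Greedy longest-match split of text into known subword tokens."""
    tokens, i = [], 0
    by_length = sorted(vocab, key=len, reverse=True)
    while i < len(text):
        match = next((v for v in by_length if text.startswith(v, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("the cat sat on the mat"))
# ['the', ' ', 'cat', ' sat', ' on', ' th', 'e', ' mat']

# A toy stack of layers: each layer combines its inputs using its coefficients
# and passes its output upward; the top layer scores every vocabulary token
# as a candidate for what comes next.
dim = 8
embed = {tok: [random.gauss(0, 1) for _ in range(dim)] for tok in vocab}
layers = [[[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
          for _ in range(3)]
out_proj = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in vocab]

def predict_next(token):
    x = embed.get(token, [0.0] * dim)
    for layer in layers:                                    # bottom to top
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in layer]
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in out_proj]
    total = sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / total for s in scores]
    return max(zip(vocab, probs), key=lambda pair: pair[1])

print(predict_next(" mat"))   # untrained weights, so the "prediction" is essentially random
```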
In the early training stages, the model’s predictions aren’t very good. But each time the model predicts a token, it checks for correctness against the training data. Whether it’s right or wrong, a “backpropagation” algorithm adjusts the parameters (that is, the formulas’ coefficients) in each cell of the stack that made that prediction. The goal of the adjustments is to make the correct prediction more probable. “It does this for right answers, too, because that right prediction may have only had, say, a 30% certainty, but that 30% was the most of all the other possible answers,”
Thompson said. “So, backpropagation seeks to turn that 30% into 30.001%, or something like that.”
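A tiny worked example of that adjustment, with made-up numbers: suppose the model’s top layer scores four candidate next tokens and the correct one already leads with about 31% probability. One backpropagation-style gradient step on the prediction error nudges the scores so that probability creeps up slightly, just as Thompson describes.

```python
# Illustrative sketch only: one gradient step that makes the correct token
# slightly more probable. The candidate tokens and scores are invented.
import math

candidates = ["mat", "dog", "moon", "sofa"]
logits = [1.0, 0.9, 0.8, 0.2]        # raw scores from the toy model's top layer
correct = 0                           # suppose "mat" is the token in the training text

def softmax(xs):
    total = sum(math.exp(x) for x in xs)
    return [math.exp(x) / total for x in xs]

probs = softmax(logits)
print(f"before: P({candidates[correct]}) = {probs[correct]:.4f}")   # already the top guess, about 31%

# For softmax with cross-entropy loss, the gradient with respect to each score
# is (probability - 1) for the correct token and (probability) for the others.
learning_rate = 0.001
logits = [x - learning_rate * (p - (1.0 if k == correct else 0.0))
          for k, (x, p) in enumerate(zip(logits, probs))]

print(f"after:  P({candidates[correct]}) = {softmax(logits)[correct]:.4f}")   # a tiny bit higher
```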
After the model has repeated this process for trillions of text tokens, it becomes very good at predicting the next token, or word. After initial training, generative AI models can be fine-tuned with further training techniques, such as reinforcement learning from human feedback (RLHF). In RLHF, the model’s output is given to human reviewers who make a binary positive or negative assessment (thumbs up or down) that is fed back to the model. RLHF was used to fine-tune OpenAI’s GPT-3.5 model to help create the ChatGPT chatbot that went viral.
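Here is a deliberately oversimplified sketch of that feedback loop. Real RLHF trains a separate reward model on the human ratings and then updates the language model with a reinforcement learning algorithm such as PPO; in this toy version, each thumbs up or down directly nudges a two-response “model” so that approved answers become more likely.

```python
# Illustrative sketch only: binary human feedback nudging a toy model.
# Real RLHF uses a learned reward model and an RL algorithm such as PPO.
import math
import random

responses = ["helpful answer", "unhelpful answer"]
logits = [0.0, 0.0]                      # the toy model starts out indifferent

def softmax(xs):
    total = sum(math.exp(x) for x in xs)
    return [math.exp(x) / total for x in xs]

def sample_response():
    return random.choices(range(len(responses)), weights=softmax(logits))[0]

def feedback_update(choice, thumbs_up, lr=0.5):
    """Make the chosen response more likely after a thumbs up, less likely after a thumbs down."""
    reward = 1.0 if thumbs_up else -1.0
    probs = softmax(logits)
    for k in range(len(logits)):
        grad = (1.0 if k == choice else 0.0) - probs[k]   # d log P(choice) / d logit_k
        logits[k] += lr * reward * grad

# Simulate reviewers who consistently prefer the first response.
for _ in range(20):
    choice = sample_response()
    feedback_update(choice, thumbs_up=(choice == 0))

print(softmax(logits))   # probability mass has shifted toward "helpful answer"
```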
But how did the model answer my question? It’s a mystery. Here’s how Thompson explains the current state of understanding: “There’s a huge ‘we just don’t know’ in the middle of my explanation. What we know is that it takes your entire question as a sequence of tokens, and at the first layer processes all of those simultaneously. And we know it then processes the outputs from that first layer in the next layer, and so on up the stack. And then we know that it uses that top layer to predict, which is to say, produce a first token, and that first token is represented as a given in that whole system to produce the next token, and so on.
The logical next question is, what did it think about, and how, in all that processing? What did all those layers do? And the stark answer is, we don’t know. We … do … not … know. You can study it. You can observe it. But it’s complex beyond our ability to analyze. It’s just like fMRI [functional magnetic resonance imaging] on people’s brains. It’s the crudest sketch of what the model actually did. We don’t know.”
Although it’s controversial, a group of more than a dozen researchers who had early access to GPT-4 in fall 2022 concluded that the intelligence with which the model responds to complex challenges they posed to it, as well as the broad range of expertise it exhibits, indicates that GPT-4 has attained a form of general intelligence. In other words, it has built up an internal model of how the world works, just as a human brain might, and it uses that model to reason through the questions put to it. One of the researchers told This American Life that he had a “holy s---” moment when he asked GPT-4 to “Give me a chocolate chip cookie recipe, but written in the style of a very depressed person,” and the model responded: “Ingredients: 1 cup butter, softened, if you can even find the energy to soften it. 1 teaspoon vanilla extract, the fake artificial flavor of happiness. 1 cup semi-sweet chocolate chips, tiny little joys that will eventually just melt away.”
Want to learn even more about genAI? Check out the full series on this hot topic here: https://www.oracle.com/artificial-intelligence/generative-ai/what-is-generative-ai/.