In this Blog Post, I'll try to give you an idea of how ChatGPT works. In later posts we can talk about implications and ethics.
What is ChatGPT? It is an Artificial Intelligence (AI) application "specifically designed to generate human-like text responses" (so says ChatGPT). At the time of writing, GPT-4 is the latest released version, and GPT-3.5 is the latest free version.
Have you tried ChatGPT? If not, it has a very simple User Interface. You enter a question, or a phrase, or several pages of text, and it responds.
In this example, ChatGPT responded with -
"The best thing about AI is its ability to perform tasks and solve problems that would be challenging or impossible for humans to do quickly or accurately. AI can analyze vast amounts of data, make predictions, automate repetitive tasks, and even learn from new information to continuously improve its performance."
Pretty incredible, right? How does it do it? Let's find out.
Caveats - We'll be simplifying a lot. All you AI experts - if I have missed a big point or oversimplified too much, please tell me! email@example.com.
*Most examples are taken from this great article by Stephen Wolfram, an icon of data science. It's a fantastic article. If you have some background in AI/Math/Computer Science I highly recommend it.
What is Artificial Intelligence?
I found that the definitions of AI (and related concepts) are changing and not always consistent. Here's how current definitions seem to be coalescing:
AI - AI is a broad concept of machines/systems doing smart things. It includes many applications like physical robots, chatbots, self-driving cars, text to speech, speech to text, and text and image comprehension and generation.
ML - Machine Learning is now being defined as the underlying models/algorithms used in AI that can learn over time. These are predominantly complex neural networks (i.e., deep learning networks).
GPT - Generative Pre-trained Transformer is the type of neural network used by ChatGPT.
The Math behind the Machine
We'll talk about the Machine Learning aspects, focusing on the most basic concepts of how GPT-3 works. (OpenAI described the GPT-3 architecture in a research paper; comparable details for later versions are not available.)
There are 4 steps that happen when you use GPT for text generation. I'll try to give you an idea of each.
1 - Transform inputs (words, questions, websites, articles, etc.) into a set of numbers
2 - Feed numbers (representing words and phrases) through deep learning models
3 - Output predictions for the next word
4 - Add selected word to phrase and repeat
1 - Transform words into numbers (called embeddings).
This clever mechanism is best described with an example. Here words are transformed into two dimensions. (Think of cities using lat/long identifiers). This allows calculation of distances between words, just as you would calculate distances between cities on a map. So we can see that "cat" and "dog" are much closer than "cat" and "turnip".
GPT does the same thing, but with roughly 12,000 dimensions (12,288 in the largest GPT-3 model). That allows for a lot of nuance and sophistication, but it also means humans have no way to picture or interpret the result directly.
(Ok, so really GPT uses tokens - small words like cat, or pieces of long words, face + book = Facebook...let's ignore that for now.)
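To make the map analogy concrete, here is a minimal sketch in Python. The coordinates are made up purely for illustration (real embeddings are learned by the model and have thousands of dimensions), and the `embeddings` table and `distance` helper are hypothetical names, not anything from GPT itself.

```python
import math

# Made-up 2-D coordinates, purely for illustration.
# Real embeddings are learned during training, not typed in by hand.
embeddings = {
    "cat": (1.0, 2.0),
    "dog": (1.5, 2.5),
    "turnip": (8.0, 0.5),
}

def distance(word_a, word_b):
    """Straight-line (Euclidean) distance between two words on the 'map'."""
    return math.dist(embeddings[word_a], embeddings[word_b])

print(round(distance("cat", "dog"), 2))     # prints 0.71
print(round(distance("cat", "turnip"), 2))  # prints 7.16
```

With these invented numbers, "cat" sits about ten times closer to "dog" than to "turnip", which is the kind of relationship the embedding is meant to capture.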
2 - Feed words (as numbers) into Deep learning models. (What the heck does that mean?)
You can think of Neural Network Models as a series of fairly simple mathematical calculations, where the calculations are linked together (the output of one or more = input to the next).
We represent them as a diagram like this, where the circles (nodes) show where calculations take place, and the arrows (edges) show which calculations are connected (and how strongly). The final output is the answer.
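If it helps to see the "linked calculations" idea as code, here is a tiny sketch of a network with two inputs, two middle nodes, and one output node. All the weights are invented for illustration, and the sigmoid "squashing" step is just one common choice; GPT's real layers are far larger and use different building blocks.

```python
import math

def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def node(inputs, weights, bias):
    """One circle in the diagram: a weighted sum of its inputs, then a squash."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

def tiny_network(inputs):
    # Middle layer: two nodes, each reading both inputs.
    # The weights play the role of the arrows (how strongly nodes are connected).
    h1 = node(inputs, [0.5, -0.4], 0.1)
    h2 = node(inputs, [-0.3, 0.8], 0.0)
    # Output node: its inputs are the outputs of the two middle nodes.
    return node([h1, h2], [1.2, -0.7], 0.05)

answer = tiny_network([1.0, 2.0])
print(answer)
```

Each node on its own is simple arithmetic. The power comes from chaining a huge number of them together.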
These models "learn" by adjusting their numbers to minimize error against a "training" dataset. GPT-3 used 570 GB of training data, mostly websites, books, and articles. The final model is then really just a series of optimized numerical equations and data transformations (with many thousands of equations and billions of parameters; the largest GPT-3 model has 175 billion). Once this dataset is used, it is discarded, and only the equations remain.
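Here is a toy sketch of what "learning to minimize error" can look like, assuming the simplest possible model: a single parameter `w` that should learn to multiply its input by 3. The data, the learning rate, and the step count are all made up for illustration. The technique shown (gradient descent) is the standard way neural networks are trained, but this is nowhere near the scale of GPT.

```python
# Toy training data: the "right answer" is always 3 times the input.
training_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

w = 0.0               # the single parameter, starting from a bad guess
learning_rate = 0.05  # how big a correction to make at each step

for step in range(200):
    # Slope of the average squared error with respect to w:
    # it tells us which direction to nudge w to reduce the error.
    gradient = sum(2 * (w * x - y) * x for x, y in training_data) / len(training_data)
    w -= learning_rate * gradient

print(round(w, 3))  # prints 3.0
```

After training, the data could be thrown away: everything the model "knows" lives in the value of `w`. GPT works on the same principle, just with billions of such values.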
3 - Output predictions for the next word. For example, type the following phrase into ChatGPT:
“The best thing about AI is its ability to...”
Try it! I did it 3 times (in three different sessions) and got 3 different next words.
“The best thing about AI is its ability to...perform”
“The best thing about AI is its ability to...learn”
“The best thing about AI is its ability to...efficiently”
Why different words? The model outputs a score for every possible word (using a sort of reverse embedding to go from numbers back to words), and each score becomes a probability that the word is the "correct" next one. Those probabilities come from patterns the model learned from the 570 GB of training data, not from looking anything up. The model then selects the next word at random, weighted by these probabilities, so it does not always pick the top one (the three words above were all likely candidates). But you also probably won't get a totally wrong word, like "whale" or "puke."
Because each word is chosen by weighted chance, you will likely never get exactly the same answer twice.
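The weighted choice can be sketched in a few lines of Python. The raw scores below are hypothetical, and `softmax` and `pick_next_word` are illustrative helper names. Softmax is the standard way to turn a model's raw scores into probabilities, though the real system works over tens of thousands of tokens and has extra settings not shown here.

```python
import math
import random

# Hypothetical raw scores for a few candidate next words.
scores = {"learn": 2.0, "perform": 1.8, "efficiently": 1.0, "whale": -4.0}

def softmax(raw_scores):
    """Turn raw scores into probabilities that add up to 1."""
    exps = {word: math.exp(s) for word, s in raw_scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def pick_next_word(probabilities, rng=random):
    """Pick one word at random, weighted by its probability."""
    words = list(probabilities)
    weights = list(probabilities.values())
    return rng.choices(words, weights=weights, k=1)[0]

probabilities = softmax(scores)
for _ in range(3):
    print(pick_next_word(probabilities))
```

Run it a few times and the three printed words will change, just like my three ChatGPT sessions. With these made-up scores, "whale" has roughly a 0.1% chance, so it is possible but very unlikely.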
4 - Add selected word to phrase and repeat.
ChatGPT outputs not just one word, but phrases, outlines, lists, and entire pages of text. But here's the thing: it does it ONE word (token) at a time. In the example below, the model runs its calculations four times to output three words plus one period.
The best thing about AI is its ability to...
The best thing about AI is its ability to learn
The best thing about AI is its ability to learn over
The best thing about AI is its ability to learn over time
The best thing about AI is its ability to learn over time.
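The loop above can be sketched like this. To keep it tiny, a hypothetical lookup table (`NEXT_WORD`) stands in for the model and only looks at the last word; a real model re-reads the whole phrase and samples from probabilities on every run.

```python
# Hypothetical stand-in for the model: last word -> next word.
NEXT_WORD = {"to": "learn", "learn": "over", "over": "time", "time": "."}

def predict_next(words):
    """Stand-in for one full run of the model's calculations."""
    return NEXT_WORD.get(words[-1])

def generate(prompt, max_steps=10):
    words = prompt.split()
    runs = 0
    for _ in range(max_steps):
        next_word = predict_next(words)  # one run of the "model"
        runs += 1
        if next_word is None:
            break
        words.append(next_word)          # add the word and go around again
        if next_word == ".":
            break
    return words, runs

words, runs = generate("The best thing about AI is its ability to")
print(" ".join(words).replace(" .", "."))  # prints The best thing about AI is its ability to learn over time.
print(runs)                                # prints 4
```

Four runs, four new tokens: "learn", "over", "time", and the period.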
Here is one actual output from ChatGPT:
The best thing about AI is its ability to ... perform tasks and solve problems that would be challenging or impossible for humans to do quickly or accurately. AI can analyze vast amounts of data, make predictions, automate repetitive tasks, and even learn from new information to continuously improve its performance.
Each word is the result of millions, if not billions, of calculations. Let that sink in for a moment.
1 - GPT Makes up Answers. You may hear people worry about GPT making up answers. You've just seen that GPT does not look up stored sentences or articles when it answers your questions. It uses equations and probabilities. So ALL answers are made up.
2 - GPT Answers can be Biased. ChatGPT's outputs reflect the patterns in its training data, so it can be hard to get new answers that break old patterns. This is one reason there are so many examples of failed AI programs. (Example - Amazon's secret AI recruiting tool discriminated against women, exactly what the company didn't want.)
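A toy illustration of how skewed input becomes skewed output, assuming a simple word-counting model that is far cruder than GPT. The "training text" is invented and deliberately lopsided.

```python
from collections import Counter

# Made-up training text: "doctor" is followed by "he" nine times and "she" once.
training_text = ("the doctor he " * 9 + "the doctor she " * 1).split()

# Count which word follows "doctor" in the training text.
following = Counter(
    nxt for word, nxt in zip(training_text, training_text[1:]) if word == "doctor"
)
total = sum(following.values())
next_word_share = {word: count / total for word, count in following.items()}

print(next_word_share)  # prints {'he': 0.9, 'she': 0.1}
```

The model has no opinion of its own. It simply hands back the proportions it was fed, which is how old patterns get repeated.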
3 - No one knows how this works. Humans define the configuration of the system (the number and placement of nodes and the types of transformations), and the system determines the parameter values (the learning). On top of the staggering number of calculations, the human-created configuration is largely trial and error; it's not based upon theory or rigorous mathematics. Things that work are expanded upon year over year, with the models growing incrementally better (and often larger). But no one fully knows why.
I hope you enjoyed this intro to the numbers behind the words of ChatGPT. While it might feel scary that a black box can mimic human thinking so well, I think a lot of computational linguistics researchers must be very excited right now about the implications for our understanding of human thought, since neural networks are loosely modeled after our brains.
What do you think? Drop me your thoughts via email (linda@lmwhitaker.com) or Facebook.