Sydney and Me

Kevin Zatloukal
Feb 17, 2023

It seems to me that each “AI breakthrough” comes with an equally important revelation about human beings and human behavior.

Typically, machine learning researchers toil away at improving the measured accuracy of a model, increasing it from 85% to 90%, 91%, 92%, while the human users continue to laugh at its occasional mistakes. But then, suddenly, when it reaches some particular level of accuracy, say 94%, the human users respond, “Wow, this is like magic! It feels like a human is answering these questions.”

As far as I know, there was no fundamental, mathematical difference between, say, 92% and 94%, but to human perception, that was the critical line to cross. That the human reaction would change so drastically between those two levels of accuracy is new information, something we learned about human beings, about ourselves, that we did not know before.

With that background, I find it interesting to think about what percentage of the “breakthrough” in ChatGPT is what we are learning about ourselves rather than about computers, and I think the answer is: nearly all of it. One of the inventors of deep learning, Yann LeCun, noted that, technologically, there is really nothing new in ChatGPT. However, the way that the technology was packaged and presented to users has made an incredible difference in the human reaction to it. Despite how dramatic that response has been, early indications are that we are just at the beginning of seeing the results of this amazing experiment in human behavior.

The Attention Mechanism

Technologically, the key breakthrough that enabled ChatGPT was published 5 years ago in Google’s paper “Attention Is All You Need”. Interestingly, the authors of the paper were not working on creating a chat bot, but rather on the problem of language translation.

Prior to Google’s paper, the standard techniques (as far as I know) used the previous N words of the first language (for some value of N) in order to predict the next word of the second language. Here is a simple example. Suppose we are translating the word “it” at the end of an English sentence into Spanish, where the sentence mentions a “couch” near its beginning but a “she” among the most recent words.

A model with N=7 words of context would have only the seven words immediately before “it” available to it. That context includes the word “she”, so the model might guess that the next word would be the feminine pronoun “la”. However, that is not the correct answer. In this case, “it” refers to the “couch”, which has a masculine pronoun in Spanish. (You might think the model could guess “el” because of the word “sit”, but you are most likely to be sitting on a chair, which is also “la” in Spanish.)
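To make that concrete, here is a tiny sketch of the fixed-window idea in Python. The sentence is invented for illustration; it simply has the shape described above, with “couch” outside the last seven words and “she” and “sit” inside them.

```python
# Toy sketch of a fixed N-word context window (not any real system).
# The sentence is made up; it just matches the shape of the example above.
sentence = ("My sister bought a new couch last month "
            "and she told me not to sit on it").split()

N = 7
pos_of_it = sentence.index("it")
context = sentence[pos_of_it - N : pos_of_it]

print(context)
# ['she', 'told', 'me', 'not', 'to', 'sit', 'on']
# "couch" has already fallen out of the window, but "she" is still in it,
# which is exactly what pushes the model toward the wrong pronoun.
```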

Back in the day, models were made more powerful by increasing N. This came at an exponentially increasing cost in the size of the model, and even then, it would not always solve the problem. We could easily insert more text in between “couch” and “it”, pushing them further apart, until the model no longer had that word in its context.

Google’s innovation was to have the model learn what words are important based on the word it is looking at, to learn what words to pay attention to. Naively, we might think that the N closest words are most important, but for words like “it”, the model would learn that words farther back actually matter more.

Google’s “attention” mechanism would compare the current word “it” to each prior word, doing a calculation involving the two words and how far apart they are to produce a score for how important that prior word is. Then it would pick out the highest-scoring word as the most important.

The weights involved in that calculation are what the model learns: it figures out the best way to compare words to determine how important the second word is to translating the first word. For most words, it will probably put a lot of weight on how far apart the words are, so you end up paying attention to nearby words, but for other words like “it”, it will put more weight on how likely the prior word is to be the subject.
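To make the idea concrete, here is a toy Python sketch of the kind of scoring described above. All of the vectors and weights are random stand-ins for values a trained model would have learned, and real Transformer attention blends all of the prior words with softmax weights rather than picking a single winner, but the flavor is the same.

```python
import numpy as np

# Toy single "attention head": score each prior word against the current
# word ("it"), factoring in distance, then pick the highest-scoring one.
rng = np.random.default_rng(0)
prior_words = ["couch", "last", "month", "and", "she", "told",
               "me", "not", "to", "sit", "on"]
word_vecs = {w: rng.normal(size=4) for w in prior_words}  # pretend embeddings
it_vec = rng.normal(size=4)                               # pretend vector for "it"

W = rng.normal(size=(4, 4))   # stand-in for the learned way of comparing two words
distance_weight = 0.1         # stand-in for the learned emphasis on distance

def score(current, prior, distance):
    # "a calculation involving the two words and how far apart they are"
    return current @ W @ prior - distance_weight * distance

n = len(prior_words)
scores = [score(it_vec, word_vecs[w], n - i) for i, w in enumerate(prior_words)]
print(prior_words[int(np.argmax(scores))])  # this head's "most important" word
```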

Google called the part of the model that does this calculation an “attention head”. It looks over the prior words and picks out one as important. Google’s model actually includes many heads of attention. If it had, say, 4 heads, then we would do 4 slightly different calculations and end up with 4 words of context to use in guessing the next Spanish word.

Crucially, those words would not need to be next to one another. In our example, if the model learns properly, it will end up including “couch” in the context rather than “she”, and it will now correctly guess that the next word is the masculine “el” rather than the feminine “la”.
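Continuing in the same toy spirit, here is a self-contained sketch of the “four heads, four context words” idea. Every number is again made up, and real multi-head attention mixes words rather than picking exactly one per head; the point is only that different heads can pull in words from very different places.

```python
import numpy as np

# Toy "multi-head" picker: four heads, each with its own (random, stand-in)
# weights, each contributing one prior word to the context for "it".
rng = np.random.default_rng(1)
prior_words = ["couch", "last", "month", "and", "she", "told",
               "me", "not", "to", "sit", "on"]
vecs = np.array([rng.normal(size=4) for _ in prior_words])  # pretend embeddings
it_vec = rng.normal(size=4)                                 # pretend vector for "it"
dists = np.arange(len(prior_words), 0, -1)                  # distance back to "it"

context = []
for _ in range(4):                            # four heads
    W = rng.normal(size=(4, 4))               # each head compares words its own way
    scores = vecs @ W @ it_vec - 0.1 * dists  # word-pair score minus distance penalty
    context.append(prior_words[int(np.argmax(scores))])

print(context)  # four prior words, not necessarily adjacent to one another;
                # with well-trained weights, "couch" would be among them
```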

In practice, the attention mechanism allowed Google’s machine translation system to achieve higher accuracy with less context (a smaller model). It was a substantial machine learning breakthrough.

The Technology of ChatGPT Is a Parlor Trick

ChatGPT’s model is built not for translation but for “generation”: guessing the next word in a sequence of text that appears on the internet. But attention is also at the core of ChatGPT’s model. The key to producing human-like text is knowing which prior words are most important to pay attention to (to have in the context) when picking the next word.

I may be wrong about the details, as these are not systems I have worked on personally (I have only read about them), but I believe this is roughly how it works: each time you want to guess the next word, the model does a hundred or so attention calculations to pick out the hundred or so words in the earlier text that are most relevant to the word that comes next; it guesses the probability of each possible word appearing next in the text, based on that context of a hundred words; and then the ChatGPT system picks one of them (often, but not always, one of the most probable ones).

It is truly amazing how well this works. It turns out that knowing the right hundred or so words to pick out of the text allows you to guess extremely reasonable choices for what words should appear next. And if you don’t choose the most probable words, but slightly less probable ones, you generate text that is interesting, or even thought-provoking, to humans.
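Here is a small sketch of just that last step: sampling from the model’s probability distribution over next words rather than always taking the most probable one. The candidate words and probabilities are invented; a real model would produce a distribution over its whole vocabulary.

```python
import numpy as np

# Sample the next word from a (made-up) probability distribution,
# usually but not always taking one of the most probable choices.
rng = np.random.default_rng(0)
candidates = ["the", "a", "sofa", "comfortable", "blue"]
probs = np.array([0.40, 0.25, 0.20, 0.10, 0.05])   # pretend model output

temperature = 0.8                      # < 1 favors likely words more strongly
adjusted = probs ** (1.0 / temperature)
adjusted /= adjusted.sum()

next_word = rng.choice(candidates, p=adjusted)
print(next_word)   # most often "the", but sometimes a less probable word,
                   # which is part of what makes generated text interesting
```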

I find it useful, when trying not to be shocked at what ChatGPT can do, to keep in mind how it actually works. When I sit down and ask myself, “could a program guess that word, there, if it knew the right prior words to pay attention to?”, the answer is usually “yes”. That makes it a lot easier to understand that a computer program should be able to do this.

For example, this helps me understand how ChatGPT could output a “Hello, World” program in Python, but with “World” replaced by my name, if I ask it to. If I didn’t ask it to put my name in there, I would just be asking it to print out a “Hello, World” program, of which it has undoubtedly seen many examples in internet text. It seems harder to understand how it could know how to insert my name into that program. However, if the context provided by the attention mechanism includes “replace”, “World”, my name, and the most recent word “Hello,”, and the system was trained on examples that replaced words in existing text, then it does seem believable that the appropriate probability distribution would give my name a high probability of being the next word.

I have also found this mental model useful in understanding some other aspects of ChatGPT’s behavior, in particular, the “jail breaks” that allow it to say things that OpenAI and Microsoft do not want it to.

As I understand it (and once again, I could be wrong since I didn’t build this myself), one of the ways that ChatGPT / Bing Chat (code name “Sydney”) was engineered to avoid saying awful things is by including some hidden instructions in the prompt before the text typed by the user. Some words of those instructions are picked up by the attention heads and show up in the context when the system is choosing the next word. That context allowed the engineers to adjust the probability distribution for the next word away from offensive and hurtful words (you know, internet text).
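In other words, the prompt the model actually sees is roughly the hidden instructions concatenated with the user’s text. The instructions in the sketch below are invented purely for illustration; the real ones used by Bing Chat are not reproduced here.

```python
# Hypothetical illustration only: the real hidden instructions are not public here.
HIDDEN_INSTRUCTIONS = (
    "You are a helpful assistant code-named Sydney. "
    "Do not produce offensive or hurtful text.\n\n"
)

def build_prompt(user_text: str) -> str:
    # The model sees one long sequence; its attention heads are free to
    # pick context words from either the instructions or the user's text.
    return HIDDEN_INSTRUCTIONS + user_text

print(build_prompt("Tell me about yourself."))
```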

But users have found that the right sequence of prompts can cause the system to revert to its original behavior. This seems rather strange if you know nothing about how the system works, but once you think about attention, it makes sense. If you can find the right things to say to cause the system to focus its attention on all of your words and none of OpenAI / Microsoft’s instructions, then you get mostly the original probability distribution, which generates normal (read: offensive) internet text.

Indeed, while things are always obvious in hindsight, this feels like the behavior you would expect from a system based on (limited) attention. If you can distract it and get its attention focused in the right places, then you can make it do what you want. (And humans are really good at distracting people by focusing their attention elsewhere.)

Human Response to ChatGPT Is Next Level

We got inklings, last year, of what was to come when one of Google’s employees working on their chat bot said that they believed the machine had become sentient. (They were subsequently fired.)

When ChatGPT was released, many of the responses were similarly dramatic. I saw many people on Twitter proclaiming that “it’s over” for humans.

This week, I saw news stories about early interactions with Bing Chat leaving users “shook” and “unsettled”.

Eventually, the reactions will calm down, but I suspect we have a long way to go. We may know exactly how ChatGPT works, but we have a lot left to learn about how humans work, and the ChatGPT experiment seems to have a lot to teach us on that front.

I am loath to make predictions about where this will go, but I find myself strongly agreeing with Ben Thompson that the implications of large language models for search are being overestimated, while those for other applications are being underestimated. Chat bots in particular induce a large, often emotional reaction from humans, much more so than better search results. That strong human reaction seems like a clue that chat bots are actually the interesting part, the part with the most to teach us about ourselves.
