MINDWORKS

Mini: Rise of the LLMs

Daniel Serfaty Season 5

In this MINDWORKS Mini, join host Daniel Serfaty as he talks with Drs. Svitlana Volkova and Robert McCormack about the birth of Large Language Models and how we got to where we are today. 

Listen to the full episode, AI: The End of the Prologue on Apple, Spotify, and wherever you get your podcasts. 

Daniel Serfaty: I want to take stock of what we have accomplished today. AI was not created with the release of ChatGPT a couple of years ago, when it became available for free, or almost free, on everybody's platform. AI started a long time ago. In fact, few people know that AI actually started in the fifties and then went through a revolution, followed by a winter, followed by a revolution, followed by a winter, and now we are perhaps in the big third wave of AI. This one is different from the previous ones, I think, because so many people touch it. It's not just the prerogative or the privilege of a few scientists; it's really becoming almost democratized.

So from your perspective, or if we look more in the short term, was there a seminal event, an idea, an insight, a paper that really predated the explosion of AI or enabled the explosion of AI in so many aspects today? Was there one particular thing that you want to share with our audience that really enabled what most people consider AI today, which is primarily large language models?

Robert (Bob) McCormack: When I think about the latest wave of AI, my mind always goes back to around 2017. There was a paper about transformer models. Transformer models really built on a lot of the work from the previous decades on neural networks and added new functionality that I think revolutionized how AI is used today. A lot of that has to do with the idea of attention. Neural networks are really good at taking large amounts of data and finding patterns in it. But what's really interesting about language is just how easy it is for us humans to speak to each other, to understand each other. Two- and three-year-olds can use language and effectively communicate with each other, with their parents, and with adults. That has really been hard for computers. The advent of transformers almost a decade ago introduced some new concepts into the neural network architecture, one of them being attention, and that's really important in understanding language.

The notion is that every single word in a sentence has meaning, but that meaning is not independent of the rest of the sentence. The sentence as a whole adds meaning to every individual word. So for example, if I say, "She picked up the bat after the baseball game," your mind immediately goes to a wooden or metal baseball bat. But if I say, "She picked up the bat at the zoo," your mind immediately goes to a very different meaning of bat, the flying animal. Humans intuitively understand that and are able to parse the differences in those meanings of the word bat, but that has traditionally been very difficult for computers to do. The transformer architecture provided mechanisms to understand in context what these words mean and how they're different, and allowed computers to think at a higher level and transformed the way information and language is parsed.
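[Editor's note: the attention idea Bob describes can be sketched in a few lines of code. This is a toy illustration of scaled dot-product attention from the 2017 transformer paper, using made-up random embeddings rather than any real model's vectors: the word "bat" is rescored against every word in the sentence, and its final representation becomes a context-weighted blend, so swapping the surrounding words would shift it toward a different sense.]

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: turns raw scores into weights summing to 1.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(query, keys, values):
    # Score the query word against every word in the sentence,
    # then blend the value vectors by those scores.
    scores = keys @ query / np.sqrt(len(query))
    weights = softmax(scores)
    return weights @ values, weights

# Hypothetical 4-d embeddings for the example sentence (random, for illustration).
words = ["she", "picked", "up", "the", "bat", "after", "the", "baseball", "game"]
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(words), 4))

query = emb[words.index("bat")]
context, weights = attention(query, emb, emb)

# `context` is the contextualized vector for "bat": a weighted mix of the
# whole sentence, so "baseball game" vs. "zoo" would pull it different ways.
print(np.round(weights, 2))
```

In a real transformer, the queries, keys, and values are separate learned projections of the embeddings, and many such attention "heads" run in parallel across every word at once; the blending step, however, is exactly this weighted sum.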

Daniel Serfaty: So, some key ideas came, for you, from that 2017 paper on transformers. Svitlana, maybe you can bring us closer to the past two years. Building on that technology, what do you see as the big breakthroughs of the past two years? What changed?

Svitlana Volkova: Absolutely. Bob is absolutely right about transformer architectures, which are very efficient architectures that allowed us to develop LLMs that can learn at scale. If we think about the knowledge encoded in an LLM, it would be like a human learning for 200 years, which is not possible for a person. So if we're thinking about the advancements and the pivotal moments of the last two years, I would say, definitely, transformer architectures and the ability to learn at scale from large amounts of data. In addition to that, I would say RLHF, reinforcement learning from human feedback. I think I might have mentioned the experiment that OpenAI did when they released ChatGPT to the world. They basically collected the preferences and interactions of people across the world as they used ChatGPT, and then used those preferences and interactions to improve the model. That incorporated the human, the way humans actually interact with LLMs and vision-language models these days. The line is blurred right now between model developers and model users, because anyone can use an LLM. It's super easy.

So those were the key moments of the last two years. But I also wanted to say that in 2012, I remember, I was at the NIPS Conference. That was the time of the ImageNet breakthrough. And I remember people, even people who had been skeptical of neural networks, talking in the hallways almost religiously: "Oh, this is the key moment. The radiologists are going to disappear. We are going to have computer vision everywhere, ubiquitous." But see where we are; I don't think radiologists have disappeared.

Daniel Serfaty: Okay, what is ImageNet, by the way?

Svitlana Volkova: Okay, ImageNet is a large labeled image dataset developed at Stanford that we could use to train neural network models for computer vision. In 2012, AlexNet, a deep neural network trained on ImageNet, allowed us to really understand images. This was before diffusion models, even before the text embeddings and transformer models that Bob mentioned; those became the ImageNet moment for natural language processing, but for computer vision it happened in 2012. I remember every paper introducing a new, more and more efficient architecture being invented, just as we see new LLMs appear today.

Daniel Serfaty: That's great. Those are seminal events. I'm so glad you are here, because you can put this in perspective: there are new ideas every hour in AI, but only some really have this transformative, no pun intended, effect on the field. Think of those advancements, whether it's the example you just gave on images, or transformers, or the ability to run that extraordinary experiment you just described, Svitlana. That's the reason ChatGPT was free: we, the users, were part of the design team. They used that interaction to improve the tool, which was genius, I think. How are all these advancements being applied in real-world scenarios? Give us some examples of what you consider today successful implementations of those ideas, things that have really changed not just the life of computer scientists but the life of users.

Robert (Bob) McCormack: One of the most salient examples to me is the amount of time I use Google versus the amount of time I'm using something like ChatGPT. Five years ago, if I had a question, I would go to Google, type it in, search through some links, and really spend a lot of time curating knowledge to try to find the answer to my question, and try to de-conflict what different people were saying about the subject. Today, a lot of times, if it's a simple knowledge question, I'll go directly to ChatGPT and just ask: what led to the fall of the Third Reich in Germany? Or I'll ask for a factual summary. Now, the caveat is that you have to be very wary of the information ChatGPT or other LLMs give you. You shouldn't inherently trust the answers, but take them as a starting point for further research and deeper understanding. In terms of really searching and organizing knowledge, that's where I've seen a huge change and impact on everyday life.

Svitlana Volkova: That's absolutely right. I never click on the links on Google anymore. If I Google something, I look at the Gemini model output at the top of the page, which gives you a succinct answer. And this is great for Google [inaudible 00:11:19] because it can plan trips for you and give recommendations. I recently went to DC with my daughter; it was her first time there, and I literally asked Claude to make the trip to the monuments more entertaining and to generate facts for a seven-year-old. That's what Claude can do: it can take boring facts and translate them into language that will be appealing to a seven-year-old. It was fascinating. It was really great.

Daniel Serfaty: Kids' stories are the best stories, because those kids are going to use AI much more than the two of you, Dr. McCormack and Dr. Volkova. I think they're going to be much more adept at it.

Robert (Bob) McCormack: So, I have a nine-year-old and a six-year-old, and one of the things we like to do is use AI to write funny songs or funny jokes together. They can personalize the songs about themselves and the events that happened to them that day. So it's a way to entertain your kids. I find it a really fun and creative process.

Daniel Serfaty: By the way, we are using all these terms, and our audience is probably trying to sort out Claude, Gemini, ChatGPT. These are just different products from different companies that do basically what ChatGPT, the one most people know, does. They do it with different formats or different areas of expertise, but they're all based on similar technologies.