22. What language models are good at


I get confused when people are severely disappointed that ChatGPT doesn’t know something or hallucinates. I’m not sure why it’s important that a large language model know the Dodgers won the 1988 World Series, or what pi to the power of your birthday is, but I suspect part of the confusion stems from the whole “ChatGPT as Google-search killer” discourse.

This strikes me as a misunderstanding of what language models are good and bad at. They are designed to predict the next token given a string of tokens. Language models look at a given text, assess the relationships of all tokens to one another, and arrive at a reasonable prediction of what text follows. (Hence the disparagement “stochastic parrot.”)
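To make the idea concrete, here is a toy sketch of next-token prediction. A real LLM uses a transformer over billions of parameters, not bigram counts, but the interface is the same: given context, emit a probability over what comes next. The corpus and function names are my own illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy corpus (an assumption for illustration, not real training data).
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which token follows each token (a crude stand-in for training).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    """Return the most probable next token and its estimated probability."""
    counts = following[token]
    total = sum(counts.values())
    word, n = counts.most_common(1)[0]
    return word, n / total

print(predict_next("the"))  # "cat" follows "the" in 2 of 4 cases -> ('cat', 0.5)
```

The model never “knows” facts; it only knows which continuations are likely given what it has seen, which is exactly why it can complete text fluently and still hallucinate.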

Language models are not trained or designed to do math, retrieve information from a URL, or store memory, yet users often arrive with exactly these expectations.

What makes language models important nonetheless is that intelligence seems to resemble this pattern of predicting the next token. One might argue that engaging in conversation, creating an analogy, writing a story, or programming an application are all just text completions with different expectations. Telling a joke is completing a text with a heavy emphasis on subverting the listener’s expectations with a surprise.

From what we can tell, weighing tokens and assessing a following token’s probability seems to be the basis of understanding. It forms the kernel of reason. [1]

In this respect, builders should recognize that an LLM alone is a content generator, a synthesizer, a text evaluator, and, if used properly, a reasoning engine. It is not a fact base, a search engine, a calculator, or a code interpreter, but it can, should, and soon overwhelmingly will be equipped with those tools. Users expect it.
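One hypothetical sketch of what “equipped with those things” can look like: route a query to a real calculator when it parses as arithmetic, and fall back to the model for everything else. The routing logic and the `llm` placeholder are assumptions for illustration, not any particular product’s design.

```python
import ast
import operator

# Map AST operator nodes to real arithmetic (a safe, minimal calculator tool).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculate(expr):
    """Safely evaluate a simple arithmetic expression string."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def answer(query, llm=lambda q: "<generated text>"):
    """Route math to the calculator tool; open-ended text to the model."""
    try:
        return calculate(query)   # the tool handles what the LLM is bad at
    except (ValueError, SyntaxError):
        return llm(query)         # the model handles what it is good at

print(answer("2 * (3 + 4)"))     # 14
print(answer("tell me a joke"))  # falls through to the language model
```

The point of the sketch is the division of labor: the model stays a text engine, and deterministic tools absorb the jobs users wrongly expect of it.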

[1] Sparks of Artificial General Intelligence: Early experiments with GPT-4. https://arxiv.org/pdf/2303.12712.pdf