Operationalizing AI took place in Boca Raton last week, where experts from all over the world converged to talk all things AI, especially as it relates to DevOps. One of the most important topics was the LLM, or Large Language Model. A key component of an LLM is natural language processing, whereby software can "understand" human language and many of its nuances, and "reply" in that same human language, again with apparent nuance.
Software developers have been attempting this for decades. As early as the mid-1960s, tools such as ELIZA were created to do just that. But the user had to phrase everything in the very specific language its tiny model could handle, and the entire program needed only about an eighth of a megabyte. The language was anything but natural.
Today with tools like ChatGPT, we humans are finally able to interact with computers using our own language, and the computers will, for the most part, understand what we’re saying.
But this aspect of natural language processing goes much deeper than just input and output. It’s not just a fancy interface slapped on top of a traditional program. Instead, such AI tools use natural language for processing and manipulating data. Let’s go a bit deeper to see how this is done.
A Generative Pre-trained Transformer (GPT, the architecture behind ChatGPT) is trained by being fed massive amounts of text. This data might start with small sentence pairs in the form of input and output, which the system uses to make linguistic associations and, from there, predictions. For example, an input/output pair might look like this:
Input: What is the capital of France?
Output: The capital of France is Paris.
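Here's a minimal sketch, in Python, of what a collection of such training pairs might look like as raw data. The format and field names are illustrative assumptions on my part, not OpenAI's actual training format:

```python
# A toy illustration of input/output training pairs. The field names
# and the simple list format are hypothetical; real GPT training data
# is vastly larger and formatted differently.
training_pairs = [
    {"input": "What is the capital of France?",
     "output": "The capital of France is Paris."},
    {"input": "What is the capital of Japan?",
     "output": "The capital of Japan is Tokyo."},
]

# Conceptually, each pair becomes one long token sequence, and the
# model learns to predict each next token in that sequence.
for pair in training_pairs:
    print(pair["input"] + " " + pair["output"])
```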
In a traditional software application, the programmer might hand-code a list of pairs matching each country to its capital. But that's not how it's done in a GPT. Instead, a GPT makes textual associations, such as the input and output above, and builds on those associations as training continues. Another input might refer to Paris as a major cultural center for fine art. Another might refer to Paris as one of many large cities in France. Over time, as the system is trained, the GPT will "learn" more and more about Paris by making similar associations, and ultimately generate new sentences and paragraphs with correct information.
For example, I asked ChatGPT to tell me 15 unusual facts about Paris. It listed several, including how the Eiffel Tower grows taller in the summer due to metal expanding under heat; that there’s a smaller version of the Statue of Liberty there; and this interesting one: “The design of Paris Metro tickets is based on the idea that no matter how many times you fold the ticket, the two numbers on the top will always add up to 9.”
Remember, ChatGPT doesn't "know" these items. Instead, they are stored in its system as linguistic associations with the word Paris, associations it "learned" by "reading" massive numbers of pages on the internet. It started with small associations and, as training scaled up, accumulated so many of them that when asked a question, it could provide a sophisticated answer, all based on text, that gives us the impression that it "knows" these things and that it can "answer" in English.
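To make the idea of linguistic associations a bit more concrete, here's a toy sketch that counts which words tend to appear near "Paris" in a tiny corpus. This is my own drastic simplification: a real GPT learns weighted associations over token sequences, not simple co-occurrence counts.

```python
from collections import Counter

# A toy corpus; real training data spans billions of pages.
corpus = [
    "The capital of France is Paris",
    "Paris is a major cultural center for fine art",
    "Paris is one of many large cities in France",
]

# Count the words that co-occur in a sentence with "Paris".
associations = Counter()
for sentence in corpus:
    words = sentence.lower().split()
    if "paris" in words:
        associations.update(w for w in words if w != "paris")

print(associations.most_common(5))
```

Even this crude counter begins to link Paris with "france" and "is"; at web scale, with far more sophisticated weighting, those links become the raw material for generated answers.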
As for where those sources came from, OpenAI, the creator of ChatGPT, will not say. Wikipedia, however, is a likely candidate among the massive amounts of text fed into it.
Indeed, I asked ChatGPT, "Have you been trained with all of Wikipedia?" (I purposely phrased it using the word "trained" as opposed to asking, for example, "Have you read all of Wikipedia?") Alas, its response was that it doesn't have access to its own training material, but it assured me it was trained on a mixture of licensed data and publicly available data.
As a GPT system is trained, it uses what's called an attention mechanism, which allows the system to assign different weights to different words depending on their context. Over time, it learns that certain words in a sentence are essentially more important than others, and it makes associations accordingly. For example, when you're reading this article, you're more likely to notice words like "model" and "AI" than words such as "this." How does a GPT know that certain words get more "weight" than others? Again, it comes back to patterns it sees over and over in its training data.
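For the curious, the standard scaled dot-product attention at the heart of transformer models can be sketched in a few lines of Python with NumPy. This is the textbook formulation from the original transformer paper, not ChatGPT's actual code, and the tiny random matrices here stand in for learned word representations:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
    # The softmax scores are the "weights": how much each word
    # attends to every other word in the sequence.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores)
    return weights @ V, weights

# Four "words", each represented as an 8-dimensional vector.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

output, weights = attention(Q, K, V)
print(weights.round(2))  # each row sums to 1: attention per word
```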
To summarize, a GPT model doesn’t just use natural language as a fancy user interface. It processes natural language and “learns” from the language by making text associations and, from there, predictions.
And finally, before leaving, let's ask ChatGPT if it keeps a list of country and capital pairs:
Q: Does ChatGPT have in its model a simple map data structure mapping country names to capital city names?
ChatGPT: As of my last knowledge update in September 2021, ChatGPT does not have a built-in data structure that explicitly maps country names to capital city names. It relies on its pre-trained language model to generate responses based on patterns it has learned from a wide range of text sources.
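Had a programmer built such a map the traditional way, it might look something like this simple sketch (the function name and fallback message are my own invention):

```python
# The traditional approach: an explicit, hand-built lookup table.
# Anything not in the table is simply unknown to the program.
capitals = {
    "France": "Paris",
    "Japan": "Tokyo",
    "Italy": "Rome",
}

def capital_of(country: str) -> str:
    return capitals.get(country, "I don't know that country.")

print(capital_of("France"))  # Found in the table: Paris
print(capital_of("Peru"))    # Not in the table: the fallback message
```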
Indeed, it “knows” what it knows through language patterns, not traditional software databases and models. And that’s why natural language processing is such a natural for AI.