GPT (Generative Pre-trained Transformer)
A family of large language models by OpenAI that generate text by predicting the next token
What is GPT?
GPT stands for Generative Pre-trained Transformer. It is a family of large language models developed by OpenAI that can generate human-like text, answer questions, write code, and perform a wide variety of language tasks.
Think of GPT as an incredibly well-read writing assistant. It has consumed a vast library of text during pre-training and learned the patterns of human language so thoroughly that it can continue almost any piece of writing in a coherent and contextually appropriate way.
How Does It Work?
GPT is built on the decoder-only Transformer architecture. Its training happens in two main stages:
- Pre-training -- The model reads massive amounts of text from the internet and learns to predict the next token in a sequence. This self-supervised phase gives it broad language understanding.
- Alignment -- Techniques like Reinforcement Learning from Human Feedback (RLHF) and instruction tuning refine the model so it follows instructions, stays helpful, and avoids harmful outputs.
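The next-token objective at the heart of pre-training can be illustrated in miniature with a bigram frequency model. This is a toy stand-in, not a Transformer: it only conditions on the single previous token, and the whitespace-tokenized corpus below is invented for the example. But the task is the same one GPT learns at scale: given what came before, estimate a probability distribution over the next token.

```python
from collections import Counter, defaultdict

def train_bigram(corpus_tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, context):
    """Normalize the counts for one context into a probability distribution."""
    c = counts[context]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

# Hypothetical toy corpus; real pre-training uses trillions of tokens.
tokens = "the cat sat on the mat and the cat ran".split()
model = train_bigram(tokens)
probs = next_token_probs(model, "the")
# In this corpus "the" is followed by "cat" twice and "mat" once,
# so the model assigns "cat" probability 2/3 and "mat" probability 1/3.
```

A real GPT replaces the count table with a deep neural network that conditions on the entire preceding context, but the training signal is still "how well did you predict the next token?"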
At inference time, GPT generates text autoregressively, one token at a time: at each step it either picks the single most probable next token (greedy decoding) or samples one from the predicted distribution, conditioned on everything generated so far.
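The choice between greedy and sampled decoding is commonly controlled by a temperature parameter. The sketch below assumes the model has already produced a score (logit) per candidate token; the token names and scores are made up for illustration. A temperature near zero collapses to the most probable token, while higher temperatures flatten the distribution and make the output more varied.

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    """Choose the next token from a {token: logit} dict.

    temperature ~ 0 -> greedy (always the highest-scoring token);
    higher temperature -> flatter distribution, more varied choices.
    """
    if temperature <= 1e-6:
        return max(logits, key=logits.get)
    # Softmax over temperature-scaled logits (shifted by the max for stability).
    scaled = {tok: l / temperature for tok, l in logits.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # numerical fallback

# Hypothetical logits for three candidate next tokens.
logits = {"dog": 2.0, "cat": 1.0, "fish": 0.1}
greedy = sample_next(logits, temperature=0.0)   # always "dog"
varied = sample_next(logits, temperature=1.5)   # any of the three, weighted
```

Generating a full response is just this step in a loop: append the chosen token to the context, recompute the logits, and sample again until an end-of-sequence token appears.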
Key Milestones
- GPT-2 (2019) -- demonstrated surprisingly coherent long-form text generation.
- GPT-3 (2020) -- showed that scaling up parameters dramatically improves capability.
- GPT-4 (2023) -- introduced multimodal abilities (text and images) and significantly improved reasoning.