Before 2017, AI models processed text the way a human reads a book: one word at a time (Recurrent Neural Networks, or RNNs). The Transformer changed this with Self-Attention, which lets the model weigh every word in a sentence against every other word in a single step.
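To make the idea concrete, here is a toy numerical sketch of attention (not the full Transformer, which adds learned query/key/value projections and multiple heads): every token attends to every other token at once, instead of word by word.

```python
import numpy as np

def self_attention(x):
    """Toy self-attention over a sequence of token embeddings.

    x: array of shape (seq_len, d_model), one embedding per token.
    Real Transformers use separate learned weights for queries, keys,
    and values; here Q = K = V = x to keep the core idea visible.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)        # similarity of every token to every other token
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                   # each output token is a weighted mix of ALL tokens

tokens = np.random.rand(3, 4)            # three "tokens", 4-dimensional embeddings
print(self_attention(tokens).shape)      # (3, 4): same shape, now context-aware
```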
Though both BERT and GPT are built on the Transformer architecture, they are designed for different goals.
| Feature | BERT (Bidirectional Encoder Representations from Transformers) | GPT (Generative Pre-trained Transformer) |
| --- | --- | --- |
| Training Goal | Understands context from both sides (left & right). | Predicts the next word in a sequence (left to right). |
| Primary Use | Classification, Sentiment Analysis, Named Entity Recognition. | Text Generation, Coding, Creative Writing. |
| Analogy | A student taking a multiple-choice “fill in the blank” test. | A storyteller writing a novel one word at a time. |
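A quick way to feel the difference is Hugging Face’s `pipeline` helper. This is only a sketch: it assumes the `transformers` library is installed and downloads the small public `bert-base-uncased` and `gpt2` checkpoints on first run.

```python
from transformers import pipeline

# BERT-style model: fill in a masked word using context from BOTH sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The doctor prescribed [MASK] for the infection.")[0]["token_str"])

# GPT-style model: continue the text left to right, one token at a time.
generate = pipeline("text-generation", model="gpt2")
print(generate("The doctor prescribed", max_new_tokens=20)[0]["generated_text"])
```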
Large Language Models (LLMs) are “Pre-trained” on vast amounts of internet text. Fine-tuning is the process of taking that “base” knowledge and giving the model specialized, “PhD-level” training on a smaller, domain-specific dataset.
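As an illustration of what fine-tuning looks like in code, here is a hedged sketch using Hugging Face’s `Trainer` on a public sentiment dataset. The model and dataset names are just common examples standing in for your own domain data, and exact arguments vary by library version.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from a pre-trained "base" model and specialize it for one task.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A small labeled dataset stands in for your domain-specific data.
dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),  # small slice for speed
)
trainer.train()   # updates the base model's weights on the specialized data
```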
Prompt engineering is the art of crafting inputs to get the best output from an LLM.
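For example, a few-shot prompt shows the model the exact output format you want before asking your real question. The messages and labels below are made up for illustration; the resulting string can be sent to any LLM API.

```python
# Few-shot prompting: demonstrate the task with examples, then ask the real question.
FEW_SHOT_TEMPLATE = """You are a medical triage assistant. Classify the urgency of each message.

Message: "I have a mild headache since this morning."
Urgency: LOW

Message: "I have crushing chest pain and trouble breathing."
Urgency: EMERGENCY

Message: "{message}"
Urgency:"""

prompt = FEW_SHOT_TEMPLATE.format(message="My stitches look a bit red and itchy.")
print(prompt)  # send this string to the LLM of your choice
```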
LLMs have a “Knowledge Cutoff” (they don’t know what happened yesterday) and they “Hallucinate” (make things up). Retrieval-Augmented Generation (RAG) addresses both problems by fetching relevant, up-to-date documents and grounding the model’s answer in them.
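Here is a dependency-free sketch of the RAG pattern: retrieve the most relevant documents for a question, then paste them into the prompt so the model answers from supplied facts. Real systems replace the toy keyword-overlap retriever below with vector embeddings and a vector database; the documents are invented for illustration.

```python
def retrieve(question, documents, k=2):
    """Toy retriever: rank documents by word overlap with the question.
    Production RAG uses embeddings and a vector store instead."""
    q_words = set(question.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

documents = [  # imagine these come from your own up-to-date knowledge base
    "Visiting hours are 9am to 8pm on weekdays.",
    "The cardiology clinic moved to Building C in March.",
    "Flu vaccines are available at the main pharmacy.",
]

question = "When are visiting hours?"
context = "\n".join(retrieve(question, documents))

# The retrieved context is injected into the prompt, so the LLM answers
# from fresh, grounded text instead of stale or hallucinated memory.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```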
LangChain is a framework that makes it easier to build RAG and other generative AI applications. It “chains” components such as prompts, models, retrievers, and output parsers together into pipelines.
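A minimal sketch of such a chain using LangChain’s expression syntax: it assumes the `langchain-core` and `langchain-openai` packages, an `OPENAI_API_KEY` in the environment, and an example model name; exact imports can shift between LangChain versions.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Each piece is a component; the | operator "chains" them into a pipeline:
# prompt template -> chat model -> plain-string output parser.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm | StrOutputParser()

answer = chain.invoke({
    "context": "Visiting hours are 9am to 8pm on weekdays.",
    "question": "When are visiting hours?",
})
print(answer)
```

In a full RAG application, the hard-coded context above would be produced by a retriever component placed at the front of the same chain.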
If you were to combine these for a hospital: