Transformer
The Transformer is the neural network architecture underlying nearly all modern language models since 2017, including GPT, Claude, Gemini, and Llama.
The Transformer was introduced in the 2017 paper 'Attention Is All You Need' by researchers at Google. Its key innovation is the attention mechanism, which lets the model compute contextual relationships between all words in an input in parallel. This made training far more parallelizable and enabled much larger, more capable models than the then-prevalent recurrent neural networks (RNNs). Every modern LLM is a variation on the Transformer architecture.
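To make the mechanism concrete, here is a minimal sketch of scaled dot-product attention, the core operation from the paper, written in plain NumPy. The variable names and array sizes are illustrative assumptions, not tied to any particular model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention as defined in 'Attention Is All You Need'.

    Q, K, V: arrays of shape (seq_len, d_k) holding the query, key,
    and value vectors, one row per token.
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key: shape (seq_len, seq_len).
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over each row turns raw scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of all value vectors, computed
    # for every position at once; no sequential loop as in an RNN.
    return weights @ V, weights

# Toy example: 5 tokens with 8-dimensional vectors (illustrative sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
output, weights = scaled_dot_product_attention(x, x, x)  # self-attention
print(weights.shape)  # (5, 5): one weight per pair of tokens
```

Note that the weight matrix covers every pair of positions at once; this all-pairs, single-pass computation is what makes the architecture so much easier to parallelize than an RNN.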
Example
In the sentence 'The bank sits by the river', attention lets the model relate 'bank' to 'river' and resolve it as a riverbank rather than a financial institution. An RNN would process the words one at a time; a Transformer attends to all of them at once, which is both faster and better at capturing context. The sketch below shows how such attention weights can be inspected in practice.
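As an illustration, here is a sketch of how one could inspect real attention weights for that sentence with a pretrained model via the Hugging Face transformers library. The model choice ("bert-base-uncased") and the decision to average the heads of the last layer are illustrative assumptions; the actual weights depend on the model.

```python
# A sketch assuming the Hugging Face `transformers` library and PyTorch
# are installed; "bert-base-uncased" is an illustrative model choice.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The bank sits by the river", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len).
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
bank_idx = tokens.index("bank")

# Average the last layer's heads and look at what 'bank' attends to.
last_layer = outputs.attentions[-1][0].mean(dim=0)
for token, weight in zip(tokens, last_layer[bank_idx]):
    print(f"{token:>8}  {weight.item():.3f}")
```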
Frequently asked questions
Are all LLMs transformers?
As of 2026, virtually all. Alternative architectures such as Mamba and RWKV exist, but their adoption remains marginal.