Transformer
The Transformer is the neural network architecture underlying nearly all modern language models since 2017, including GPT, Claude, Gemini, and Llama.
The Transformer was introduced in the 2017 paper 'Attention Is All You Need' by researchers at Google. Its key innovation is the attention mechanism, which lets the model compute contextual relationships between all words in an input in parallel. This made training far more parallelizable and enabled much larger, more capable models than the then-prevalent recurrent neural networks (RNNs). Every modern LLM is a variation on the Transformer architecture.
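To make the mechanism concrete, here is a minimal sketch of scaled dot-product attention, the core operation from the paper, written in plain NumPy. The variable names and array sizes are illustrative assumptions, not tied to any particular model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention as defined in 'Attention Is All You Need'.

    Q, K, V: arrays of shape (seq_len, d_k) holding the query, key,
    and value vectors, one row per token.
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key: shape (seq_len, seq_len).
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over each row turns raw scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of all value vectors, computed
    # for every position at once; no sequential loop as in an RNN.
    return weights @ V, weights

# Toy example: 5 tokens with 8-dimensional vectors (illustrative sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
output, weights = scaled_dot_product_attention(x, x, x)  # self-attention
print(weights.shape)  # (5, 5): one weight per pair of tokens
```

Note that the weight matrix covers every pair of positions at once; this all-pairs, single-pass computation is what makes the architecture so much easier to parallelize than an RNN.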
Example
In the sentence 'The bank sits by the river', attention lets the model relate 'bank' to 'river' and resolve it as a riverbank rather than a financial institution. An RNN would process the words one at a time; a Transformer attends to all of them at once, which is both faster and better at capturing context. The sketch below shows how such attention weights can be inspected in practice.
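As an illustration, here is a sketch of how one could inspect real attention weights for that sentence with a pretrained model via the Hugging Face transformers library. The model choice ("bert-base-uncased") and the decision to average the heads of the last layer are illustrative assumptions; the actual weights depend on the model.

```python
# A sketch assuming the Hugging Face `transformers` library and PyTorch
# are installed; "bert-base-uncased" is an illustrative model choice.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The bank sits by the river", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len).
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
bank_idx = tokens.index("bank")

# Average the last layer's heads and look at what 'bank' attends to.
last_layer = outputs.attentions[-1][0].mean(dim=0)
for token, weight in zip(tokens, last_layer[bank_idx]):
    print(f"{token:>8}  {weight.item():.3f}")
```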
Frequently asked questions
Are all LLMs transformers?
As of 2026, virtually all. Alternative architectures such as Mamba and RWKV exist, but their adoption remains marginal.