Token
A token is the basic unit in which an LLM processes text: typically a sub-word fragment of roughly 4 characters, or about 0.75 words, in English.
LLMs operate on tokens rather than on words or characters. Tokens are sub-word units produced by a tokenizer; the word 'volatility', for example, might split into ['vola', 'til', 'ity']. The ratio varies by language and by tokenizer: roughly 1.33 tokens per word in English, 1.5-2 tokens per word in Dutch, and about 1.3 tokens per character in Chinese. LLM usage is typically priced per million tokens, with separate rates for input and output.
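As a rough illustration of these ratios, the sketch below estimates a token count from a word count. The function name and the lookup table are hypothetical, the Dutch value is simply the midpoint of the 1.5-2 range quoted above, and a real tokenizer will give different numbers for any specific text.

```python
# Approximate tokens-per-word ratios from the text above (illustrative only).
TOKENS_PER_WORD = {
    "english": 1.33,
    "dutch": 1.75,  # assumption: midpoint of the ~1.5-2 range
}


def estimate_tokens(word_count: int, language: str) -> int:
    """Return a rough token estimate for a text of `word_count` words."""
    ratio = TOKENS_PER_WORD[language.lower()]
    return round(word_count * ratio)


print(estimate_tokens(10_000, "english"))
print(estimate_tokens(10_000, "dutch"))
```

For exact counts, run the text through the tokenizer of the model you intend to use; a per-word ratio is only suitable for early budgeting.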
Example
A 10,000-word English report is roughly 13,300 tokens (10,000 × 1.33). With Claude Sonnet at $3 per 1M input tokens, analysing it costs about $0.04 for the input alone. The same report in Dutch comes to roughly 17,500 tokens, or about $0.05.
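The arithmetic behind this example can be sketched as follows. The helper name is hypothetical, the $3 per 1M input tokens rate is the figure quoted above and may change, and output tokens (billed separately) are not included.

```python
def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of sending `tokens` input tokens at a per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


PRICE = 3.0  # USD per 1M input tokens, as quoted in the example above

english_cost = input_cost_usd(13_300, PRICE)
dutch_cost = input_cost_usd(17_500, PRICE)

# Relative token overhead of the Dutch version compared with the English one.
overhead = (17_500 - 13_300) / 13_300

print(f"English: ${english_cost:.4f}")
print(f"Dutch:   ${dutch_cost:.4f}")
print(f"Dutch overhead: {overhead:.0%}")
```

With these inputs the Dutch version uses roughly a third more tokens. The per-document difference is about a cent, but it scales linearly with volume.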
Frequently asked questions
Is Dutch more expensive than English?
Yes. Dutch text typically needs about 25-40% more tokens than English for the same content, so factor this into cost modelling for production workloads.