
Token

The basic unit of text that language models process; one token is roughly 3/4 of an English word.


Definition

Tokens are the fundamental units that language models use to represent text. Before a model processes any input, a tokenizer splits the text into these smaller pieces and maps each one to an integer ID from a fixed vocabulary.

Tokenization Examples:

- "Hello world" → ["Hello", " world"] (2 tokens)
- "unhappiness" → ["un", "happiness"] or ["unhapp", "iness"], depending on the tokenizer
- Spaces, punctuation, and special characters are often separate tokens
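A minimal sketch of tokenization in practice, using OpenAI's tiktoken library as one concrete example (other models ship different tokenizers with different vocabularies, so the exact splits will vary):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-4-era OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello world", "unhappiness"]:
    ids = enc.encode(text)                   # list of integer token IDs
    pieces = [enc.decode([i]) for i in ids]  # decode each ID back to its text
    print(f"{text!r} -> {pieces} ({len(ids)} tokens)")
```

Running this shows, for example, that "Hello world" splits into ["Hello", " world"], where the leading space belongs to the second token; this is why spaces and punctuation count toward token totals.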

**Why Tokens Matter:**

- Context Windows: Models have token limits (e.g., 128K tokens)
- Pricing: API usage is billed per token (see the sketch after this list)
- Performance: Longer inputs take longer to process
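The same arithmetic shows up when budgeting a prompt against a context window and estimating its cost. This is a hedged sketch: CONTEXT_LIMIT and PRICE_PER_1K below are illustrative placeholders, not real model limits or API rates.

```python
# Illustrative placeholders: these are NOT real limits or prices.
CONTEXT_LIMIT = 128_000  # max tokens the model accepts (prompt + output)
PRICE_PER_1K = 0.01      # hypothetical dollars per 1,000 input tokens

def check_prompt_budget(prompt_tokens: int, reserved_output: int = 1_000) -> None:
    """Check whether a prompt fits the context window and estimate its cost."""
    if prompt_tokens + reserved_output > CONTEXT_LIMIT:
        print("Too long: truncate, summarize, or split the prompt.")
    cost = prompt_tokens / 1_000 * PRICE_PER_1K
    print(f"{prompt_tokens:,} tokens ~= ${cost:.2f} at the assumed rate")

check_prompt_budget(90_000)   # fits, with room for a 1,000-token reply
check_prompt_budget(130_000)  # exceeds the assumed 128K limit
```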

Rules of Thumb:

- ~4 characters per token (English)
- ~3/4 of a word per token
- 1 page ≈ 500-600 tokens
- Non-English languages often use more tokens
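When a real tokenizer isn't handy, these rules of thumb translate directly into a rough estimator. The sketch below assumes English text and the ratios listed above; non-English text often needs more tokens per word, so treat the output as a ballpark, not a count.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count via the ~4 characters/token rule (English text)."""
    return max(1, round(len(text) / chars_per_token))

def tokens_for_pages(pages: float, tokens_per_page: int = 550) -> int:
    """Rough token count via the 1 page ~= 500-600 tokens rule (550 midpoint)."""
    return round(pages * tokens_per_page)

print(estimate_tokens("Hello world, this is a short sentence."))  # ~10
print(tokens_for_pages(300))                                      # ~165,000
```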

Examples

GPT-4 Turbo has a 128K-token context window, enough for roughly 200-300 pages of text by the rules of thumb above.
