Definition
Positional encoding adds position information to transformers, whose attention mechanism is otherwise permutation-invariant and has no built-in notion of token order.
**Why Needed:**
- Attention treats all positions equally, so without position information "The cat sat on mat" and "Mat on sat cat the" are indistinguishable to the model (see the sketch below)
- Position information is crucial for understanding word order and meaning
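To make the order-blindness concrete, here is a minimal NumPy sketch (toy dimensions, random weights, a single head, all names illustrative): self-attention without positional encoding returns the same outputs, merely reordered, when the input tokens are shuffled.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention with no positional encoding."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
x = rng.normal(size=(seq_len, d_model))          # token embeddings, no positions added
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

perm = rng.permutation(seq_len)                   # shuffle the token order
out_original = self_attention(x, w_q, w_k, w_v)
out_shuffled = self_attention(x[perm], w_q, w_k, w_v)

# The shuffled output is just the original output with rows shuffled the same way:
# without positional encoding, attention cannot tell the two orderings apart.
assert np.allclose(out_shuffled, out_original[perm])
```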
**Types:**
- Sinusoidal: Original transformer, fixed sine/cosine patterns (see the sketch after this list)
- Learned: Train position embeddings
- Relative: Encode relative distances (T5, Transformer-XL)
- RoPE: Rotary Position Embedding (Llama)
- ALiBi: Attention with Linear Biases
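As an illustration of the first type, here is a minimal NumPy sketch of the fixed sinusoidal encoding from the original transformer paper; the function name and dimensions are illustrative.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encodings from 'Attention Is All You Need'.

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                  # (1, d_model/2), the 2i values
    angles = positions / np.power(10000.0, dims / d_model)    # (seq_len, d_model/2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Usage: add the encoding to token embeddings before the first transformer layer.
seq_len, d_model = 128, 64
token_embeddings = np.random.default_rng(0).normal(size=(seq_len, d_model))
x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```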
**RoPE (Popular in Modern LLMs):**
- Encodes position inside the attention computation by rotating query and key vectors (sketched below)
- Extrapolates better to longer sequences
- Used in Llama, Mistral, and many other models
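A rough NumPy sketch of the rotation at the heart of RoPE, using an interleaved-pair channel layout; actual implementations (e.g. in Llama-family code) differ in channel layout and caching, so treat this as the idea rather than the exact method.

```python
import numpy as np

def rope_rotate(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to a (seq_len, head_dim) array.

    Each pair of channels (2i, 2i+1) is rotated by an angle that grows
    linearly with the token position, at a frequency that depends on i.
    """
    seq_len, head_dim = x.shape
    positions = np.arange(seq_len)[:, None]                        # (seq_len, 1)
    freqs = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)   # (head_dim/2,)
    angles = positions * freqs[None, :]                            # (seq_len, head_dim/2)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    rotated = np.empty_like(x)
    rotated[:, 0::2] = x_even * np.cos(angles) - x_odd * np.sin(angles)
    rotated[:, 1::2] = x_even * np.sin(angles) + x_odd * np.cos(angles)
    return rotated

# RoPE is applied to queries and keys (not values) before the dot product,
# so attention scores end up depending on relative positions.
rng = np.random.default_rng(0)
q = rope_rotate(rng.normal(size=(16, 64)))
k = rope_rotate(rng.normal(size=(16, 64)))
scores = q @ k.T / np.sqrt(64)
```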
**Context Length:**
- The choice of position encoding constrains a model's usable context length
- Some methods extrapolate beyond the training length better than others (see the ALiBi sketch below)
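As one example of an encoding designed with extrapolation in mind, ALiBi (listed above) skips position embeddings entirely and instead adds a distance-proportional penalty to attention scores. A minimal NumPy sketch, assuming the power-of-two slope schedule from the ALiBi paper:

```python
import numpy as np

def alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """ALiBi: head-specific linear penalty based on query-key distance.

    Returns a (num_heads, seq_len, seq_len) bias added to attention scores
    before the softmax; no position embeddings are added to the tokens.
    """
    # Geometric slopes from the ALiBi paper (assumes num_heads is a power of two).
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    distance = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # j - i
    # Penalize attending to distant earlier tokens; future positions (j > i)
    # get zero bias here and are handled by the causal mask anyway.
    return slopes[:, None, None] * np.minimum(distance, 0)[None, :, :]

# Usage per head h: scores = q @ k.T / sqrt(head_dim) + alibi_bias(seq_len, num_heads)[h]
bias = alibi_bias(seq_len=8, num_heads=4)
```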
Examples
RoPE allows Llama to handle sequences longer than those seen during training.
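In practice, extending the context of RoPE-based models often also relies on rescaling positions (position interpolation) so that longer sequences map into the position range seen during training. The sketch below illustrates that idea rather than anything described in this entry; the numbers (2048 trained length, 4x extension) are hypothetical.

```python
import numpy as np

def rope_angles(seq_len: int, head_dim: int,
                base: float = 10000.0, scale: float = 1.0) -> np.ndarray:
    """RoPE rotation angles with optional position interpolation.

    scale > 1 compresses positions (e.g. scale=4 maps positions 0..8191 into
    the 0..2047 range a model was trained on), a common way to extend the
    context window of RoPE-based models.
    """
    positions = np.arange(seq_len)[:, None] / scale
    freqs = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)
    return positions * freqs[None, :]

# Train-time angles for 2048 positions vs. interpolated angles for 8192 positions:
train_angles = rope_angles(seq_len=2048, head_dim=64)
extended_angles = rope_angles(seq_len=8192, head_dim=64, scale=4.0)
# Every 4th extended position lands exactly on an angle the model saw in training.
assert np.allclose(extended_angles[::4], train_angles)
```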