
Positional Encoding

Technique to give transformers information about token positions in a sequence.

Definition

Positional encoding adds position information to transformers, which otherwise have no notion of order.

**Why Needed:**

  • Attention on its own treats all positions identically
  • "The cat sat on mat" and "Mat on sat cat the" look the same to a model without position information (see the sketch below)
  • Position information is crucial for understanding word order and meaning
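The effect is easy to see in code. Below is a minimal sketch (PyTorch assumed, identity projections for brevity) showing that self-attention without positional encoding is permutation-equivariant: shuffling the input tokens simply shuffles the outputs, so word order leaves no trace.

```python
# Minimal demo: without positional encoding, self-attention is
# permutation-equivariant, so word order is invisible to the model.
import torch

torch.manual_seed(0)
seq_len, d_model = 5, 8
x = torch.randn(seq_len, d_model)          # token embeddings, no positions added

def self_attention(x):
    # Single-head scaled dot-product attention with identity projections
    scores = x @ x.T / d_model ** 0.5
    return torch.softmax(scores, dim=-1) @ x

perm = torch.randperm(seq_len)             # shuffle the "sentence"
out_orig = self_attention(x)
out_perm = self_attention(x[perm])

# Shuffled input -> identically shuffled output: no position signal survives.
print(torch.allclose(out_perm, out_orig[perm], atol=1e-6))  # True
```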

**Types:**

  • Sinusoidal: fixed sin/cos patterns from the original transformer (see the sketch after this list)
  • Learned: position embeddings trained with the model
  • Relative: encode relative distances between tokens (T5, Transformer-XL)
  • RoPE: Rotary Position Embedding (Llama)
  • ALiBi: Attention with Linear Biases
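For reference, here is a minimal sketch of the sinusoidal variant using the standard sin/cos formulation from the original transformer; the function name and PyTorch usage are illustrative rather than taken from any particular library.

```python
# Sketch of sinusoidal encoding: even dimensions use sin, odd dimensions
# use cos, with geometrically spaced frequencies across the model dimension.
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    dims = torch.arange(0, d_model, 2, dtype=torch.float32)               # even dimensions
    inv_freq = 1.0 / (10000.0 ** (dims / d_model))                        # per-pair frequencies
    angles = positions * inv_freq                                         # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

# Added to the token embeddings before the first transformer layer:
# x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```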

**RoPE (Popular in Modern LLMs):**

  • Encodes position directly in the attention computation by rotating queries and keys (see the sketch below)
  • Extrapolates better to longer sequences
  • Used in Llama, Mistral, and many other models
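A minimal sketch of the rotary idea follows. It uses the interleaved-pair formulation from the RoPE paper and is illustrative (PyTorch assumed), not Llama's exact implementation.

```python
# RoPE sketch: queries and keys are rotated in 2-D pairs by a
# position-dependent angle, so their dot product depends only on the
# relative offset between positions.
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, d_head) with d_head even
    seq_len, d_head = x.shape
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)        # (seq_len, 1)
    inv_freq = 1.0 / (base ** (torch.arange(0, d_head, 2).float() / d_head))   # (d_head/2,)
    angles = positions * inv_freq                                              # (seq_len, d_head/2)
    cos, sin = torch.cos(angles), torch.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    rotated = torch.empty_like(x)
    rotated[:, 0::2] = x_even * cos - x_odd * sin
    rotated[:, 1::2] = x_even * sin + x_odd * cos
    return rotated

# Applied to queries and keys (not values) inside each attention head:
# q, k = rope(q), rope(k); attention then uses q @ k.T as usual.
```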

**Context Length:**

  • The choice of position encoding constrains a model's usable context length
  • Some methods extrapolate beyond the training length better than others (see the ALiBi sketch below)
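As one example of an extrapolation-friendly scheme, ALiBi (listed above) adds a head-specific linear penalty to attention scores based on query-key distance instead of adding position embeddings. The sketch below is illustrative (PyTorch assumed); the slope schedule follows the geometric sequence described in the ALiBi paper for power-of-two head counts.

```python
# ALiBi sketch: a per-head linear penalty on attention scores that grows
# with query-key distance; no position embeddings are added at all.
import torch

def alibi_bias(seq_len: int, num_heads: int) -> torch.Tensor:
    # Head-specific slopes: geometric sequence (valid as-is for power-of-two head counts)
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    positions = torch.arange(seq_len)
    # offset[i, j] = j - i: non-positive for past keys, so more distant keys
    # receive a larger (more negative) bias; future positions are clamped to 0
    # and handled by the causal mask.
    offset = (positions.unsqueeze(0) - positions.unsqueeze(1)).clamp(max=0)
    return slopes.view(num_heads, 1, 1) * offset           # (num_heads, seq_len, seq_len)

# The bias is added to raw attention scores before the softmax:
# scores = q @ k.transpose(-2, -1) / d_head ** 0.5 + alibi_bias(seq_len, num_heads)
```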

Examples

RoPE allows Llama to handle sequences longer than those seen during training.
