Definition
Normalization techniques standardize data or layer outputs to improve training stability and speed.
- **Types** (each is shown in the sketch below):
  - Batch Normalization: normalizes each feature across the batch dimension
  - Layer Normalization: normalizes across the feature dimensions of each sample (standard in transformers)
  - Instance Normalization: normalizes each channel of each sample independently
  - Group Normalization: normalizes over groups of channels within each sample
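As a concrete reference, the sketch below uses PyTorch (an assumed framework choice; the tensor shape and layer sizes are illustrative) to show which dimensions each type computes its statistics over:

```python
# Minimal sketch: the four normalization types applied to the same tensor.
# All four leave the shape unchanged; they differ only in which slice of
# the tensor the mean and variance are computed over.
import torch
import torch.nn as nn

x = torch.randn(8, 16, 32, 32)  # (batch, channels, height, width)

batch_norm = nn.BatchNorm2d(16)           # per channel, across batch + spatial dims
layer_norm = nn.LayerNorm([16, 32, 32])   # per sample, across all features
instance_norm = nn.InstanceNorm2d(16)     # per sample, per channel
group_norm = nn.GroupNorm(4, 16)          # per sample, per group of 4 channels

for name, norm in [("batch", batch_norm), ("layer", layer_norm),
                   ("instance", instance_norm), ("group", group_norm)]:
    print(name, norm(x).shape)  # torch.Size([8, 16, 32, 32]) in every case
```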
Benefits:
- Faster training convergence
- Allows higher learning rates
- Reduces internal covariate shift
- Acts as a form of regularization
In Transformers:
- Layer normalization is the standard choice
- Pre-norm architectures normalize before each sublayer; post-norm normalizes after the residual connection
- RMSNorm, a simplified variant, is gaining popularity (a sketch follows below)
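Below is a minimal RMSNorm sketch in PyTorch, assuming the common formulation that drops layer norm's mean-centering and bias and rescales by the root mean square; the `eps` value and class layout are illustrative:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned gain, no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rescale by the root mean square over the feature (last) dimension;
        # unlike LayerNorm, the mean is never subtracted.
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)
```

Skipping the mean subtraction saves a little compute per token, and RMSNorm has reportedly matched layer norm's quality in several large language models.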
Data Normalization:
- Scale inputs to similar ranges
- Zero mean, unit variance is a common choice (see the sketch below)
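A minimal sketch of zero-mean, unit-variance input scaling with NumPy (the toy data and shapes are illustrative). Note that the training-set statistics are reused for any new data:

```python
import numpy as np

X_train = np.random.randn(100, 5) * 3.0 + 10.0  # toy data, arbitrary scale
mean, std = X_train.mean(axis=0), X_train.std(axis=0)

X_train_scaled = (X_train - mean) / std  # now ~zero mean, unit variance

# Apply the *training* statistics to new data rather than recomputing them:
X_new = np.random.randn(10, 5) * 3.0 + 10.0
X_new_scaled = (X_new - mean) / std
```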
Examples
The original transformer applies layer normalization after each attention and feed-forward sublayer (post-norm); modern variants often normalize before each sublayer instead (pre-norm). Both orderings are sketched below.
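The sketch below (illustrative class name and sizes, assuming PyTorch) shows both orderings in a single block; the `pre_norm` flag selects which one is used:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, pre_norm: bool = True):
        super().__init__()
        self.pre_norm = pre_norm
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.pre_norm:
            # Pre-norm: normalize before each sublayer, then add the residual
            # (common in modern LLMs; tends to train more stably when deep).
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.ff(self.norm2(x))
        else:
            # Post-norm: normalize after the residual add (original transformer).
            x = self.norm1(x + self.attn(x, x, x, need_weights=False)[0])
            x = self.norm2(x + self.ff(x))
        return x
```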