Definition
The learning rate determines the step size taken when updating model parameters during gradient-based training.
- **Impact:**
  - Too High: Overshoots the minimum; training becomes unstable or diverges
  - Too Low: Converges very slowly and may stall
- Just Right: Smooth, efficient training
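The three regimes can be demonstrated with plain gradient descent on a toy quadratic loss; the specific step sizes below are illustrative choices, not recommendations for real models:

```python
# Gradient descent on f(x) = x^2 (minimum at x = 0) with three
# illustrative step sizes, one per regime above.
def gradient_descent(lr, steps=50, x0=5.0):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # gradient of x^2 is 2x
    return x

too_high = gradient_descent(lr=1.1)    # update factor -1.2: |x| grows, diverges
too_low = gradient_descent(lr=0.001)   # update factor 0.998: barely moves
just_right = gradient_descent(lr=0.1)  # update factor 0.8: converges quickly
```

After 50 steps, `just_right` is essentially at the minimum, `too_low` has hardly left the starting point, and `too_high` has blown up to a huge value.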
- **Typical Values:**
  - 1e-3 to 1e-5 common
  - Transformers often use 1e-4 to 1e-5
  - Fine-tuning uses smaller rates
- **Scheduling:**
- Constant: Same throughout
- Step Decay: Reduce at intervals
- Cosine Annealing: Smooth decrease
- Warmup: Start low, increase, then decrease
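Warmup followed by cosine annealing is a common combination and can be sketched in a few lines; the peak rate and warmup length below are placeholder values, not universal defaults:

```python
import math

def warmup_cosine_lr(step, total_steps, peak_lr=1e-4, warmup_steps=100):
    """Linear warmup to peak_lr, then cosine decay toward zero.

    peak_lr and warmup_steps are illustrative; tune them per task.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from near zero up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine annealing: smooth decrease from peak_lr to zero.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

Calling this once per optimizer step yields a rate that rises during warmup, peaks at `warmup_steps`, and decays smoothly to zero by `total_steps`.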
- **Finding Good Rate:**
  - Learning rate finder
  - Grid search
  - Start with common defaults
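A minimal sketch of the finder idea: sweep exponentially spaced rates and keep the one that reduces the loss most. Real learning rate finders sweep the rate while training the actual model; this toy version uses a quadratic loss purely for illustration:

```python
# Toy learning-rate finder: evaluate each candidate rate by running a few
# gradient steps on f(x) = x^2 and comparing the resulting loss.
def loss_after_training(lr, steps=20, x0=5.0):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # gradient of x^2 is 2x
    return x * x

# Exponentially spaced candidates, 1e-5 up to 1.
candidates = [10.0 ** e for e in range(-5, 1)]
best_lr = min(candidates, key=loss_after_training)
```

On this toy problem the sweep picks 0.1: smaller rates barely reduce the loss in 20 steps, while a rate of 1 oscillates without converging.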
Examples
Training a model with a learning rate of 0.001 and a cosine decay schedule.