
Learning Rate

Hyperparameter controlling how much model weights change with each update.


Definition

The learning rate determines the step size used when updating model parameters during training.
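For plain gradient descent, the learning rate $\eta$ scales each step:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \mathcal{L}(\theta_t)$$

where $\theta_t$ are the model parameters at step $t$ and $\mathcal{L}$ is the training loss.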

  • **Impact** (a toy sketch follows this list):
  • Too high: overshoots the minimum, so the loss oscillates or diverges
  • Too low: convergence is very slow
  • Just right: smooth, efficient training
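A minimal sketch of these three regimes, minimizing f(x) = x² with plain gradient descent; all constants are illustrative:

```python
# Illustrative only: minimize f(x) = x**2 (gradient 2*x) from x = 5.0
# with plain gradient descent under three learning rates.
def gradient_descent(lr, steps=20, x=5.0):
    for _ in range(steps):
        x -= lr * 2 * x  # update: x <- x - lr * f'(x)
    return x

print(gradient_descent(1.1))    # too high: each step overshoots, |x| grows
print(gradient_descent(0.001))  # too low: x barely moves from 5.0
print(gradient_descent(0.1))    # reasonable: x converges close to 0
```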

  • **Typical Values** (setup sketch below):
  • 1e-3 to 1e-5 is a common range
  • Transformers often use 1e-4 to 1e-5
  • Fine-tuning uses smaller rates than training from scratch
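A minimal sketch of setting one of these values, assuming PyTorch; the model and the 1e-4 value are placeholders, not recommendations:

```python
import torch

model = torch.nn.Linear(10, 1)  # hypothetical stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr in the common transformer range
```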

  • **Scheduling** (a warmup + cosine sketch follows this list):
  • Constant: same rate throughout training
  • Step decay: reduce by a factor at fixed intervals
  • Cosine annealing: smooth decrease along a cosine curve
  • Warmup: start low, increase to a peak, then decrease
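A pure-Python sketch of the warmup-then-cosine pattern; the peak rate and step counts are illustrative assumptions:

```python
import math

def lr_at_step(step, peak_lr=1e-4, warmup_steps=1000, total_steps=100_000):
    """Linear warmup to peak_lr, then cosine decay to 0 (illustrative constants)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup from 0
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))  # cosine decay to 0

print(lr_at_step(0), lr_at_step(1000), lr_at_step(100_000))  # 0.0, 1e-4, ~0.0
```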

  • **Finding a Good Rate** (range-test sketch below):
  • Learning rate finder (range test)
  • Grid search over a small set of candidates
  • Start with common defaults and adjust
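A sketch of a learning rate range test, the idea behind most learning rate finders; `train_step` is a hypothetical callback that performs one update at the given rate and returns the loss:

```python
def lr_range_test(train_step, min_lr=1e-7, max_lr=1.0, num_steps=100):
    """Sweep the learning rate exponentially from min_lr to max_lr,
    recording (lr, loss) pairs. Choose the rate just before the
    recorded loss starts to climb sharply."""
    ratio = (max_lr / min_lr) ** (1 / (num_steps - 1))
    lr, history = min_lr, []
    for _ in range(num_steps):
        history.append((lr, train_step(lr)))  # one update at this rate
        lr *= ratio                           # multiplicative (log-scale) sweep
    return history
```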

Examples

Training with a learning rate of 0.001 and a cosine decay schedule.
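A minimal sketch of this setup assuming PyTorch; the model, T_max, and epoch count are placeholders:

```python
import torch

model = torch.nn.Linear(10, 1)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()   # apply the update at the current learning rate
    scheduler.step()   # move the learning rate along the cosine curve
```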
