models

Whisper

OpenAI's speech recognition model trained on 680,000 hours of audio.


Definition

Whisper is OpenAI's automatic speech recognition (ASR) model: an encoder-decoder Transformer trained on large-scale multilingual, multitask supervised audio collected from the web.

Capabilities:
- Speech-to-text transcription
- Translation to English
- Language detection
- Timestamp generation
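The capabilities above map onto a handful of fields in the result that the open-source `openai-whisper` Python package returns. The following is a minimal sketch, assuming that package and ffmpeg are installed; `transcribe_file` and `summarize` are hypothetical helper names introduced here for illustration, not part of Whisper itself.

```python
from typing import Any, Dict


def transcribe_file(path: str, model_name: str = "base",
                    to_english: bool = False) -> Dict[str, Any]:
    """Run Whisper on one audio file and return its raw result dict."""
    # Lazy import: assumes `pip install openai-whisper` and ffmpeg on PATH.
    import whisper

    model = whisper.load_model(model_name)
    # task="translate" yields English text; "transcribe" keeps the source language.
    task = "translate" if to_english else "transcribe"
    return model.transcribe(path, task=task)


def summarize(result: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields that correspond to the listed capabilities."""
    return {
        "language": result["language"],    # detected language code
        "text": result["text"].strip(),    # full transcript or translation
        "timestamps": [(s["start"], s["end"]) for s in result["segments"]],
    }
```

A call such as `summarize(transcribe_file("episode.mp3", to_english=True))` would return the detected language, the English text, and one start/end pair per segment; the file name is a placeholder.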

Model Sizes (parameter counts):
- Tiny: 39M
- Base: 74M
- Small: 244M
- Medium: 769M
- Large: 1.55B
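Choosing a size is usually a trade-off between accuracy and available hardware. As a rough sketch, the table above can be turned into a lookup that picks the largest model under a parameter budget; `largest_model_within` is a hypothetical helper, and parameter count is only a proxy for memory use and speed.

```python
from typing import Optional

# Parameter counts taken from the size list above.
MODEL_PARAMS = {
    "tiny": 39_000_000,
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "large": 1_550_000_000,
}


def largest_model_within(max_params: int) -> Optional[str]:
    """Return the name of the largest model with at most max_params parameters."""
    fitting = [(count, name) for name, count in MODEL_PARAMS.items()
               if count <= max_params]
    # Tuples compare by parameter count first, so max() picks the biggest fit.
    return max(fitting)[1] if fitting else None
```

The returned name can be passed to the model loader of whichever Whisper implementation is in use.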

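Training Data:
- 680,000 hours of labeled audio collected from the web
- Multilingual: about a third of the data is non-English, and the model supports 99 languages
- Diverse recording conditions and speakers; OpenAI has not published a detailed list of sources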

Features:
- Robust to background noise and accents
- Handles multiple languages
- Open source and free (code and model weights released under the MIT license)
- Can run locally, without a cloud API

Examples

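Using Whisper to transcribe podcasts with segment-level timestamps. Whisper does not identify speakers, so labeling who spoke when requires a separate diarization tool.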
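For a podcast, the timestamped segments are often exported as subtitles. The sketch below converts a list of segments into SRT text, assuming each segment is a dict with `start`, `end` (in seconds) and `text` keys, which is the shape the `openai-whisper` package returns under `result["segments"]`. The function names are hypothetical.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(float(seg["start"]))
        end = srt_timestamp(float(seg["end"]))
        cues.append(f"{index}\n{start} --> {end}\n{str(seg['text']).strip()}\n")
    return "\n".join(cues)
```

Each cue carries only timing and text; no speaker labels are present in the segments, so none appear in the output.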
