
Whisper

OpenAI's speech recognition model trained on 680,000 hours of audio.

Definition

Whisper is an automatic speech recognition (ASR) model from OpenAI. It is an encoder-decoder Transformer trained on a large multilingual dataset of audio paired with transcripts.

Capabilities:
- Speech-to-text transcription
- Translation to English
- Language detection
- Timestamp generation
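The capabilities above map onto a small set of options in the open-source `openai-whisper` Python package. The sketch below assumes that package (and `ffmpeg`) is installed; `build_options` and `run_whisper` are hypothetical helper names introduced here for illustration, not part of the library.

```python
from typing import Any, Dict, Optional


def build_options(translate: bool = False, language: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for model.transcribe() (hypothetical helper)."""
    # "transcribe" keeps the spoken language; "translate" outputs English text.
    options: Dict[str, Any] = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        # Passing a language code (e.g. "de") skips automatic language detection.
        options["language"] = language
    return options


def run_whisper(audio_path: str, model_name: str = "base", **kwargs: Any) -> Dict[str, Any]:
    """Run Whisper on one audio file and return the detected language, text, and segments."""
    import whisper  # imported lazily so build_options() works without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_options(**kwargs))
    return {
        "language": result["language"],    # detected (or supplied) language code
        "text": result["text"].strip(),    # full transcript or English translation
        "segments": result["segments"],    # dicts with "start", "end", "text"
    }
```

Calling `run_whisper("episode.mp3", translate=True)` would, under these assumptions, return an English translation along with the detected source language; the file name is a placeholder.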

Model Sizes:
- Tiny: 39M parameters
- Base: 74M parameters
- Small: 244M parameters
- Medium: 769M parameters
- Large: 1.55B parameters
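As a rough worked example, the parameter counts above can be turned into an estimate of weight storage. The sketch assumes 2 bytes per parameter (16-bit floats) and counts weights only; real memory use during inference is higher because of activations and runtime overhead. The helper names are hypothetical.

```python
from typing import Optional

# Parameter counts from the list above.
MODEL_PARAMS = {
    "tiny": 39_000_000,
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "large": 1_550_000_000,
}


def weight_size_mb(name: str, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return MODEL_PARAMS[name] * bytes_per_param / 1e6


def largest_model_within(budget_mb: float, bytes_per_param: int = 2) -> Optional[str]:
    """Return the largest model whose weights fit in budget_mb, or None if none fit."""
    fitting = [n for n in MODEL_PARAMS if weight_size_mb(n, bytes_per_param) <= budget_mb]
    return max(fitting, key=MODEL_PARAMS.get) if fitting else None
```

With these assumptions, the Tiny weights come to about 78 MB and the Large weights to about 3,100 MB, which is one reason the smaller models are common on laptops and edge devices.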

Training Data:
- 680,000 hours of audio paired with transcripts
- Multilingual (99 languages)
- Diverse: collected from the web, covering many recording conditions, speakers, and topics

Features:
- Robust to background noise and accents
- Handles multiple languages
- Open source (code and weights under the MIT license) and free to use
- Can be deployed locally

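Example: using Whisper to transcribe a podcast with segment-level timestamps. Whisper does not identify speakers on its own; attaching speaker labels requires a separate diarization step.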
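A minimal sketch of the podcast use case: turning Whisper's timestamped segments into readable transcript lines. It assumes segments shaped like those returned by the `openai-whisper` package's `transcribe()` (dicts with `start`, `end`, and `text`); the function names are hypothetical, and the segments carry no speaker labels.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments: List[Segment]) -> List[str]:
    """Render each segment as '[start - end] text'."""
    lines = []
    for seg in segments:
        start = format_timestamp(float(seg["start"]))
        end = format_timestamp(float(seg["end"]))
        lines.append(f"[{start} - {end}] {str(seg['text']).strip()}")
    return lines
```

The input would typically be `result["segments"]` from a `transcribe()` call on the podcast audio file.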

