Speech Transcription with parakeet-tdt-0.6b-v2

This demo showcases parakeet-tdt-0.6b-v2, a 600-million-parameter model designed for high-quality English speech recognition.

Key Features:

  • Automatic punctuation and capitalization
  • Accurate word-level timestamps (click on a segment in the table below to play it!)
  • Efficient transcription of long audio, now up to 3 hours (for even longer audio, see this script)
  • Robust performance on spoken numbers and song lyrics
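
To illustrate how word-level timestamps can drive the click-to-play segments, here is a minimal sketch in plain Python. It does not call the model. The `Word` record, the `group_into_segments` and `segment_to_samples` helpers, the punctuation-based grouping rule, and the 16 kHz sample rate are all assumptions made for this example, not the demo's actual implementation or output format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level timestamp record (times in seconds)."""
    text: str
    start: float
    end: float


def group_into_segments(words):
    """Group words into segments, closing a segment at sentence-final punctuation."""
    segments, current = [], []
    for w in words:
        current.append(w)
        if w.text.endswith((".", "?", "!")):
            segments.append(current)
            current = []
    if current:  # keep trailing words that lack final punctuation
        segments.append(current)
    return [
        {
            "start": seg[0].start,
            "end": seg[-1].end,
            "text": " ".join(w.text for w in seg),
        }
        for seg in segments
    ]


def segment_to_samples(segment, sample_rate=16000):
    """Convert a segment's start/end times to sample indices (sample rate is assumed)."""
    return round(segment["start"] * sample_rate), round(segment["end"] * sample_rate)


# Made-up timestamps, for illustration only.
words = [
    Word("Hello", 0.0, 0.4),
    Word("there.", 0.4, 0.9),
    Word("How", 1.2, 1.4),
    Word("are", 1.4, 1.6),
    Word("you?", 1.6, 2.0),
]
segments = group_into_segments(words)
```

Each resulting segment carries the start and end time needed to slice the audio array and play only that span, which is the idea behind clicking a row in the results table.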

This model is available for commercial and non-commercial use.

🎙️ Learn more about the Model | 📄 Fast Conformer paper | 📚 TDT paper | 🧑‍💻 NeMo Repository

Example Audio Files (Click to Load)

Transcription Results (Click row to play segment)

Transcription Segments
