This repository collects inference scripts for Automatic Speech Recognition (ASR) models, optimized for both GPU and CPU environments. It gives developers a single place to explore and implement ASR inference approaches suited to their use cases.
The scripts cover popular ASR models, each tuned for a particular hardware configuration and performance target. Whether you need fast CPU inference or GPU-accelerated processing, you'll find an implementation to start from.
- Optimized inference scripts for both GPU and CPU
- Support for multiple ASR models and architectures
- Easy-to-use interfaces for quick integration
- Comprehensive documentation and usage examples
- Performance benchmarks to help you choose the right solution

Supported models:

- OpenAI Whisper (faster on GPU)
- Distil-Whisper (faster on GPU)
- whisper.cpp models (faster on CPU)

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. The large-v3 checkpoint was trained on more than 5 million hours of labeled audio (a mix of weakly labeled and pseudo-labeled data), and Whisper generalizes well to many datasets and domains in a zero-shot setting.
In our scripts, we primarily use `openai/whisper-large-v3` and the large Distil-Whisper model. To trade some accuracy for faster inference, you can modify the scripts to use the small or medium Whisper checkpoints instead.
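Switching model size usually comes down to changing the checkpoint ID passed to the loader. The sketch below shows one way to centralize that choice. `CHECKPOINTS` and `resolve_model_id` are hypothetical names used for illustration and are not part of this repository's scripts; the IDs are the public Hugging Face Hub names of the corresponding Whisper checkpoints.

```python
# Hypothetical helper (not from this repository): map a size label
# to the Hugging Face Hub ID of the matching Whisper checkpoint.
CHECKPOINTS = {
    "small": "openai/whisper-small",
    "medium": "openai/whisper-medium",
    "large": "openai/whisper-large-v3",
}


def resolve_model_id(size: str = "large") -> str:
    """Return the Hub checkpoint ID for a size label (case-insensitive)."""
    key = size.strip().lower()
    if key not in CHECKPOINTS:
        raise ValueError(
            f"unknown size {size!r}; expected one of {sorted(CHECKPOINTS)}"
        )
    return CHECKPOINTS[key]
```

Smaller checkpoints generally run faster and use less memory, so it is worth measuring accuracy on your own audio before settling on one.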
- fp16 + SDPA
- fp16 + SDPA + Speculative Decoding
- Distil-Whisper + fp16 + SDPA + Chunking
- fp16 + SDPA + Chunking + Speculative Decoding
- Whisper Medusa
Each method is designed to balance speed and accuracy for different use cases.
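As a rough illustration of the fp16 + SDPA + chunking combination, here is a minimal sketch using the Hugging Face `transformers` pipeline. It is not the repository's own script: the function names `pick_precision` and `build_asr_pipeline`, the default batch size, and the 30-second chunk length are assumptions chosen for the example, and it assumes `torch` and `transformers` are installed. Argument names follow the `transformers` 4.x API and may differ in other releases.

```python
def pick_precision(cuda_available: bool) -> tuple[str, str]:
    """Choose device and dtype name: fp16 on GPU, fp32 on CPU."""
    if cuda_available:
        return "cuda:0", "float16"
    return "cpu", "float32"


def build_asr_pipeline(
    model_id: str = "openai/whisper-large-v3",
    chunk_length_s: int = 30,   # assumed chunk size for long-form audio
    batch_size: int = 8,        # assumed; tune to your GPU memory
):
    """Build a chunked ASR pipeline with SDPA attention (illustrative sketch)."""
    # Heavy imports are kept local so the helper above stays dependency-free.
    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

    device, dtype_name = pick_precision(torch.cuda.is_available())
    torch_dtype = getattr(torch, dtype_name)

    # attn_implementation="sdpa" selects PyTorch's scaled dot-product attention.
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id,
        torch_dtype=torch_dtype,
        attn_implementation="sdpa",
    )
    model.to(device)
    processor = AutoProcessor.from_pretrained(model_id)

    # chunk_length_s splits long audio into windows that are batched together.
    return pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        chunk_length_s=chunk_length_s,
        batch_size=batch_size,
        torch_dtype=torch_dtype,
        device=device,
    )


# Example usage (downloads the checkpoint on first run; "audio.wav" is a placeholder):
# asr = build_asr_pipeline()
# print(asr("audio.wav")["text"])
```

On a machine without a GPU the sketch falls back to fp32 on CPU, which runs but gives up the fp16 speedup.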
If you find this project helpful or interesting, please consider giving it a star ⭐️ on GitHub. Your support helps make this resource more visible to other developers who might benefit from it.
We're always looking to improve and expand our collection of ASR inference optimization techniques. If you have experience with other optimized methods for ASR inference or have developed your own optimizations, we'd love to hear from you! Feel free to open an issue to discuss new ideas or submit a pull request with your contributions. Whether it's a new optimization technique, an improvement to existing scripts, or documentation enhancements, your input is valuable to the community.
Together, we can make this repository an even more comprehensive resource for ASR developers. Thank you for your interest and support!