How to get inference for batch size > 1 #55
Is there a way to do batched inference now, or is this something on the roadmap?

This is critical for server use cases, to improve concurrency on the GPU. Faster Whisper added batched inference in SYSTRAN/faster-whisper#856; for live use cases it means multiple concurrent transcriptions can run on the same GPU/model, which significantly lowers deployment costs.

Thanks!

Comments

As you pointed out, we currently don't support batching. It's probably quite straightforward to add batching to our CTranslate2 Moonshine implementation. Adding batching support to the models in this repo is on my roadmap, and it's something I'll be working on in the next few weeks.

@njeffrie So with the CTranslate2 implementation, can I pass multiple audios in and have it transcribe them together?

I addressed a dimension ordering issue in the PR, and I verified that batching now works correctly with CTranslate2. You have to pass in multiple batched audios (padded to a common length), along with one [int(1)] SOT prompt for each batch element.
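Below is a minimal sketch of what that batched call might look like. It assumes the converted Moonshine model loads through CTranslate2's Whisper-style interface and that `generate()` accepts token-id prompts; the model path, the `pad_to_common_length` helper, and the example waveforms are illustrative, not the repo's official API.

```python
# A minimal sketch, not the repo's official API: assumes the converted
# Moonshine model is loadable via CTranslate2's Whisper interface and
# that generate() accepts token-id prompts. Model path is hypothetical.
import numpy as np
import ctranslate2

def pad_to_common_length(audios):
    """Zero-pad a list of 1-D float32 waveforms to the longest length."""
    max_len = max(len(a) for a in audios)
    return np.stack([np.pad(a, (0, max_len - len(a))) for a in audios])

# audios: 16 kHz float32 waveforms of different lengths (dummy data here).
audios = [np.random.randn(16000 * n).astype(np.float32) for n in (2, 3)]
batch = pad_to_common_length(audios)  # shape: (batch, samples)

model = ctranslate2.models.Whisper("moonshine-ct2/")  # hypothetical path
features = ctranslate2.StorageView.from_array(batch)

# One [int(1)] SOT prompt per batch element, as described above.
prompts = [[1] for _ in range(batch.shape[0])]

results = model.generate(features, prompts)
for r in results:
    print(r.sequences_ids[0])  # decode with the Moonshine tokenizer
```

Zero-padding to the longest clip keeps the encoder input rectangular; grouping clips of similar length into the same batch reduces compute wasted on padding.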