
How to get inference for batch size > 1 #55

Open
chrisreese-if opened this issue Nov 7, 2024 · 3 comments

Comments

@chrisreese-if

This is critical for server use cases, to improve concurrency on GPU. Faster Whisper added batched inference in SYSTRAN/faster-whisper#856. For live use cases this means multiple concurrent transcriptions can happen on the same GPU/model, which significantly lowers deployment costs.
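For reference, the batched API that PR adds looks roughly like this (an illustrative sketch based on the faster-whisper README; model name and options are placeholders):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Load a Whisper model on GPU and wrap it in the batched pipeline
# introduced by SYSTRAN/faster-whisper#856.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# Segments are transcribed in parallel batches of 16 on a single GPU.
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```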

Is there a way to do this now? Or is this something on the roadmap?

Thanks!

@njeffrie
Contributor

njeffrie commented Nov 25, 2024

As you pointed out, we currently don't support batching. It's probably quite straightforward to add batching to our CTranslate2 Moonshine implementation. Adding batching support to the models in this repo is currently on my roadmap, and something I'll be working on in the next few weeks.

@kalradivyanshu

@njeffrie So in the CTranslate2 implementation, can I pass in multiple audios and have it transcribe them together?

@njeffrie
Contributor

njeffrie commented Nov 26, 2024

I addressed a dimension ordering issue in the PR and verified that batching now works correctly with CTranslate2. You have to pass in multiple batched audios (padded to the same length) along with one [int(1)] SOT prompt for each item in the batch.
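For anyone trying this, here is a rough sketch of the batch preparation described above; the model object and its generate() call are placeholders rather than the exact API of the CTranslate2 Moonshine implementation in this repo:

```python
import numpy as np

def prepare_batch(audios):
    """Pad raw audio arrays to a common length and build one SOT prompt per item."""
    max_len = max(len(a) for a in audios)
    batch = np.stack(
        [np.pad(a, (0, max_len - len(a))) for a in audios]
    ).astype(np.float32)
    # One [int(1)] start-of-transcript prompt for each item in the batch.
    prompts = [[1] for _ in audios]
    return batch, prompts

# Hypothetical usage (moonshine_ct2_model stands in for the converted model):
# batch, prompts = prepare_batch([audio_a, audio_b, audio_c])
# results = moonshine_ct2_model.generate(batch, prompts)
```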
