
How to get inference for batch size > 1 #55

Open
chrisreese-if opened this issue Nov 7, 2024 · 3 comments

Comments

@chrisreese-if

This is critical for server use cases, to improve concurrency on GPU. Faster Whisper added batched inference in SYSTRAN/faster-whisper#856. For live use cases this means multiple concurrent transcriptions can happen on the same GPU/model, which significantly lowers deployment costs.
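For reference, the batched API that PR adds looks roughly like this (an illustrative sketch based on the faster-whisper README; model name and options are placeholders):

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Load a Whisper model on GPU and wrap it in the batched pipeline
# introduced by SYSTRAN/faster-whisper#856.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# Segments are transcribed in parallel batches of 16 on a single GPU.
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```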

Is there a way to do this now? Or is this something on the roadmap?

Thanks!

@njeffrie
Contributor

njeffrie commented Nov 25, 2024

As you pointed out, we currently don't support batching. It's probably quite straightforward to add batching to our CTranslate2 Moonshine implementation. Adding batching support to the models in this repo is currently on my roadmap, and something I'll be working on in the next few weeks.

@kalradivyanshu

@njeffrie So in the CTranslate2 implementation, can I pass in multiple audios and have it transcribe them together?

@njeffrie
Contributor

njeffrie commented Nov 26, 2024

I addressed a dimension ordering issue in the PR and verified that batching now works correctly with CTranslate2. You have to pass in multiple batched audios (padded to the same length) along with one [int(1)] SOT prompt for each item in the batch.
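For anyone trying this, here is a rough sketch of the batch preparation described above; the model object and its generate() call are placeholders rather than the exact API of the CTranslate2 Moonshine implementation in this repo:

```python
import numpy as np

def prepare_batch(audios):
    """Pad raw audio arrays to a common length and build one SOT prompt per item."""
    max_len = max(len(a) for a in audios)
    batch = np.stack(
        [np.pad(a, (0, max_len - len(a))) for a in audios]
    ).astype(np.float32)
    # One [int(1)] start-of-transcript prompt for each item in the batch.
    prompts = [[1] for _ in audios]
    return batch, prompts

# Hypothetical usage (moonshine_ct2_model stands in for the converted model):
# batch, prompts = prepare_batch([audio_a, audio_b, audio_c])
# results = moonshine_ct2_model.generate(batch, prompts)
```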
