Limiting GPU Resource Usage in Onnxruntime with DirectML Provider #1069

asynclee · 2024-11-18T09:53:41Z


Env

Windows
DirectML Provider

Question

Is there a way to limit the amount of GPU resources ONNX Runtime uses when running sLM models with onnxruntime-genai?
For example, I’d like to restrict the sLM model to at most 20% GPU utilization.
I’m looking for a setting like this to minimize the impact on other programs running at the same time.

Answered by kunal-vaishnavi

Nov 19, 2024


elephantpanda · 2024-11-19T00:40:21Z


I doubt this package will give you that much control. You could try the onnxruntime API, which is lower level.
That said, genai does seem to have some hidden tricks that make it faster, which I have yet to work out.


kunal-vaishnavi (Collaborator) · 2024-11-19T01:20:15Z

You can use ONNX Runtime's execution provider (EP) options with ONNX Runtime GenAI by adding them to genai_config.json.

The CPU EP's options are available via ONNX Runtime's SessionOptions.

"session_options": {
    "log_id": "onnxruntime-genai",
    "provider_options": [],
    "cpu_ep_option_1_key": cpu_ep_option_1_value (e.g. false, 0, etc.),
    ...
}

For other EPs such as CUDA or DirectML, their options are available via ONNX Runtime's ProviderOptions.

"session_options": {
    "log_id": "onnxruntime-genai",
    "provider_options": [
        {
            "ep_name": {
                "ep_name_option_1_key": "ep_name_option_1_value",
                ...
            }
        }
    ]
}
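As an illustration of the layout above, here is a minimal Python sketch that adds options for one EP to a genai_config.json using only the standard library. It assumes session_options sits under model.decoder in the file, which is an assumption about the layout and should be checked against your own config. The helper names set_provider_options and patch_config_file are hypothetical, and so is any option key you pass in: the thread does not name a DirectML option that caps GPU utilization at a percentage, so this only shows where such an option would go if one exists.

```python
import json
from pathlib import Path


def set_provider_options(config: dict, ep_name: str, options: dict) -> dict:
    """Return a copy of a genai_config dict with options set for one EP.

    Assumes session_options lives under model.decoder (an assumption about
    the file layout; adjust the path if your config differs).
    """
    updated = json.loads(json.dumps(config))  # deep copy of plain JSON data
    session = (
        updated.setdefault("model", {})
        .setdefault("decoder", {})
        .setdefault("session_options", {})
    )
    providers = session.setdefault("provider_options", [])
    # Each list entry is an object keyed by the EP name.
    for entry in providers:
        if ep_name in entry:
            entry[ep_name].update(options)  # merge into the existing entry
            break
    else:
        providers.append({ep_name: dict(options)})  # no entry yet: add one
    return updated


def patch_config_file(path: str, ep_name: str, options: dict) -> None:
    """Read genai_config.json, apply the options, and write the file back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    updated = set_provider_options(config, ep_name, options)
    config_path.write_text(json.dumps(updated, indent=4), encoding="utf-8")
```

For example, patch_config_file("genai_config.json", "dml", {"some_option_key": "some_value"}) would add or update the "dml" entry, where the key and value are placeholders for whatever options the EP actually documents.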

You can also set the provider options at runtime.
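A rough sketch of what setting options at runtime might look like from Python follows. It assumes the onnxruntime-genai Python package exposes og.Config with clear_providers(), append_provider(), and set_provider_option(), as recent releases appear to; please verify these against the version you have installed. The helper names configure_provider and load_model are hypothetical, and the option keys are left to the caller because the thread does not name a specific DirectML option for limiting GPU usage.

```python
def configure_provider(config, ep_name: str, options: dict) -> None:
    """Select one execution provider on a GenAI config object and set its options.

    `config` is expected to expose clear_providers(), append_provider(name),
    and set_provider_option(name, key, value). This matches og.Config as far
    as I know, but treat it as an assumption and check your installed version.
    """
    config.clear_providers()         # drop the providers listed in genai_config.json
    config.append_provider(ep_name)  # e.g. "dml" for DirectML
    for key, value in options.items():
        # Option values are passed through unchanged; supply them as strings.
        config.set_provider_option(ep_name, key, value)


def load_model(model_dir: str, ep_name: str, options: dict):
    """Build a model with provider options applied at runtime."""
    # Imported lazily so configure_provider has no hard dependency on the package.
    import onnxruntime_genai as og

    config = og.Config(model_dir)
    configure_provider(config, ep_name, options)
    return og.Model(config)
```

Keeping configure_provider separate from the import makes it easy to check the call sequence with a stand-in object before pointing it at a real model directory.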


elephantpanda · 2024-11-19T04:20:26Z


What I mean when I say the onnxruntime API gives you more control is that you can work directly with the inputs and weights and decide when they go on the GPU, in RAM, etc.
A year ago I managed to write some scripts that could load an ONNX model and run it on the GPU while using barely any RAM at all.
