Limiting GPU Resource Usage in Onnxruntime with DirectML Provider #1069
-
**Question:** Is there a way to limit the amount of GPU resources used by ONNX Runtime when running SLM models in ONNX Runtime GenAI?
Replies: 3 comments
-
I doubt this package will give you that much control. You could try the onnxruntime API, which is lower level.
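For what it's worth, a minimal sketch of that lower-level route (this assumes the onnxruntime-directml package; `model.onnx` and the `device_id` value are placeholders, not something from this thread):

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# The DirectML EP docs recommend disabling the memory pattern optimization.
sess_options.enable_mem_pattern = False

session = ort.InferenceSession(
    "model.onnx",
    sess_options=sess_options,
    # Provider options are passed as (name, options) pairs; device_id pins
    # the DirectML EP to a specific GPU adapter.
    providers=[
        ("DmlExecutionProvider", {"device_id": 0}),
        "CPUExecutionProvider",
    ],
)
```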
-
You can use ONNX Runtime's execution provider options with ONNX Runtime GenAI. You can add them in the `genai_config.json`. The CPU EP's options are available via ONNX Runtime's `SessionOptions`. For other EPs such as CUDA or DirectML, their options are available via ONNX Runtime's `ProviderOptions`.

You can also set the provider options at runtime.
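For example, here is a hedged sketch of what a DirectML entry in `genai_config.json` could look like. The `model.decoder.session_options.provider_options` nesting follows the configs the model builder generates; the `device_id` option is illustrative, not an exhaustive list of DML options:

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          {
            "dml": {
              "device_id": "0"
            }
          }
        ]
      }
    }
  }
}
```

And a sketch of the runtime route, assuming a recent onnxruntime-genai build whose Python `Config` class exposes these methods:

```python
import onnxruntime_genai as og

# Point Config at the folder that contains genai_config.json.
config = og.Config("path/to/model_folder")

# Replace whatever providers the config file lists with DirectML.
config.clear_providers()
config.append_provider("dml")
# Illustrative option name; check the DML EP docs for what is supported.
config.set_provider_option("dml", "device_id", "0")

model = og.Model(config)
```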
-
What I mean when I say the onnxruntime API gives you more control is that you can work directly with the inputs and weights and decide when they go to the GPU versus staying in RAM, etc.
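As an illustration, ONNX Runtime's I/O binding API is one way to control where tensors live (a minimal sketch; the model path, the `input`/`output` names, and the shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

binding = session.io_binding()

# Keep the input in host RAM; ORT copies it to the device only when needed.
x = np.zeros((1, 3, 224, 224), dtype=np.float32)
binding.bind_cpu_input("input", x)

# Let ORT allocate the output wherever the EP prefers.
binding.bind_output("output")

session.run_with_iobinding(binding)

# Copy the result back into CPU memory explicitly.
result = binding.copy_outputs_to_cpu()[0]
```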