Support KubeRay in MultiKueue via managedBy #3822
Comments
/assign @mszadkow
This ticket first requires providing RayJob and RayCluster MultiKueue adapters and setting up e2e tests,
sgtm, we can start with tests using 1.2.2 but without the managedBy field and merge that as a starter. Alternatively, we can already use the latest (main) KubeRay just for testing purposes and merge once KubeRay is released.
Another problem I have right now is that the KubeRay cluster startup time is huge for e2e tests.
Ok, in that case we may need a separate CI for Ray. However, let me first understand what exactly you mean by "cluster startup time": is this the installation of KubeRay, or the time to run the first RayJob? Does it also take long to run follow-up Jobs? Also, please make sure you are rebased against the main branch, because we recently increased the CPU limits for Kueue to 2000m, which might be relevant here too.
It may be possible to construct a dummy RayCluster that just calls
thank you @andrewsykim for the suggestion. @mszadkow can we try that? maybe you already did and hit some complications?
What would you like to be added:
Support for KubeRay via managedBy in MultiKueue.
The relevant support for the managedBy field was recently merged in KubeRay, see ray-project/kuberay#2544, and will most likely be released in 1.3.
Until then we can use the main branch of KubeRay in Kueue to test that it all works. Once KubeRay is released we can switch to the released version and merge to Kueue.
Why is this needed:
Support of KubeRay via managedBy in MultiKueue, allowing for: