Serving-aware partial preemption of workloads #3762

Open
3 tasks
mimowo opened this issue Dec 6, 2024 · 1 comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@mimowo
Contributor

mimowo commented Dec 6, 2024

What would you like to be added:

Serving workloads differ from training workloads in that they can be trimmed easily: a Deployment can keep serving with 70% or 50% of its Pods. Most AI training workloads, by contrast, need all of their Pods running. We want to leverage this fact to optimize preemptions.

In particular, when a new high-priority workload comes in and there are multiple serving workloads, we want to distribute the preemptions across those serving workloads rather than preempting one of them completely.
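To make the idea concrete, here is a minimal sketch of one way preemptions could be spread across serving workloads. It is not Kueue's API or a proposed design: `ServingWorkload`, `min_replicas`, and `distribute_preemptions` are hypothetical names, and the policy (always remove the next Pod from the workload that currently keeps the largest share of its original replicas) is just one assumption about what "distribute" could mean.

```python
from dataclasses import dataclass


@dataclass
class ServingWorkload:
    name: str
    replicas: int      # Pods currently running
    min_replicas: int  # hypothetical floor (>= 0) the workload must keep


def distribute_preemptions(workloads, pods_needed):
    """Return {workload name: number of Pods to remove}.

    Pods are removed one at a time from the workload that still runs the
    largest fraction of its original replicas, so every workload ends up
    trimmed by a similar percentage instead of one being preempted fully.
    """
    slack = sum(w.replicas - w.min_replicas for w in workloads)
    if pods_needed > slack:
        raise ValueError("not enough trimmable Pods in serving workloads")

    removed = {w.name: 0 for w in workloads}
    for _ in range(pods_needed):
        # Only workloads still above their floor can give up another Pod.
        candidates = [
            w for w in workloads
            if w.replicas - removed[w.name] > w.min_replicas
        ]
        # Highest remaining fraction first; ties broken by name.
        victim = min(
            candidates,
            key=lambda w: (-(w.replicas - removed[w.name]) / w.replicas, w.name),
        )
        removed[victim.name] += 1
    return removed


workloads = [
    ServingWorkload("a", replicas=10, min_replicas=5),
    ServingWorkload("b", replicas=10, min_replicas=5),
    ServingWorkload("c", replicas=4, min_replicas=2),
]
plan = distribute_preemptions(workloads, pods_needed=6)
```

With these example numbers, every workload gives up some Pods and none is removed entirely. A real implementation would also have to account for quotas, priorities, and cohort borrowing, which this sketch ignores.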

Note that this is also related to the partial preemption for batch workloads: #975. We may consider having a solution which solves both problems, but for now it seems reasonable to have this dedicated issue, emphasizing that serving workloads are special in this regard.

Why is this needed:

To improve the experience of hosting a mix of training and inference workloads. When a high-priority workload arrives, we can make room for it by trimming multiple serving workloads rather than preempting one of them completely.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@mimowo added the kind/feature label Dec 6, 2024
@mimowo changed the title from "Serving-aware preemption of workloads" to "Serving-aware partial preemption of workloads" Dec 6, 2024
@mimowo
Contributor Author

mimowo commented Dec 6, 2024

cc @mwielgus @mwysokin @tenzen-y
