Workload active field doesn't work for StatefulSet Integration #3851

Open
xiongzubiao opened this issue Dec 14, 2024 · 9 comments

@xiongzubiao

What happened:
Changing the active field of the Workload object from true to false doesn't scale down the StatefulSet permanently in v0.10.0-rc.4. It only temporarily changes the status: both the Workload object and the pods are deleted, but they are immediately recreated automatically.

What you expected to happen:
Changing the active field from true to false should scale the StatefulSet down to 0. Changing it back to true should scale up the StatefulSet.
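
For reference, the toggle was done by editing the Workload object that Kueue created for the StatefulSet, roughly like this (the object name is illustrative; Kueue generates it):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: statefulset-web-abc12   # illustrative; generated by Kueue
  namespace: default
spec:
  active: false   # flipped from true; expected to keep the StatefulSet scaled to 0
```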

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.31.0
  • Kueue version (use git describe --tags --dirty --always): v0.10.0-rc.4
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@xiongzubiao added the kind/bug label on Dec 14, 2024

@mimowo (Contributor) commented Dec 16, 2024

It needs testing, but I think adding the StatefulSet to the Workload's ownerReferences will prevent the Workload from being deleted, so it should work well.
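
For illustration, that would mean an ownerReference like this on the Workload (a sketch; the name and uid are placeholders):

```yaml
metadata:
  ownerReferences:
    - apiVersion: apps/v1
      kind: StatefulSet
      name: web                 # the StatefulSet whose pods the Workload tracks
      uid: <statefulset-uid>    # placeholder
      controller: true
      blockOwnerDeletion: true  # garbage collector keeps the Workload alive
```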

@mbobrovskyi (Contributor)

/assign

@mbobrovskyi (Contributor)

> It needs testing, but I think adding the StatefulSet to the Workload's ownerReferences will prevent the Workload from being deleted, so it should work well.

It won't help because the Workload could still finish. I think we need to add a "serving" field to the Workload to prevent it from finishing.
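
For illustration, the proposal is roughly the following. Note that this field does not exist in the v0.10 API; it is only a sketch of the idea:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
spec:
  active: true
  serving: true   # proposed, hypothetical field: never mark this Workload
                  # Finished when its pods finish, so it can be deactivated
                  # and re-admitted indefinitely
```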

@mimowo (Contributor) commented Dec 17, 2024

> It won't help because the Workload could still finish.

Could you please try that approach first? It is possible that I'm missing something, but I think we need to keep the Workload around. Also, it is not clear to me why we need the "serving" field on the Workload, since it is already on the pods created by the StatefulSet. In any case, the argument is not convincing to me yet.

@mbobrovskyi (Contributor)

The problem is that the Workload finishes once all of its pods finish. Yes, with the ownerReference the Workload won't disappear, but we can't admit it again after reactivation.
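
In other words, once all pods finish, the Workload's status gets a condition like the one below, and a Finished Workload is never re-admitted (reason and message are illustrative):

```yaml
status:
  conditions:
    - type: Finished
      status: "True"
      reason: Succeeded           # illustrative
      message: All pods finished  # illustrative
```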

@mimowo (Contributor) commented Dec 17, 2024

Preventing the Workload from disappearing is the first step - if the Workload object is gone, its "active" field is gone too.

Yes, we may eventually need a "serving" field on the Workload, but I would prefer to reuse the pod annotation so that we can cherry-pick the fix into 0.10.1. So, I would rather first understand where the issues are than jump to conclusions.

@mimowo (Contributor) commented Dec 17, 2024

Moreover, starting with ownerReferences would already fix the issue for workloads that return a non-zero exit code on SIGTERM, which would unblock the use case for some users.
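
For example, a container that traps SIGTERM and exits non-zero never counts as succeeded, so its Workload is not marked Finished. A minimal sketch of such a container in the StatefulSet's pod template:

```yaml
containers:
  - name: app
    image: busybox
    # On SIGTERM (sent during scale-down/eviction) exit 1, so the pod
    # fails rather than succeeds and the Workload does not finish.
    command: ["sh", "-c", "trap 'exit 1' TERM; while true; do sleep 1; done"]
```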

@mbobrovskyi (Contributor)

/unassign

Sorry, I don't have capacity to work on it.

@varshaprasad96 (Member)

@mbobrovskyi I can work on it after the holidays.

/assign
