Workers stuck in terminating state when 'job_completion_wait' is specified and 'handle_sig_wait_for_completion' is triggered #369
Comments
@JonasKs nice work on getting this feature implemented!
Hey @mernmic! Thanks for the good bug report! I was unsure how I missed that, but I actually used your suggested implementation when I was looking at my logs in my k8s lab, but acted on I'm sorry I didn't catch this. PR is very welcome - we deployed this to QA yesterday, so I assume I'll see your findings when I get to work in a few hours.
…obs is set to False. close python-arq#369
Currently, workers get stuck in a terminating state when handle_sig_wait_for_completion is triggered, which makes k8s pod lifecycles ineffective.
This appears to happen because the '_sleep_until_tasks_complete' function checks len() of self.tasks, but the deletion of done task ids is nested under the 'if self.allow_pick_jobs' check, so finished tasks are never removed once job pickup is disabled. Reference: https://github.com/samuelcolvin/arq/blob/main/arq/worker.py#L388
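To make the failure mode concrete, here is a minimal sketch of the condition described above. It is not arq's code: 'MiniWorker' and 'poll_iteration' are hypothetical names, and the assumption that the signal handler sets allow_pick_jobs to False is taken from the description in this issue.

```python
import asyncio


class MiniWorker:
    """Hypothetical stand-in for the worker; not arq's actual class."""

    def __init__(self):
        self.tasks = {}  # job id -> asyncio.Task
        self.allow_pick_jobs = True

    def poll_iteration(self):
        # Mirrors the structure described above: finished tasks are only
        # removed while the worker is still allowed to pick up new jobs.
        if self.allow_pick_jobs:
            for job_id, task in list(self.tasks.items()):
                if task.done():
                    del self.tasks[job_id]


async def demo():
    worker = MiniWorker()
    worker.tasks["job-1"] = asyncio.create_task(asyncio.sleep(0.01))
    worker.allow_pick_jobs = False  # assumed effect of the signal handler
    await asyncio.sleep(0.05)       # the job finishes in the meantime
    worker.poll_iteration()         # cleanup is skipped
    return len(worker.tasks)


remaining = asyncio.run(demo())
print(remaining)  # the finished task is still counted
```

A wait loop conditioned on len(self.tasks) would keep spinning here even though no work is left.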
Have tried a couple things:
Both attempted fixes were met with asyncio.exceptions.CancelledError errors.
A fix that seems to work: iterate over the tasks inside the while loop of '_sleep_until_tasks_complete' and check whether any of them are not done.
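The idea can be sketched roughly as follows. This is an illustration under assumptions, not the exact patch: the wait loop is written as a hypothetical free function taking the task dict as an argument, and the poll interval and timeout values are arbitrary.

```python
import asyncio


async def sleep_until_tasks_complete(tasks, poll_interval=0.01):
    """Return once every task in `tasks` reports done().

    Hypothetical free-function version of the wait loop. It inspects task
    state directly instead of relying on finished entries being deleted.
    """
    while any(not task.done() for task in tasks.values()):
        await asyncio.sleep(poll_interval)


async def demo():
    tasks = {
        "job-1": asyncio.create_task(asyncio.sleep(0.02)),
        "job-2": asyncio.create_task(asyncio.sleep(0.04)),
    }
    # Bounded by a timeout, as a completion-wait setting would be.
    await asyncio.wait_for(sleep_until_tasks_complete(tasks), timeout=1.0)
    # Entries were never deleted, yet the wait still returned.
    return len(tasks) == 2 and all(task.done() for task in tasks.values())


finished = asyncio.run(demo())
print(finished)
```

Checking done() keeps the wait independent of whichever code path is responsible for removing entries from the dict.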
Proposed fix:
Existing code:
Recommended fix:
Is this a change you would support if a PR is made, or is there a better way around this issue?