KubernetesJobOperator fails if you launch more than one pod #44994
Labels
area:providers
kind:bug
needs-triage
provider:cncf-kubernetes
Apache Airflow Provider(s)
cncf-kubernetes
Versions of Apache Airflow Providers
8.4.1
Apache Airflow version
2.10.2
Operating System
Not sure - in GCP Cloud Composer
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
What happened
If you launch a KubernetesJobOperator, it tries to find the pod after execution. If the job has launched multiple pods, it then fails when trying to fetch logs, because it refuses to continue when more than one pod matches.
I can pinpoint the place in the source code if you give me a link; I just can't find where it is on GitHub, but I have identified it locally.
It's when
raise AirflowException(f"More than one pod running with labels {label_selector}")
gets called.
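For context, my reading of the lookup that raises this (a paraphrased sketch, not the exact provider source) is that pods are listed by a label selector and the lookup bails out as soon as more than one matches:

```python
from airflow.exceptions import AirflowException
from kubernetes.client import CoreV1Api


def find_single_pod(client: CoreV1Api, namespace: str, label_selector: str):
    """Paraphrased sketch of the pod lookup; not the exact provider source."""
    pod_list = client.list_namespaced_pod(
        namespace=namespace,
        label_selector=label_selector,
    ).items
    if len(pod_list) > 1:
        # Any job that launched more than one pod with matching labels trips this.
        raise AirflowException(f"More than one pod running with labels {label_selector}")
    return pod_list[0] if pod_list else None
```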
What you think should happen instead

There should be some flag in the job operator constructor that prevents this behaviour, since many Kubernetes jobs will launch more than one pod; or the find_pod logic should be smart enough to know that you will have more than one pod when your job has a large parallelism count.
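Purely to illustrate the first option; allow_multiple_pods is a made-up name, and no such argument exists in the provider today:

```python
from airflow.providers.cncf.kubernetes.operators.job import KubernetesJobOperator

# "allow_multiple_pods" is hypothetical: it only illustrates the proposal and
# would raise a TypeError today because no such constructor argument exists.
job = KubernetesJobOperator(
    task_id="run_indexed_job",
    name="indexed-job",
    namespace="default",
    image="busybox",
    parallelism=4,
    completions=4,
    allow_multiple_pods=True,
)
```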
How to reproduce

Launch a KubernetesJobOperator with parallelism > 1 and I hit it every time. This seems so basic that I wonder if I am doing something wrong, since I would have expected other people to run into it if that were the case. I am running indexed jobs with completions equal to the parallelism count.
Full config looks like this:
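(A minimal stand-in rather than the exact config: task name, image and namespace are placeholders, and the indexed/parallelism settings match the description above.)

```python
from airflow.providers.cncf.kubernetes.operators.job import KubernetesJobOperator

# Placeholder names, image and namespace; the job settings mirror the
# description above: indexed job with completions equal to parallelism.
run_indexed_job = KubernetesJobOperator(
    task_id="run_indexed_job",
    name="indexed-job",
    namespace="default",
    image="python:3.11-slim",
    cmds=["python", "-c", "print('work item done')"],
    completion_mode="Indexed",
    parallelism=4,
    completions=4,
    wait_until_job_complete=True,
)
```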
Anything else
I have fixed this issue by setting reattach_on_restart to False, which prevents the labels issue but has the side effect of producing a stray pod. I then delete that with a Python operator that makes use of the Kubernetes hook:
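(A sketch of that cleanup step: the job name, namespace and connection are placeholders; it relies on the job-name label that the Job controller sets on its pods and on the hook's core_v1_client property.)

```python
from airflow.operators.python import PythonOperator
from airflow.providers.cncf.kubernetes.hooks.kubernetes import KubernetesHook


def delete_stray_job_pods(job_name: str, namespace: str) -> None:
    """Delete pods left behind by the job, looked up via the job-name label."""
    hook = KubernetesHook()  # default Kubernetes connection
    core_v1 = hook.core_v1_client  # plain kubernetes CoreV1Api client
    pods = core_v1.list_namespaced_pod(
        namespace=namespace,
        label_selector=f"job-name={job_name}",
    )
    for pod in pods.items:
        core_v1.delete_namespaced_pod(name=pod.metadata.name, namespace=namespace)


# Inside the DAG definition, after the job task.
cleanup_stray_pods = PythonOperator(
    task_id="cleanup_stray_pods",
    python_callable=delete_stray_job_pods,
    op_kwargs={"job_name": "indexed-job", "namespace": "default"},
)
```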
Are you willing to submit PR?

Code of Conduct