add kep first version
KunWuLuan committed Sep 9, 2024 (commit 65feed7, parent e6fb7b6), changing `keps/74-support-argo-workflow/README.md` (223 additions, 1 deletion).

### Goals


- Support Argo Workflows in Kueue. Users only need to add the `kueue.x-k8s.io/queue-name` label
to a workflow and submit it in a suspended state.
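The goal above implies a simple opt-in check. Below is a minimal Go sketch of it, assuming a hypothetical, trimmed-down `Workflow` struct rather than Argo's real Go types; `Suspend` stands in for the manifest's `spec.suspend` field. It is illustrative only, not the integration's actual code.

```go
package main

import "fmt"

// QueueNameLabel is the label users add to opt a workflow into Kueue.
const QueueNameLabel = "kueue.x-k8s.io/queue-name"

// Workflow is a hypothetical, trimmed-down stand-in for Argo's Workflow type.
type Workflow struct {
	Name    string
	Labels  map[string]string
	Suspend bool // mirrors spec.suspend in the Argo manifest
}

// QueueName returns the target queue and whether Kueue should manage the
// workflow: it needs a non-empty queue-name label and must be submitted
// suspended, so that Kueue can admit it before any step runs.
// Handling a labeled but unsuspended workflow is out of scope here.
func QueueName(wf Workflow) (string, bool) {
	q, ok := wf.Labels[QueueNameLabel]
	if !ok || q == "" || !wf.Suspend {
		return "", false
	}
	return q, true
}

func main() {
	wf := Workflow{
		Name:    "train-pipeline",
		Labels:  map[string]string{QueueNameLabel: "gpu-queue"},
		Suspend: true,
	}
	if q, ok := QueueName(wf); ok {
		fmt.Printf("workflow %s queued in %s\n", wf.Name, q)
	}
}
```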

### Non-Goals


### User Stories

#### Story 1

As an ML engineer, I want to run some data processing before my training job starts. I
submit a workflow with two steps: the first is a data processing job, and the second is a
PyTorchJob. The data processing job does not require GPUs, so I want it not to be blocked
by the GPU quota.

#### Story 2

As an ML engineer, my workflow has multiple stages that require GPUs, all with identical
resource requests. I want later stages to reuse the resources already allocated to earlier
nodes of the workflow to improve efficiency and resource utilization.

## Design Details

### Workflow as a Unit

Pods in one workflow can have different resource requests, node affinities, tolerations, etc.,
and the parallelism can change while the workflow runs. It is therefore difficult for the
controller to determine how many resources of each flavor a workflow needs. Instead, users
specify the workflow's resources in its annotations: `kueue.k8s.io/max-resources` declares the
potential resource requirements of the workflow, and `kueue.k8s.io/toleration` and
`kueue.k8s.io/node-selector` configure tolerations for tainted nodes and node selection,
respectively.
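A minimal Go sketch of turning these annotations into the single podSet that represents the whole workflow. The KEP does not define a value format, so the comma-separated `name=quantity` encoding (e.g. `cpu=8,nvidia.com/gpu=4`) is an assumption, as are the `PodSet` type and helper names. Quantities stay plain strings here, whereas a real controller would parse them into Kubernetes resource quantities. Tolerations are omitted because their encoding is even less settled.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Annotation keys proposed in this KEP.
const (
	MaxResourcesAnnotation = "kueue.k8s.io/max-resources"
	NodeSelectorAnnotation = "kueue.k8s.io/node-selector"
)

// parseKV parses a hypothetical "k1=v1,k2=v2" annotation value.
// The encoding is an assumption; the KEP does not fix one yet.
func parseKV(value string) (map[string]string, error) {
	out := map[string]string{}
	if strings.TrimSpace(value) == "" {
		return out, nil
	}
	for _, pair := range strings.Split(value, ",") {
		k, v, found := strings.Cut(strings.TrimSpace(pair), "=")
		if !found || k == "" || v == "" {
			return nil, fmt.Errorf("malformed entry %q", pair)
		}
		if _, dup := out[k]; dup {
			return nil, fmt.Errorf("duplicate key %q", k)
		}
		out[k] = v
	}
	return out, nil
}

// PodSet is a hypothetical single podSet summarizing the whole workflow.
type PodSet struct {
	Requests     map[string]string // quantities kept as strings in this sketch
	NodeSelector map[string]string
}

// podSetFromAnnotations builds the one podSet used when the workflow is the unit.
func podSetFromAnnotations(ann map[string]string) (PodSet, error) {
	req, err := parseKV(ann[MaxResourcesAnnotation])
	if err != nil {
		return PodSet{}, fmt.Errorf("%s: %w", MaxResourcesAnnotation, err)
	}
	if len(req) == 0 {
		return PodSet{}, fmt.Errorf("%s is required", MaxResourcesAnnotation)
	}
	sel, err := parseKV(ann[NodeSelectorAnnotation])
	if err != nil {
		return PodSet{}, fmt.Errorf("%s: %w", NodeSelectorAnnotation, err)
	}
	return PodSet{Requests: req, NodeSelector: sel}, nil
}

func main() {
	ps, err := podSetFromAnnotations(map[string]string{
		MaxResourcesAnnotation: "cpu=8, memory=32Gi, nvidia.com/gpu=4",
		NodeSelectorAnnotation: "accelerator=a100",
	})
	if err != nil {
		panic(err)
	}
	// Print requests in a stable order.
	keys := make([]string, 0, len(ps.Requests))
	for k := range ps.Requests {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, ps.Requests[k])
	}
}
```

Because every pod of the workflow is covered by this one podSet, a single flavor assignment and a single node selector apply to all of them, which is the limitation listed below.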

#### Drawbacks and Limitations

- Different nodeSelectors and tolerations cannot be set for different kinds of podSets in this
way; the whole workflow shares a single set.

#### Advantages

- The architecture is simple and easy to implement.

### Layer as a Unit

A workflow template can be either a container invocation (a leaf template) or a list
of steps. We create a workload for each parallel step group composed of leaf templates.
For a workflow that consists of a single leaf template, we create one workload for it.
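The rule above can be sketched as a recursive walk over the template tree. This Go sketch uses hypothetical simplified types (`Template`, `Step`, `Workload`) instead of Argo's API types: leaf members of a parallel step group share one workload, nested steps templates are planned recursively, and a workflow whose entrypoint is a leaf gets a single workload. Expanded node names mimic the `name(index:item)` form that `argo get` prints. DAG templates are not modeled.

```go
package main

import "fmt"

// Template is a hypothetical, simplified Argo template: either a leaf
// (container, script, plugin, ...) or a steps template.
type Template struct {
	Name  string
	Leaf  bool
	Steps [][]Step // outer: sequential step groups; inner: parallel steps
}

// Step references a template; Items mirrors withItems (nil = no loop).
type Step struct {
	Name     string
	Template string
	Items    []string
}

// Workload is the unit Kueue would queue under this proposal.
type Workload struct {
	Node  string // workflow node that owns the step group
	Group int    // index of the step group inside that node's template
	Pods  int    // number of leaf pods the workload covers
}

// instances expands a step into Argo-style node names, e.g. "s(0:a)".
func instances(s Step) []string {
	if len(s.Items) == 0 {
		return []string{s.Name}
	}
	names := make([]string, len(s.Items))
	for i, item := range s.Items {
		names[i] = fmt.Sprintf("%s(%d:%s)", s.Name, i, item)
	}
	return names
}

// Plan returns the workloads for the template tree rooted at node.
func Plan(tmpls map[string]Template, node, tmpl string) []Workload {
	t := tmpls[tmpl]
	if t.Leaf { // only reachable for a workflow made of a single leaf template
		return []Workload{{Node: node, Group: 0, Pods: 1}}
	}
	var out []Workload
	for gi, group := range t.Steps {
		leafPods := 0
		for _, s := range group {
			names := instances(s)
			if tmpls[s.Template].Leaf {
				leafPods += len(names) // leaves in a group share one workload
				continue
			}
			for _, n := range names { // each nested steps node is planned on its own
				out = append(out, Plan(tmpls, n, s.Template)...)
			}
		}
		if leafPods > 0 {
			out = append(out, Workload{Node: node, Group: gi, Pods: leafPods})
		}
	}
	return out
}

func main() {
	// Templates of Example 2.
	tmpls := map[string]Template{
		"print-message": {Name: "print-message", Leaf: true},
		"loop-example-depth-2": {Name: "loop-example-depth-2", Steps: [][]Step{{
			{Name: "print-message-loop", Template: "print-message", Items: []string{"depth-2-1", "depth-2-2"}},
		}}},
		"loop-example-depth-1": {Name: "loop-example-depth-1", Steps: [][]Step{{
			{Name: "print-message", Template: "print-message", Items: []string{"depth-1-1", "depth-1-2"}},
			{Name: "loop-example-depth-2-2", Template: "loop-example-depth-2"},
		}}},
	}
	for _, w := range Plan(tmpls, "loops-644ch", "loop-example-depth-1") {
		fmt.Printf("%s group=%d pods=%d\n", w.Node, w.Group, w.Pods)
	}
}
```

Run on Example 2's templates, it prints `loop-example-depth-2-2 group=0 pods=2` followed by `loops-644ch group=0 pods=2`, matching the two workloads listed for that example.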

#### Examples

In the following examples, we only discuss which workflow patterns should result in
workloads, not how these workloads are created or how responsibilities are divided between
the workflow-controller and Kueue.

##### Example 1 (Parallel Steps Containing Only Leaf Templates)

For a parallel step group that contains only leaf templates, we create one workload for the group.
In the following example, we create workloads for `loop-example-depth-2(0:depth-1-1)` and
`loop-example-depth-2(1:depth-1-2)`. DAG patterns are similar, so we do not discuss them
separately.

```
# kubectl create -f - << EOF
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loops-
  namespace: argo
spec:
  entrypoint: loop-example-depth-1
  templates:
  - name: loop-example-depth-2
    steps:
    - - name: print-message-loop
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "{{item}}"
        withItems:        # invoke print-message once for each item in parallel
        - hello world     # item 1
        - goodbye world   # item 2
  - name: loop-example-depth-1
    steps:
    - - name: loop-example-depth-2
        template: loop-example-depth-2
        withItems:
        - depth-1-1
        - depth-1-2
  - name: print-message
    inputs:
      parameters:
      - name: message
    container:
      image: busybox
      command: [echo]
      args: ["{{inputs.parameters.message}}"]
EOF
# argo get loops-mlr6m
...
STEP TEMPLATE PODNAME DURATION MESSAGE
✔ loops-mlr6m loop-example-depth-1
└─┬─✔ loop-example-depth-2(0:depth-1-1) loop-example-depth-2
│ └─┬─✔ print-message-loop(0:hello world) print-message loops-mlr6m-print-message-2545579066 6s
│ └─✔ print-message-loop(1:goodbye world) print-message loops-mlr6m-print-message-323962978 5s
└─✔ loop-example-depth-2(1:depth-1-2) loop-example-depth-2
└─┬─✔ print-message-loop(0:hello world) print-message loops-mlr6m-print-message-520674448 4s
└─✔ print-message-loop(1:goodbye world) print-message loops-mlr6m-print-message-2893948292 6s
```

##### Example 2 (Parallel Steps Containing Leaf Templates and Steps)

For a parallel step group that contains both leaf templates and a nested steps template, we
create one workload for the leaf templates; the nested steps template gets its own workload,
created separately. In the following example, we create a workload for the leaf steps of
`loops-644ch` and another for `loop-example-depth-2-2`.

```
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: loops-
  namespace: argo
spec:
  entrypoint: loop-example-depth-1
  templates:
  - name: loop-example-depth-2
    steps:
    - - name: print-message-loop
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "{{item}}"
        withItems:        # invoke print-message once for each item in parallel
        - depth-2-1       # item 1
        - depth-2-2       # item 2
  - name: loop-example-depth-1
    steps:
    - - name: print-message
        template: print-message
        arguments:
          parameters:
          - name: message
            value: "{{item}}"
        withItems:
        - depth-1-1
        - depth-1-2
      - name: loop-example-depth-2-2
        template: loop-example-depth-2
  - name: print-message
    inputs:
      parameters:
      - name: message
    container:
      image: busybox
      command: [echo]
      args: ["{{inputs.parameters.message}}"]
# argo get loops-644ch
...
STEP TEMPLATE PODNAME DURATION MESSAGE
✔ loops-644ch loop-example-depth-1
└─┬─✔ loop-example-depth-2-2 loop-example-depth-2
│ └─┬─✔ print-message-loop(0:depth-2-1) print-message loops-644ch-print-message-1796012204 4s
│ └─✔ print-message-loop(1:depth-2-2) print-message loops-644ch-print-message-1116167650 6s
├─✔ print-message(0:depth-1-1) print-message loops-644ch-print-message-413467513 5s
└─✔ print-message(1:depth-1-2) print-message loops-644ch-print-message-3356863351 5s
```

##### Example 3 (Workflow with a Single Leaf Template)

We create one workload for a workflow that consists of a single leaf template. In the following
example, the leaf is a `plugin` template:
```
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-
spec:
  entrypoint: main
  templates:
  - name: main
    plugin:
      hello: { }
# argo get hello-jtlcw
...
STEP TEMPLATE PODNAME DURATION MESSAGE
◷ hello-jtlcw main
```

#### How to suspend a workflow step by step

We introduce three ways to manage the workflow. The responsibilities of the workflow-controller
and the kueue-controller differ between them.

1. Provide users with a CLI that modifies their workflows by adding a special suspend template
for each step. When a workflow is suspended on this special template, the job-controller in
Kueue creates workloads for the next step. This approach requires no changes to the
workflow-controller, so it is easy to iterate on and there is no need to coordinate Argo and
Kueue versions. However, users can also modify their workflows to skip queueing in Kueue,
which may not be acceptable for some users.

2. Add a new field to the workflow spec, such as `suspendBySteps`. If `workflow.spec.suspendBySteps`
is true, the workflow-controller inserts a special suspend template for each step group. The
job-controller in Kueue watches the workflow and creates workloads for the next step. After the
workloads are admitted, the suspend step is marked as finished.

3. Add a new webhook in Kueue. When new pods are created, the webhook checks whether they are
managed by a workflow and whether that workflow has the `kueue.x-k8s.io/queue-name` label. If
so, it adds schedulingGates to the pods. The job-controller in Kueue then groups these pods (the
pods are listed in the workflow's status) and creates a workload for each group. After a
workload is admitted, the schedulingGates are removed from its pods so that they can be
scheduled.
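To make the third option concrete, here is a minimal Go sketch of the webhook's mutation and of the later ungating. It uses a hypothetical trimmed-down `Pod` type and a `WorkflowLookup` callback in place of a real Kubernetes client. It assumes workflow pods carry Argo's `workflows.argoproj.io/workflow` label and reuses `kueue.x-k8s.io/admission` as the gate name; both are assumptions to verify against the Argo and Kueue versions in use. Grouping pods into workloads is not shown.

```go
package main

import "fmt"

const (
	// Label assumed to be set by Argo on workflow pods, naming their workflow.
	workflowLabel = "workflows.argoproj.io/workflow"
	queueLabel    = "kueue.x-k8s.io/queue-name"
	// Gate name assumed to match Kueue's plain-pod admission gate.
	admissionGate = "kueue.x-k8s.io/admission"
)

// Pod is a hypothetical, trimmed-down pod.
type Pod struct {
	Name            string
	Labels          map[string]string
	SchedulingGates []string
}

// WorkflowLookup returns the labels of a workflow by name (hypothetical).
type WorkflowLookup func(name string) (map[string]string, bool)

// Mutate is the webhook step: it gates pods whose owning workflow has a
// queue-name label and reports whether the pod is managed by Kueue.
func Mutate(p *Pod, lookup WorkflowLookup) bool {
	wfName, ok := p.Labels[workflowLabel]
	if !ok {
		return false // not a workflow pod
	}
	wfLabels, found := lookup(wfName)
	if !found || wfLabels[queueLabel] == "" {
		return false // workflow did not opt into Kueue
	}
	for _, g := range p.SchedulingGates {
		if g == admissionGate {
			return true // already gated; keep the webhook idempotent
		}
	}
	p.SchedulingGates = append(p.SchedulingGates, admissionGate)
	return true
}

// Ungate removes the admission gate once the pod's workload is admitted,
// leaving any other gates untouched.
func Ungate(p *Pod) {
	kept := p.SchedulingGates[:0]
	for _, g := range p.SchedulingGates {
		if g != admissionGate {
			kept = append(kept, g)
		}
	}
	p.SchedulingGates = kept
}

func main() {
	workflows := map[string]map[string]string{"loops-644ch": {queueLabel: "default"}}
	lookup := func(n string) (map[string]string, bool) {
		l, ok := workflows[n]
		return l, ok
	}
	p := &Pod{
		Name:   "loops-644ch-print-message-413467513",
		Labels: map[string]string{workflowLabel: "loops-644ch"},
	}
	fmt.Println(Mutate(p, lookup), p.SchedulingGates)
	Ungate(p) // after the workload for this pod's group is admitted
	fmt.Println(len(p.SchedulingGates))
}
```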

#### Drawbacks and Limitations



#### Advantages



### Plain Pod as a Unit



#### Drawbacks and Limitations

- Pods in the same step group are queued by different workloads.
- Gang admission (all or nothing) for a step group is not available.
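The missing gang semantics can be illustrated with a toy admission model in Go. The two functions below are hypothetical simplifications, not Kueue's scheduler: one admits each pod of a step group independently (plain pod as a unit), the other admits the whole group or nothing. With three 2-GPU pods and a 4-GPU quota, the per-pod variant admits two of the three pods while the third waits, whereas the gang variant admits none until the whole group fits.

```go
package main

import "fmt"

// admitPerPod mimics "plain pod as a unit": every pod is its own workload
// and is admitted independently while quota remains (simplified, in order).
func admitPerPod(podGPUs []int, quota int) int {
	admitted := 0
	for _, g := range podGPUs {
		if g <= quota {
			quota -= g
			admitted++
		}
	}
	return admitted
}

// admitGang mimics one workload per step group: all pods or none.
func admitGang(podGPUs []int, quota int) int {
	total := 0
	for _, g := range podGPUs {
		total += g
	}
	if total <= quota {
		return len(podGPUs)
	}
	return 0
}

func main() {
	group := []int{2, 2, 2} // three parallel steps, 2 GPUs each
	fmt.Println("per-pod admitted:", admitPerPod(group, 4)) // 2 of 3
	fmt.Println("gang admitted:", admitGang(group, 4))      // 0 of 3
}
```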

#### Advantages



## Additional Details