-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Add provider healthcheck controller #250
Conversation
d115324
to
cc75bdc
Compare
/hold |
4752646
to
bb6a696
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold cancel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, just one minor comment
result = ctrl.Result{RequeueAfter: 5 * time.Second} | ||
} | ||
|
||
options := patch.WithOwnedConditions{Conditions: []clusterv1.ConditionType{clusterv1.ReadyCondition}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will patching the provider will cause unnecessary reconcile of the provider in the generic provider controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but since the spec stays the same, it stops almost immediately https://github.com/kubernetes-sigs/cluster-api-operator/blob/main/internal/controller/genericprovider_controller.go#L121-L125
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fedosin thanks for PR!
Overall LGTM, few clarification questions on reconciliation handling logic
return result, nil | ||
} | ||
// Error reading the object - requeue the request. | ||
return result, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to flip this with the above return statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote it as
if err := r.Client.Get(ctx, req.NamespacedName, deployment); err != nil {
// Error reading the object - requeue the request.
return result, err
}
Now we always return the error.
|
||
// Compare provider's Ready condition with the deployment's Available condition and stop if they already match. | ||
currentReadyCondition := conditions.Get(typedProvider, clusterv1.ReadyCondition) | ||
if currentReadyCondition != nil && deploymentCondition != nil && currentReadyCondition.Status == deploymentCondition.Status { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: are not we comparing True
with Available
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we compare condition statuses, they are both "True" or "False". "Available" is a type:
- lastTransitionTime: "2023-10-27T15:23:43Z"
lastUpdateTime: "2023-10-27T15:23:43Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, to prevent confusion I renamed deploymentCondition -> deploymentAvailableCondition
if deploymentCondition != nil { | ||
conditions.Set(typedProvider, &clusterv1.Condition{ | ||
Type: clusterv1.ReadyCondition, | ||
Status: deploymentCondition.Status, | ||
Reason: deploymentCondition.Reason, | ||
}) | ||
} else { | ||
conditions.Set(typedProvider, &clusterv1.Condition{ | ||
Type: clusterv1.ReadyCondition, | ||
Status: corev1.ConditionFalse, | ||
}) | ||
} | ||
|
||
// Don't requeue immediately if the deployment is not ready, but rather wait 5 seconds. | ||
if conditions.IsFalse(typedProvider, clusterv1.ReadyCondition) { | ||
result = ctrl.Result{RequeueAfter: 5 * time.Second} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: The sequence of var definition and its usage seems to be out of the way here. I think this whole block has to be moved up to line 85 or at least before initializing a patch helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm wrong, but we can't do this before initializing the patch helper. When we create it, it stores the original value of the object. Then we do some modifications with the object. Finally, when we call Patch, the helper calculates a diff between the original value and the modified one.
If we move these modifications before initializing the helper, the diff will always be empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
LGTM label has been added. Git tree hash: 8c12eff3b13c64942b5245d8b3fa206548e04fa8
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alexander-demicev The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
Currently we don't check the provider health after its manifests are applied, and Provider controller sets
Installed
andReady
status conditions toTrue
at the same time. It may be confusing for users, especially when provider startup fails - User sees the provider CR with condition Ready=True, but the deployment is not available.To fix this we introduce Provider Health Check Controller, which will check the deployment and update Ready status condition accordingly.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #