Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add provider healthcheck controller #250

Merged
merged 1 commit into from
Nov 14, 2023

Conversation

Fedosin
Copy link
Contributor

@Fedosin Fedosin commented Sep 12, 2023

What this PR does / why we need it:
Currently we don't check the provider health after its manifests are applied, and Provider controller sets Installed and Ready status conditions to True at the same time. It may be confusing for users, especially when provider startup fails - User sees the provider CR with condition Ready=True, but the deployment is not available.

To fix this we introduce Provider Health Check Controller, which will check the deployment and update Ready status condition accordingly.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 12, 2023
@Fedosin
Copy link
Contributor Author

Fedosin commented Sep 12, 2023

/hold
for community discussion

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2023
@Fedosin Fedosin force-pushed the healthcheck branch 2 times, most recently from 4752646 to bb6a696 Compare September 27, 2023 20:11
Copy link
Contributor Author

@Fedosin Fedosin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 27, 2023
Copy link
Contributor

@alexander-demicev alexander-demicev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, just one minor comment

result = ctrl.Result{RequeueAfter: 5 * time.Second}
}

options := patch.WithOwnedConditions{Conditions: []clusterv1.ConditionType{clusterv1.ReadyCondition}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will patching the provider will cause unnecessary reconcile of the provider in the generic provider controller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@furkatgofurov7 furkatgofurov7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fedosin thanks for PR!

Overall LGTM, few clarification questions on reconciliation handling logic

internal/controller/healthcheck/healthcheck_controller.go Outdated Show resolved Hide resolved
return result, nil
}
// Error reading the object - requeue the request.
return result, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to flip this with the above return statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote it as

	if err := r.Client.Get(ctx, req.NamespacedName, deployment); err != nil {
		// Error reading the object - requeue the request.
		return result, err
	}

Now we always return the error.


// Compare provider's Ready condition with the deployment's Available condition and stop if they already match.
currentReadyCondition := conditions.Get(typedProvider, clusterv1.ReadyCondition)
if currentReadyCondition != nil && deploymentCondition != nil && currentReadyCondition.Status == deploymentCondition.Status {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: are not we comparing True with Available here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we compare condition statuses, they are both "True" or "False". "Available" is a type:

  - lastTransitionTime: "2023-10-27T15:23:43Z"
    lastUpdateTime: "2023-10-27T15:23:43Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, to prevent confusion I renamed deploymentCondition -> deploymentAvailableCondition

Comment on lines 108 to 119
if deploymentCondition != nil {
conditions.Set(typedProvider, &clusterv1.Condition{
Type: clusterv1.ReadyCondition,
Status: deploymentCondition.Status,
Reason: deploymentCondition.Reason,
})
} else {
conditions.Set(typedProvider, &clusterv1.Condition{
Type: clusterv1.ReadyCondition,
Status: corev1.ConditionFalse,
})
}

// Don't requeue immediately if the deployment is not ready, but rather wait 5 seconds.
if conditions.IsFalse(typedProvider, clusterv1.ReadyCondition) {
result = ctrl.Result{RequeueAfter: 5 * time.Second}
}
Copy link
Member

@furkatgofurov7 furkatgofurov7 Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: The sequence of var definition and its usage seems to be out of the way here. I think this whole block has to be moved up to line 85 or at least before initializing a patch helper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm wrong, but we can't do this before initializing the patch helper. When we create it, it stores the original value of the object. Then we do some modifications with the object. Finally, when we call Patch, the helper calculates a diff between the original value and the modified one.
If we move these modifications before initializing the helper, the diff will always be empty.

Copy link
Member

@furkatgofurov7 furkatgofurov7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 8c12eff3b13c64942b5245d8b3fa206548e04fa8

Copy link
Contributor

@alexander-demicev alexander-demicev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexander-demicev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2023
@k8s-ci-robot k8s-ci-robot merged commit 4ca48cc into kubernetes-sigs:main Nov 14, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants