
CI: run e2e tests for libvirt with CRI-O #2068

Merged
merged 7 commits into confidential-containers:main from ci_libvirt_crio on Dec 5, 2024

Conversation

wainersm
Member

This is the minimum required to run e2e tests for libvirt on k8s configured with CRI-O.

  • libvirt/kcli_cluster.sh can now provision a k8s cluster configured with CRI-O; just export CONTAINER_RUNTIME=crio (see the sketch below)
  • The e2e framework will handle the container_runtime=crio property in libvirt.properties
  • Finally, the workflows were changed to add a new job. Note that the CRI-O job will run with continue-on-error enabled for a while, until we can fully check its stability in the CI environment.
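
A minimal usage sketch (the commands mirror the ones in the commit messages at the bottom of this page; container_runtime=crio is the libvirt.properties property mentioned above):

$ export CONTAINER_RUNTIME=crio
$ ./src/cloud-api-adaptor/libvirt/kcli_cluster.sh create
# The e2e framework then picks the runtime up from libvirt.properties:
#   container_runtime=crio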

Fixes #1981

@wainersm wainersm added the CI, provider/libvirt and e2e-test labels Sep 27, 2024
@wainersm wainersm requested a review from a team as a code owner September 27, 2024 21:34
@wainersm wainersm added the hold label Sep 27, 2024
@wainersm
Member Author

Putting on hold because some tests are failing and I didn't run the attestation ones yet. But I appreciate any feedback on these changes.

The preliminary test execution:

=== RUN   TestLibvirtCreateSimplePod
=== RUN   TestLibvirtCreateSimplePod/SimplePeerPod_test
    assessment_runner.go:264: Waiting for containers in pod: simple-test are ready
=== RUN   TestLibvirtCreateSimplePod/SimplePeerPod_test/PodVM_is_created
=== NAME  TestLibvirtCreateSimplePod/SimplePeerPod_test
    assessment_runner.go:617: Deleting pod simple-test...
    assessment_runner.go:624: Pod simple-test has been successfully deleted within 60s
--- PASS: TestLibvirtCreateSimplePod (105.08s)
    --- PASS: TestLibvirtCreateSimplePod/SimplePeerPod_test (105.08s)
        --- PASS: TestLibvirtCreateSimplePod/SimplePeerPod_test/PodVM_is_created (0.02s)
=== RUN   TestLibvirtCreatePodWithConfigMap
=== RUN   TestLibvirtCreatePodWithConfigMap/ConfigMapPeerPod_test
    assessment_runner.go:264: Waiting for containers in pod: busybox-configmap-pod are ready
=== RUN   TestLibvirtCreatePodWithConfigMap/ConfigMapPeerPod_test/Configmap_is_created_and_contains_data
    assessment_runner.go:435: Output when execute test commands: 
=== NAME  TestLibvirtCreatePodWithConfigMap/ConfigMapPeerPod_test
    assessment_runner.go:560: Deleting Configmap... busybox-configmap
    assessment_runner.go:617: Deleting pod busybox-configmap-pod...
    assessment_runner.go:624: Pod busybox-configmap-pod has been successfully deleted within 60s
--- PASS: TestLibvirtCreatePodWithConfigMap (145.17s)
    --- PASS: TestLibvirtCreatePodWithConfigMap/ConfigMapPeerPod_test (145.17s)
        --- PASS: TestLibvirtCreatePodWithConfigMap/ConfigMapPeerPod_test/Configmap_is_created_and_contains_data (5.07s)
=== RUN   TestLibvirtCreatePodWithSecret
=== RUN   TestLibvirtCreatePodWithSecret/SecretPeerPod_test
    assessment_runner.go:264: Waiting for containers in pod: busybox-secret-pod are ready
=== RUN   TestLibvirtCreatePodWithSecret/SecretPeerPod_test/Secret_has_been_created_and_contains_data
    assessment_runner.go:435: Output when execute test commands: 
=== NAME  TestLibvirtCreatePodWithSecret/SecretPeerPod_test
    assessment_runner.go:567: Deleting Secret... busybox-secret
    assessment_runner.go:617: Deleting pod busybox-secret-pod...
    assessment_runner.go:624: Pod busybox-secret-pod has been successfully deleted within 60s
--- PASS: TestLibvirtCreatePodWithSecret (105.16s)
    --- PASS: TestLibvirtCreatePodWithSecret/SecretPeerPod_test (105.16s)
        --- PASS: TestLibvirtCreatePodWithSecret/SecretPeerPod_test/Secret_has_been_created_and_contains_data (5.08s)
=== RUN   TestLibvirtCreatePeerPodContainerWithExternalIPAccess
=== RUN   TestLibvirtCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test
    assessment_runner.go:264: Waiting for containers in pod: busybox-priv are ready
=== RUN   TestLibvirtCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test/Peer_Pod_Container_Connected_to_External_IP
    assessment_runner.go:435: Output when execute test commands: command terminated with exit code 1
    assessment_runner.go:437: command terminated with exit code 1
=== NAME  TestLibvirtCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test
    assessment_runner.go:617: Deleting pod busybox-priv...
    assessment_runner.go:624: Pod busybox-priv has been successfully deleted within 60s
--- FAIL: TestLibvirtCreatePeerPodContainerWithExternalIPAccess (115.15s)
    --- FAIL: TestLibvirtCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test (115.15s)
        --- FAIL: TestLibvirtCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test/Peer_Pod_Container_Connected_to_External_IP (15.08s)
=== RUN   TestLibvirtCreatePeerPodWithJob
=== RUN   TestLibvirtCreatePeerPodWithJob/JobPeerPod_test
=== RUN   TestLibvirtCreatePeerPodWithJob/JobPeerPod_test/Job_has_been_created
    assessment_helpers.go:300: SUCCESS: job-pi-72kdq - Completed - LOG: 3.14156
    assessment_runner.go:336: Output Log from Pod: 3.14156
=== NAME  TestLibvirtCreatePeerPodWithJob/JobPeerPod_test
    assessment_runner.go:600: Deleting Job... job-pi
    assessment_runner.go:607: Deleting pods created by job... job-pi-72kdq
--- PASS: TestLibvirtCreatePeerPodWithJob (90.13s)
    --- PASS: TestLibvirtCreatePeerPodWithJob/JobPeerPod_test (90.13s)
        --- PASS: TestLibvirtCreatePeerPodWithJob/JobPeerPod_test/Job_has_been_created (0.02s)
=== RUN   TestLibvirtCreatePeerPodAndCheckUserLogs
    common_suite.go:165: Skipping Test until issue kata-containers/kata-containers#5732 is Fixed
--- SKIP: TestLibvirtCreatePeerPodAndCheckUserLogs (0.00s)
=== RUN   TestLibvirtCreatePeerPodAndCheckWorkDirLogs
=== RUN   TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test
=== RUN   TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test/Peer_pod_with_work_directory_has_been_created
    assessment_runner.go:366: Log output of peer pod:/other
=== NAME  TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test
    assessment_runner.go:617: Deleting pod workdirpod...
    assessment_runner.go:624: Pod workdirpod has been successfully deleted within 60s
--- PASS: TestLibvirtCreatePeerPodAndCheckWorkDirLogs (105.09s)
    --- PASS: TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test (105.09s)
        --- PASS: TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test/Peer_pod_with_work_directory_has_been_created (5.01s)
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test/Peer_pod_with_environmental_variables_has_been_created
    assessment_runner.go:366: Log output of peer pod:KUBERNETES_SERVICE_PORT=443
        KUBERNETES_PORT=tcp://10.96.0.1:443
        HOSTNAME=env-variable-in-image
        SHLVL=1
        HOME=/root
        TERM=xterm
        KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        KUBERNETES_PORT_443_TCP_PORT=443
        KUBERNETES_PORT_443_TCP_PROTO=tcp
        KUBERNETES_SERVICE_PORT_HTTPS=443
        KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
        ISPRODUCTION=false
        KUBERNETES_SERVICE_HOST=10.96.0.1
        PWD=/
=== NAME  TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test
    assessment_runner.go:617: Deleting pod env-variable-in-image...
    assessment_runner.go:624: Pod env-variable-in-image has been successfully deleted within 60s
--- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly (100.08s)
    --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test (100.08s)
        --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test/Peer_pod_with_environmental_variables_has_been_created (5.03s)
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test/Peer_pod_with_environmental_variables_has_been_created
    assessment_runner.go:366: Log output of peer pod:KUBERNETES_SERVICE_PORT=443
        KUBERNETES_PORT=tcp://10.96.0.1:443
        HOSTNAME=env-variable-in-config
        SHLVL=1
        HOME=/root
        TERM=xterm
        KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        KUBERNETES_PORT_443_TCP_PORT=443
        KUBERNETES_PORT_443_TCP_PROTO=tcp
        KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
        KUBERNETES_SERVICE_PORT_HTTPS=443
        ISPRODUCTION=true
        KUBERNETES_SERVICE_HOST=10.96.0.1
        PWD=/
=== NAME  TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test
    assessment_runner.go:617: Deleting pod env-variable-in-config...
    assessment_runner.go:624: Pod env-variable-in-config has been successfully deleted within 60s
--- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly (100.09s)
    --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test (100.09s)
        --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test/Peer_pod_with_environmental_variables_has_been_created (5.02s)
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test/Peer_pod_with_environmental_variables_has_been_created
    assessment_runner.go:366: Log output of peer pod:KUBERNETES_PORT=tcp://10.96.0.1:443
        KUBERNETES_SERVICE_PORT=443
        HOSTNAME=env-variable-in-both
        SHLVL=1
        HOME=/root
        TERM=xterm
        KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        KUBERNETES_PORT_443_TCP_PORT=443
        KUBERNETES_PORT_443_TCP_PROTO=tcp
        KUBERNETES_SERVICE_PORT_HTTPS=443
        KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
        ISPRODUCTION=true
        KUBERNETES_SERVICE_HOST=10.96.0.1
        PWD=/
=== NAME  TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test
    assessment_runner.go:617: Deleting pod env-variable-in-both...
    assessment_runner.go:624: Pod env-variable-in-both has been successfully deleted within 60s
--- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment (100.09s)
    --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test (100.09s)
        --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test/Peer_pod_with_environmental_variables_has_been_created (5.01s)
=== RUN   TestLibvirtCreateNginxDeployment
=== RUN   TestLibvirtCreateNginxDeployment/Nginx_image_deployment_test
    nginx_deployment.go:106: Creating nginx deployment...
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 0
    nginx_deployment.go:163: Current deployment available replicas: 2
    nginx_deployment.go:111: nginx deployment is available now
=== RUN   TestLibvirtCreateNginxDeployment/Nginx_image_deployment_test/Access_for_nginx_deployment_test
=== NAME  TestLibvirtCreateNginxDeployment/Nginx_image_deployment_test
    nginx_deployment.go:136: Deleting webserver deployment...
    nginx_deployment.go:141: Deleting deployment nginx-deployment...
    nginx_deployment.go:148: Deployment nginx-deployment has been successfully deleted within 120s
--- PASS: TestLibvirtCreateNginxDeployment (135.08s)
    --- PASS: TestLibvirtCreateNginxDeployment/Nginx_image_deployment_test (135.08s)
        --- PASS: TestLibvirtCreateNginxDeployment/Nginx_image_deployment_test/Access_for_nginx_deployment_test (0.03s)
=== RUN   TestLibvirtDeletePod
=== RUN   TestLibvirtDeletePod/DeletePod_test
    assessment_runner.go:264: Waiting for containers in pod: deletion-test are ready
=== RUN   TestLibvirtDeletePod/DeletePod_test/Deletion_complete
=== NAME  TestLibvirtDeletePod/DeletePod_test
    assessment_runner.go:617: Deleting pod deletion-test...
    assessment_runner.go:624: Pod deletion-test has been successfully deleted within 60s
--- PASS: TestLibvirtDeletePod (100.06s)
    --- PASS: TestLibvirtDeletePod/DeletePod_test (100.06s)
        --- PASS: TestLibvirtDeletePod/DeletePod_test/Deletion_complete (0.01s)
=== RUN   TestLibvirtPodToServiceCommunication
=== RUN   TestLibvirtPodToServiceCommunication/TestExtraPods_test
    assessment_runner.go:264: Waiting for containers in pod: test-server are ready
    assessment_runner.go:297: webserver service is available on cluster IP: 10.98.170.137
    assessment_runner.go:301: Provision extra pod test-client
    assessment_helpers.go:393: Waiting for containers in pod: test-client are ready
=== RUN   TestLibvirtPodToServiceCommunication/TestExtraPods_test/Failed_to_test_extra_pod.
    assessment_runner.go:532: Output when execute test commands:command terminated with exit code 1
    assessment_runner.go:534: command terminated with exit code 1
=== NAME  TestLibvirtPodToServiceCommunication/TestExtraPods_test
    assessment_runner.go:617: Deleting pod test-server...
    assessment_runner.go:624: Pod test-server has been successfully deleted within 60s
    assessment_runner.go:630: Deleting pod test-client...
    assessment_runner.go:636: Pod test-client has been successfully deleted within 60s
    assessment_runner.go:652: Deleting Service... nginx-server
--- FAIL: TestLibvirtPodToServiceCommunication (250.30s)
    --- FAIL: TestLibvirtPodToServiceCommunication/TestExtraPods_test (250.30s)
        --- FAIL: TestLibvirtPodToServiceCommunication/TestExtraPods_test/Failed_to_test_extra_pod. (15.10s)
=== RUN   TestLibvirtPodsMTLSCommunication
=== RUN   TestLibvirtPodsMTLSCommunication/TestPodsMTLSCommunication_test
    assessment_runner.go:264: Waiting for containers in pod: mtls-server are ready
    assessment_runner.go:297: webserver service is available on cluster IP: 10.100.133.137
    assessment_runner.go:301: Provision extra pod mtls-client
    assessment_helpers.go:393: Waiting for containers in pod: mtls-client are ready
=== RUN   TestLibvirtPodsMTLSCommunication/TestPodsMTLSCommunication_test/Pods_communication_with_mTLS
    assessment_runner.go:532: Output when execute test commands:command terminated with exit code 6
    assessment_runner.go:534: command terminated with exit code 6
=== NAME  TestLibvirtPodsMTLSCommunication/TestPodsMTLSCommunication_test
    assessment_runner.go:560: Deleting Configmap... nginx-conf
    assessment_runner.go:567: Deleting Secret... server-certs
    assessment_runner.go:586: Deleting extra Secret... curl-certs
    assessment_runner.go:617: Deleting pod mtls-server...
    assessment_runner.go:624: Pod mtls-server has been successfully deleted within 60s
    assessment_runner.go:630: Deleting pod mtls-client...
    assessment_runner.go:636: Pod mtls-client has been successfully deleted within 60s
    assessment_runner.go:652: Deleting Service... nginx-mtls
--- FAIL: TestLibvirtPodsMTLSCommunication (230.41s)
    --- FAIL: TestLibvirtPodsMTLSCommunication/TestPodsMTLSCommunication_test (230.41s)
        --- FAIL: TestLibvirtPodsMTLSCommunication/TestPodsMTLSCommunication_test/Pods_communication_with_mTLS (10.09s)
=== RUN   TestLibvirtKbsKeyRelease
    libvirt_test.go:113: Skipping kbs related test as kbs is not deployed
--- SKIP: TestLibvirtKbsKeyRelease (0.00s)
=== RUN   TestLibvirtRestrictivePolicyBlocksExec
=== RUN   TestLibvirtRestrictivePolicyBlocksExec/PodVMwithPolicyBlockingExec_test
    assessment_runner.go:264: Waiting for containers in pod: policy-exec-rejected are ready
=== RUN   TestLibvirtRestrictivePolicyBlocksExec/PodVMwithPolicyBlockingExec_test/Pod_which_blocks_Exec_Process
=== NAME  TestLibvirtRestrictivePolicyBlocksExec
    common_suite.go:639: Exec process was allowed: Internal error occurred: error executing command in container: cannot enter container fefd98401f1e1ee9a472a2d15be13844dbbd13c7c3e0a691d9159e7a4f701a9c, with err rpc error: code = PermissionDenied desc = "ExecProcessRequest is blocked by policy: ": unknown
=== NAME  TestLibvirtRestrictivePolicyBlocksExec/PodVMwithPolicyBlockingExec_test/Pod_which_blocks_Exec_Process
    assessment_runner.go:435: Output when execute test commands: Internal error occurred: error executing command in container: cannot enter container fefd98401f1e1ee9a472a2d15be13844dbbd13c7c3e0a691d9159e7a4f701a9c, with err rpc error: code = PermissionDenied desc = "ExecProcessRequest is blocked by policy: ": unknown
    assessment_runner.go:437: Command [ls] running in container busybox produced unexpected output on error: Internal error occurred: error executing command in container: cannot enter container fefd98401f1e1ee9a472a2d15be13844dbbd13c7c3e0a691d9159e7a4f701a9c, with err rpc error: code = PermissionDenied desc = "ExecProcessRequest is blocked by policy: ": unknown
=== NAME  TestLibvirtRestrictivePolicyBlocksExec/PodVMwithPolicyBlockingExec_test
    assessment_runner.go:617: Deleting pod policy-exec-rejected...
    assessment_runner.go:624: Pod policy-exec-rejected has been successfully deleted within 60s
--- FAIL: TestLibvirtRestrictivePolicyBlocksExec (105.10s)
    --- FAIL: TestLibvirtRestrictivePolicyBlocksExec/PodVMwithPolicyBlockingExec_test (105.10s)
        --- FAIL: TestLibvirtRestrictivePolicyBlocksExec/PodVMwithPolicyBlockingExec_test/Pod_which_blocks_Exec_Process (5.03s)
=== RUN   TestLibvirtPermissivePolicyAllowsExec
=== RUN   TestLibvirtPermissivePolicyAllowsExec/PodVMwithPermissivePolicy_test
    assessment_runner.go:264: Waiting for containers in pod: policy-all-allowed are ready
=== RUN   TestLibvirtPermissivePolicyAllowsExec/PodVMwithPermissivePolicy_test/Pod_which_allows_all_kata_agent_APIs
    assessment_runner.go:435: Output when execute test commands: 
=== NAME  TestLibvirtPermissivePolicyAllowsExec/PodVMwithPermissivePolicy_test
    assessment_runner.go:617: Deleting pod policy-all-allowed...
    assessment_runner.go:624: Pod policy-all-allowed has been successfully deleted within 60s
--- PASS: TestLibvirtPermissivePolicyAllowsExec (105.14s)
    --- PASS: TestLibvirtPermissivePolicyAllowsExec/PodVMwithPermissivePolicy_test (105.13s)
        --- PASS: TestLibvirtPermissivePolicyAllowsExec/PodVMwithPermissivePolicy_test/Pod_which_allows_all_kata_agent_APIs (5.07s)
=== RUN   TestLibvirtCreatePeerPodWithAuthenticatedImageWithoutCredentials
    libvirt_test.go:152: Authenticated Image Name not exported
--- SKIP: TestLibvirtCreatePeerPodWithAuthenticatedImageWithoutCredentials (0.00s)
=== RUN   TestLibvirtCreatePeerPodWithAuthenticatedImageWithValidCredentials
    libvirt_test.go:161: Registry Credentials, or authenticated image name not exported
--- SKIP: TestLibvirtCreatePeerPodWithAuthenticatedImageWithValidCredentials (0.00s)

Member

@stevenhorsman stevenhorsman left a comment

The code looks okay to me so far. Thanks Wainer!

@wainersm
Member Author

wainersm commented Sep 30, 2024

Cc: @littlejawa

Ok, I'm going over the tests that failed, starting with:

--- FAIL: TestLibvirtPodToServiceCommunication (250.30s)
    --- FAIL: TestLibvirtPodToServiceCommunication/TestExtraPods_test (250.30s)
        --- FAIL: TestLibvirtPodToServiceCommunication/TestExtraPods_test/Failed_to_test_extra_pod. (15.10s)

That test fails consistently with CRI-O, but passes with containerd.

The test (implemented here) checks that a client podvm can reach a service deployed in another podvm by the service's name. It sets up nginx on the server podvm, configures an nginx-server service, and finally runs wget -O- nginx-server from the client podvm, which should return nginx's welcome HTML page.

Below is how I obtained the wget error message (the e2e test framework doesn't print the full messages):

$ kubectl exec test-client -n coco-pp-e2e-test-2777a729 -- wget -O- nginx-server
wget: bad address 'nginx-server'
command terminated with exit code 1

From test-client it can access the service as long as I pass the cluster IP:

$ kubectl exec test-client -n coco-pp-e2e-test-2777a729 -- wget -O- 10.100.35.174 | grep "Thank you for using nginx"
Connecting to 10.100.35.174 (10.100.35.174:80)
writing to stdout
-                    100% |********************************|   615  0:00:00 ETA
written to stdout
<p><em>Thank you for using nginx.</em></p>

@littlejawa

Cc: @littlejawa

Ok, I'm going over the tests that failed, starting with:

--- FAIL: TestLibvirtPodToServiceCommunication (250.30s)
    --- FAIL: TestLibvirtPodToServiceCommunication/TestExtraPods_test (250.30s)
        --- FAIL: TestLibvirtPodToServiceCommunication/TestExtraPods_test/Failed_to_test_extra_pod. (15.10s)

That test fails consistently with CRI-O, but passes with containerd.

[...]

Hey @wainersm,
I've just tried here manually to make sure it's not plain broken, and I could see it work.
This is with OCP 4.16 and OSC 1.7, so definitely not the same test environment, but I wanted to make sure :)

Maybe something to do with the way we install/configure crio or the network plugins. I guess we're using kata-deploy here?
I'll need to build myself an environment similar to what we use in the CI job, so that I can dig further.

@wainersm wainersm requested a review from ldoktor October 1, 2024 14:40
@wainersm
Member Author

wainersm commented Oct 3, 2024

Cc: @littlejawa
Ok, I'm going over the tests that failed, starting with:

--- FAIL: TestLibvirtPodToServiceCommunication (250.30s)
    --- FAIL: TestLibvirtPodToServiceCommunication/TestExtraPods_test (250.30s)
        --- FAIL: TestLibvirtPodToServiceCommunication/TestExtraPods_test/Failed_to_test_extra_pod. (15.10s)

That test fails consistently with CRI-O, but passes with containerd.
[...]

Hey @wainersm, I've just tried here manually to make sure it's not plain broken, and I could see it work. This is with OCP 4.16 and OSC 1.7, so definitely not the same test environment, but I wanted to make sure :)

It's okay that it's not the same environment; I was actually worried it could be a problem on OCP + OSC, so I'm glad that's not the case :)

Maybe something to do with the way we install/configure crio or the network plugins. I guess we're using kata-deploy here? I'll need to build myself an environment similar to what we use in the CI job, so that I can dig further.

I talked with @littlejawa privately; it turns out that the same client/server test fails with cri-o + runc. So it definitely looks like a problem with the configuration of the kcli cluster, but we still don't know what...

TestLibvirtPodsMTLSCommunication is also a client/server style test that relies on the service name, so I'm assuming it's the same root cause. Thus, I won't investigate that test further.

I need to figure out how to skip TestLibvirtPodToServiceCommunication and TestLibvirtPodsMTLSCommunication only when running on CRI-O. That's the plan for now.

Hey, thanks @littlejawa for promptly checking it!

@wainersm
Member Author

wainersm commented Oct 3, 2024

I was looking at the TestLibvirtCreatePeerPodContainerWithExternalIPAccess failure; it seems to have the same root cause as TestLibvirtPodToServiceCommunication and TestLibvirtPodsMTLSCommunication.

The error is like:

$ kubectl exec pod/busybox-priv -n coco-pp-e2e-test-f7c5cc73 -- ping -c 1 www.google.com
ping: bad address 'www.google.com'
command terminated with exit code 1
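
One way to confirm that it is name resolution inside the podvm failing, rather than connectivity, is to check the pod's DNS setup directly (a suggested check only, assuming the busybox image ships the nslookup applet):

$ kubectl exec pod/busybox-priv -n coco-pp-e2e-test-f7c5cc73 -- cat /etc/resolv.conf
$ kubectl exec pod/busybox-priv -n coco-pp-e2e-test-f7c5cc73 -- nslookup www.google.com

If resolv.conf looks wrong or the lookup fails while a plain IP works (as in the service test above), the problem is DNS inside the podvm rather than connectivity.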

@wainersm
Member Author

wainersm commented Oct 3, 2024

Updates:

  • New commit to fix TestLibvirtRestrictivePolicyBlocksExec. Now passing.
  • Addressed all of @ldoktor's comments
  • As well as @stevenhorsman's

@wainersm
Member Author

wainersm commented Oct 7, 2024

Tested with KBS=true, and TestLibvirtKbsKeyRelease is passing.

@stevenhorsman
Member

I've tried this out on a test VM - it doesn't seem to be working on s390x, as the cc-operator-controller-manager doesn't start due to a CNI issue:

Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               98s   default-scheduler  Successfully assigned confidential-containers-system/cc-operator-controller-manager-6d6b78b7f5-qc8x2 to peer-pods-worker-0
  Warning  FailedCreatePodSandBox  97s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_cc-operator-controller-manager-6d6b78b7f5-qc8x2_confidential-containers-system_44ca7518-3ab7-4b49-ab7e-0dc244787ce8_0(1bffd2367a0a8451eb22612389e914de79c98bbae11d0f9f42a2b65e95349286): error adding pod confidential-containers-system_cc-operator-controller-manager-6d6b78b7f5-qc8x2 to CNI network "cbr0": plugin type="flannel" failed (add): failed to set bridge addr: "cni0" already has an IP address different from 10.244.1.1/24
  Warning  FailedCreatePodSandBox  71s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_cc-operator-controller-manager-6d6b78b7f5-qc8x2_confidential-containers-system_44ca7518-3ab7-4b49-ab7e-0dc244787ce8_0(468f1b4438a944e52da37454488396a2d8a7742a59ddd151c19a224f03db558c): error adding pod confidential-containers-system_cc-operator-controller-manager-6d6b78b7f5-qc8x2 to CNI network "cbr0": plugin type="flannel" failed (add): failed to set bridge addr: "cni0" already has an IP address different from 10.244.1.1/24

That might not be a fair test though, so I can try out x86

@stevenhorsman
Member

On x86, TestLibvirtCreatePeerPodContainerWithExternalIPAccess, TestLibvirtPodsMTLSCommunication and TestLibvirtPodToServiceCommunication failed. TestLibvirtKbsKeyRelease/KbsKeyReleasePod_test also did. I'm not sure it's worth digging into these too much before merging, as some are skipped in the CI.

@wainersm
Member Author

wainersm commented Oct 7, 2024

I've tried this out on a test VM - it doesn't seem to be working on s390x as the cc-operator-controller-manager doesn't start due to an CNI issue:

Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               98s   default-scheduler  Successfully assigned confidential-containers-system/cc-operator-controller-manager-6d6b78b7f5-qc8x2 to peer-pods-worker-0
  Warning  FailedCreatePodSandBox  97s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_cc-operator-controller-manager-6d6b78b7f5-qc8x2_confidential-containers-system_44ca7518-3ab7-4b49-ab7e-0dc244787ce8_0(1bffd2367a0a8451eb22612389e914de79c98bbae11d0f9f42a2b65e95349286): error adding pod confidential-containers-system_cc-operator-controller-manager-6d6b78b7f5-qc8x2 to CNI network "cbr0": plugin type="flannel" failed (add): failed to set bridge addr: "cni0" already has an IP address different from 10.244.1.1/24
  Warning  FailedCreatePodSandBox  71s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_cc-operator-controller-manager-6d6b78b7f5-qc8x2_confidential-containers-system_44ca7518-3ab7-4b49-ab7e-0dc244787ce8_0(468f1b4438a944e52da37454488396a2d8a7742a59ddd151c19a224f03db558c): error adding pod confidential-containers-system_cc-operator-controller-manager-6d6b78b7f5-qc8x2 to CNI network "cbr0": plugin type="flannel" failed (add): failed to set bridge addr: "cni0" already has an IP address different from 10.244.1.1/24

That might not be a fair test though, so I can try out x86

hmmm... might it be a bug on kcli for s390x?

@wainersm
Member Author

wainersm commented Oct 7, 2024

on x86 the TestLibvirtCreatePeerPodContainerWithExternalIPAccess, TestLibvirtPodsMTLSCommunication and TestLibvirtPodToServiceCommunication failed. TestLibvirtKbsKeyRelease/KbsKeyReleasePod_test also did. I'm not sure if it's worth digging into these too much before merging as some are skipped in the CI

As for TestLibvirtKbsKeyRelease, it passed here. First I ran with containerd to ensure I had everything properly set up; then I deleted the cluster and re-ran the test with cri-o. The new job won't make the containerd counterpart fail, so maybe we should merge and let CI tell us who is right? :)

@littlejawa will help with debugging TestLibvirtCreatePeerPodContainerWithExternalIPAccess, TestLibvirtPodsMTLSCommunication and TestLibvirtPodToServiceCommunication as time permits. That will take a while and, as they are skipped in the CI, I think we can get this merged as is.

Side note: I was looking for a way to skip these tests when running locally (i.e. not on CI), but I haven't found a good solution yet: the tests cannot access the provision object, so they cannot read libvirt.properties and therefore cannot check whether the runtime is cri-o or containerd. I thought about querying k8s directly, but ran into another problem: getting a k8s client object from the test code. I might be missing something, so I plan to look at this again later.

@wainersm wainersm removed the hold label Oct 7, 2024
@ldoktor
Contributor

ldoktor commented Oct 8, 2024

I finally got it working as well; my problem was a bad podvm image :-) I'm getting failures only for:

  • TestLibvirtCreatePeerPodContainerWithExternalIPAccess
  • TestLibvirtPodToServiceCommunication
  • TestLibvirtPodsMTLSCommunication

TestLibvirtRestrictivePolicyBlocksExec is passing on my laptop (tested without KBS). I'd suggest creating issues for those failures and adding skips for them.

@stevenhorsman
Member

Side note: I was looking for a way to skip these tests when running locally (i.e. not on CI), but I haven't found a good solution yet: the tests cannot access the provision object, so they cannot read libvirt.properties and therefore cannot check whether the runtime is cri-o or containerd. I thought about querying k8s directly, but ran into another problem: getting a k8s client object from the test code. I might be missing something, so I plan to look at this again later.

Pradipta added an isTestOnCrio() method, so can we just use that and then skip, or is it not working?

@wainersm
Member Author

wainersm commented Oct 8, 2024

Hi @ldoktor !

I finally made it working as well, my problem was a bad podvm image :-) I'm getting failures only for:

* TestLibvirtCreatePeerPodContainerWithExternalIPAccess

* TestLibvirtPodToServiceCommunication

* TestLibvirtPodsMTLSCommunication

the TestLibvirtRestrictivePolicyBlocksExec is passing on my laptop (tested without KBS). I'd suggest creating issues for those and add skip there.

Thanks for testing it! I created the issue #2100 as you suggested.

@wainersm
Member Author

wainersm commented Oct 8, 2024

Side note: I was looking for a way to skip these tests when running locally (i.e. not on CI), but I haven't found a good solution yet [...]

Pradipta added an isTestOnCrio() method, so can we just use that and then skip, or is it not working?

hmm... I overlooked that function. Ok, I will use it to disable the tests on local runs. Thanks @stevenhorsman

@wainersm
Member Author

wainersm commented Oct 8, 2024

Rebased the code and pushed a new commit that disables those failing tests when running on CRI-O. With that, the remaining tests are all passing. It's ready to be merged, if there are no more suggestions.

Member

@stevenhorsman stevenhorsman left a comment

LGTM. I think this is ready to try out now. Thanks @wainersm

@littlejawa littlejawa left a comment

LGTM
Thanks @wainersm

Contributor

@ldoktor ldoktor left a comment

Thanks, looks good and tracks the failing jobs so let's get this in and see how stable it is.

@stevenhorsman
Member

@wainersm - do you think we can rebase this and merge it now, or are you waiting for anything else?

@bpradipt
Member

bpradipt commented Nov 5, 2024

pinging @wainersm again ;)

@wainersm
Member Author

Hi @stevenhorsman @bpradipt !

I rebased it and resolved the conflicts. One issue has appeared since the last time I tried it: the webhook's cert-manager installation fails.

In https://github.com/wainersm/cc-cloud-api-adaptor/actions/runs/12011685522/job/33484069022 :

2024-11-25T15:05:04.2264631Z time="2024-11-25T15:05:04Z" level=info msg="Error  in install cert-manager: exit status 2: make[1]: Entering directory '/home/runner/work/cc-cloud-api-adaptor/cc-cloud-api-adaptor/src/webhook'\ncurl -fsSL -o cmctl https://github.com/cert-manager/cmctl/releases/latest/download/cmctl_linux_amd64\nchmod +x cmctl\n# Deploy cert-manager\nkubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.15.3/cert-manager.yaml\nnamespace/cert-manager created\ncustomresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created\ncustomresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created\ncustomresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created\ncustomresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created\ncustomresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created\ncustomresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created\nserviceaccount/cert-manager-cainjector created\nserviceaccount/cert-manager created\nserviceaccount/cert-manager-webhook created\nclusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created\nclusterrole.rbac.authorization.k8s.io/cert-manager-cluster-view created\nclusterrole.rbac.authorization.k8s.io/cert-manager-view created\nclusterrole.rbac.authorization.k8s.io/cert-manager-edit created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created\nclusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created\nrole.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created\nrole.rbac.authorization.k8s.io/cert-manager:leaderelection created\nrole.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created\nrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created\nrolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection 
created\nrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created\nservice/cert-manager created\nservice/cert-manager-webhook created\ndeployment.apps/cert-manager-cainjector created\ndeployment.apps/cert-manager created\ndeployment.apps/cert-manager-webhook created\nmutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created\nvalidatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created\n# Wait for service to be up\nkubectl wait --timeout=90s -n cert-manager endpoints/cert-manager --for=jsonpath='{.subsets[0].addresses[0].ip}'\nendpoints/cert-manager condition met\nkubectl wait --timeout=90s -n cert-manager endpoints/cert-manager-webhook --for=jsonpath='{.subsets[0].addresses[0].ip}'\nendpoints/cert-manager-webhook condition met\n# Wait for few seconds for the cert-manager API to be ready\n# otherwise you'll hit the error \"x509: certificate signed by unknown authority\"\n# Best is to use cmctl - https://cert-manager.io/docs/installation/kubectl/#2-optional-wait-for-cert-manager-webhook-to-be-ready\n./cmctl check api --wait=2m\nError from server (InternalError): Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s\": dial tcp 10.97.157.170:443: connect: no route to host\nmake[1]: *** [Makefile:137: deploy-cert-manager] Error 124\nmake[1]: Leaving directory '/home/runner/work/cc-cloud-api-adaptor/cc-cloud-api-adaptor/src/webhook'\n"
2024-11-25T15:05:04.2321654Z F1125 15:05:04.222490   18284 env.go:369] Setup failure: exit status 2

I suspect it's some misconfiguration of kcli with cri-o. I will report it to kcli and keep you posted.

@wainersm wainersm added the hold label Nov 26, 2024
@wainersm wainersm force-pushed the ci_libvirt_crio branch 2 times, most recently from 4716432 to 4ce0ac0 Compare November 27, 2024 14:15
@wainersm
Member Author

Hi @stevenhorsman @bpradipt !

I rebased it and resolved the conflicts. One issue has appeared since the last time I tried it: the webhook's cert-manager installation fails.

[...]

I suspect it's some misconfiguration of kcli with cri-o. I will report it to kcli and keep you posted.

OK, the problem was reported in karmab/kcli#744.

I figured flannel+crio seems to lack some configuration for networking services; I don't know exactly what the problem is. I tried calico and then cilium; the latter worked out (i.e. cert-manager installed successfully). Hence, I added a conditional to kcli_cluster.sh to instruct kcli to configure the cluster with cilium when using cri-o.
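
Roughly, the conditional works along these lines (a simplified sketch, not the literal diff; the kcli parameter name for the CNI is an assumption here, so check the actual kcli_cluster.sh change or the kcli docs for the exact knob):

# kcli_cluster.sh sketch: flannel (the default) breaks service networking
# with cri-o, so request cilium instead when CRI-O is the runtime.
EXTRA_PARAMS=""
if [ "${CONTAINER_RUNTIME:-containerd}" = "crio" ]; then
    EXTRA_PARAMS="-P sdn=cilium"
fi
kcli create kube generic ${EXTRA_PARAMS} "${CLUSTER_NAME}"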

I've run it a couple of times before ending up with a green execution. For the record:

=== RUN   TestLibvirtImageDecryption/TestImageDecryption_test
    assessment_runner.go:274: timed out waiting for the condition
--- FAIL: TestLibvirtImageDecryption (600.27s)
    --- FAIL: TestLibvirtImageDecryption/TestImageDecryption_test (600.03s)
=== RUN   TestLibvirtCreatePeerPodWithJob/JobPeerPod_test/Job_has_been_created
--- FAIL: TestLibvirtCreatePeerPodWithJob (600.02s)
    --- FAIL: TestLibvirtCreatePeerPodWithJob/JobPeerPod_test (600.02s)
        --- FAIL: TestLibvirtCreatePeerPodWithJob/JobPeerPod_test/Job_has_been_created (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x24900f2]
=== RUN   TestLibvirtImageDecryption/TestImageDecryption_test
    assessment_runner.go:274: timed out waiting for the condition
--- FAIL: TestLibvirtImageDecryption (600.27s)
    --- FAIL: TestLibvirtImageDecryption/TestImageDecryption_test (600.03s)

@wainersm
Member Author

I think it's now ready to be merged if the approvals stand. But I will keep it on 'hold' until the 0.11 release is out.

@wainersm
Member Author

wainersm commented Dec 2, 2024

I think it's now ready to be merged if the approvals stand. But I will keep it on 'hold' until the 0.11 release is out.

I will also wait for #2019 to be merged first. Then I can resolve any conflicts.

In order to use kcli to create a k8s cluster configured with
cri-o, a version newer than 07/02/2024 is needed, which
contains the karmab/kcli@77cf2cb
fix. So this picks the latest version available at the time of this commit.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
By exporting the CONTAINER_RUNTIME=crio variable, kcli will create
a k8s cluster configured with CRI-O:

$ export CONTAINER_RUNTIME=crio
$ ./src/cloud-api-adaptor/libvirt/kcli_cluster.sh create

Fixes confidential-containers#1981
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Commit a0247ae introduced a new parameter (CONTAINER_RUNTIME) for the docker
provider, allowing users to specify the container runtime used. Some tests
make decisions based on that property, for example whether nydus snapshotter
messages should be inspected or not. Likewise, this adds the handler for that
property for libvirt, allowing tests with cri-o too.

Fixes confidential-containers#1981
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Added a new container_runtime matrix column to generate
one job for each runtime: containerd and crio.

Fixes confidential-containers#1981
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Let's keep it running for a while on CI; once it's stable we can
remove the continue-on-error.

Fixes confidential-containers#1981
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
On CRI-O, the DoTestRestrictivePolicyBlocksExec test gets the
"error executing command in container" error message instead of
"failed to exec in container". So the expected strings for the
error message were adjusted to also cover CRI-O's output.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Some tests are already skipped on CI; they are now also disabled when running
locally because they fail there as well.

The TestLibvirtImageDecryption test has failed on CI because it is not
supported with CRI-O.

Related-to: confidential-containers#2100
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
@wainersm
Member Author

wainersm commented Dec 4, 2024

Rebased & resolved conflicts only. I tested it on my fork in https://github.com/wainersm/cc-cloud-api-adaptor/actions/runs/12166242896/job/33932711190 (PASS). It uses the packer-built podvm image; at some point we should switch to mkosi, as discussed in the meeting today.

@wainersm wainersm removed the hold label Dec 5, 2024
@wainersm wainersm merged commit 4403217 into confidential-containers:main Dec 5, 2024
22 checks passed