file /opt/confidential-containers/bin/qemu-system-x86_64 does not exist: not found #293

Open
tswangdi opened this issue Nov 27, 2023 · 4 comments

Comments

@tswangdi

I installed Confidential Containers and the GPU Operator following https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-kata.html.

When I create a pod using runtimeClass kata-qemu-nvidia-gpu, sandbox creation fails:

Events:
  Type     Reason                  Age                From               Message
  Warning  FailedCreatePodSandBox  0s (x14 over 13s)  kubelet            Failed to create pod sandbox: rpc error: code = NotFound desc = failed to create containerd task: failed to create shim task: /opt/nvidia-gpu-operator/artifacts/runtimeclasses/kata-qemu-nvidia-gpu/configuration-kata-qemu-nvidia-gpu.toml: file /opt/confidential-containers/bin/qemu-system-x86_64 does not exist: not found

Runtime classes:

% k get runtimeclass
NAME                       HANDLER                    AGE
kata                       kata-qemu                  79m
kata-clh                   kata-clh                   79m
kata-clh-tdx               kata-clh-tdx               79m
kata-qemu                  kata-qemu                  79m
kata-qemu-nvidia-gpu       kata-qemu-nvidia-gpu       65m
kata-qemu-nvidia-gpu-snp   kata-qemu-nvidia-gpu-snp   65m
kata-qemu-sev              kata-qemu-sev              79m
kata-qemu-snp              kata-qemu-snp              79m
kata-qemu-tdx              kata-qemu-tdx              79m
nvidia                     nvidia                     66m

Pods in the confidential-containers-system namespace:

% k get pods -n confidential-containers-system
NAME                                             READY   STATUS    RESTARTS   AGE
cc-operator-controller-manager-64967bfbc-b777h   2/2     Running   0          55m
cc-operator-daemon-install-7hzbp                 1/1     Running   0          54m
cc-operator-pre-install-daemon-m5gwq             1/1     Running   0          55m

cc-operator-daemon-install log:

% k logs -n confidential-containers-system cc-operator-daemon-install-7hzbp
nvidia driver modules are not yet loaded, invoking runc directly
Environment variables passed to this script
* NODE_NAME: sm02
* DEBUG: true
* SHIMS: clh clh-tdx qemu qemu-tdx qemu-sev qemu-snp
* DEFAULT_SHIM: qemu
* CREATE_RUNTIMECLASSES: true
* CREATE_DEFAULT_RUNTIMECLASS: true
* SNAPSHOTTER: nydus
copying kata artifacts onto host
Creating the runtime classes
Creating the kata-clh runtime class
runtimeclass.node.k8s.io/kata-clh created
Creating the kata-clh-tdx runtime class
runtimeclass.node.k8s.io/kata-clh-tdx created
Creating the kata-qemu runtime class
runtimeclass.node.k8s.io/kata-qemu created
Creating the kata-qemu-tdx runtime class
runtimeclass.node.k8s.io/kata-qemu-tdx created
Creating the kata-qemu-sev runtime class
runtimeclass.node.k8s.io/kata-qemu-sev created
Creating the kata-qemu-snp runtime class
runtimeclass.node.k8s.io/kata-qemu-snp created
Creating the kata runtime class for the default shim (an alias for kata-qemu)
runtimeclass.node.k8s.io/kata created
warning: /usr/local/bin/containerd-shim-kata-clh-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-clh-tdx-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-qemu-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-v2 already exists
Creating the default shim-v2 binary
warning: /usr/local/bin/containerd-shim-kata-qemu-tdx-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-qemu-sev-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-qemu-snp-v2 already exists
Add Kata Containers as a supported runtime for containerd
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options, overwriting
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh]
  runtime_type = "io.containerd.kata-clh.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-clh.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh-tdx]
  runtime_type = "io.containerd.kata-clh-tdx.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh-tdx.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-clh-tdx.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu]
  runtime_type = "io.containerd.kata-qemu.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-tdx]
  runtime_type = "io.containerd.kata-qemu-tdx.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-tdx.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu-tdx.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-sev]
  runtime_type = "io.containerd.kata-qemu-sev.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-sev.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu-sev.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-snp]
  runtime_type = "io.containerd.kata-qemu-snp.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-snp.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu-snp.toml"
node/sm02 labeled
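
In the log above, every ConfigPath the installer writes points under /opt/kata, whereas the failing kata-qemu-nvidia-gpu configuration looks for QEMU under /opt/confidential-containers/bin. A rough way to make that comparison explicit is to group the ConfigPath entries by install prefix. This is only a sketch: the function name `config_path_prefixes` and the two-component prefix depth are my own choices for illustration, and a differing prefix is a hint about a path mismatch, not a confirmed root cause.

```python
import re
from collections import Counter


def config_path_prefixes(containerd_config: str, depth: int = 2) -> Counter:
    """Count ConfigPath entries by their leading `depth` directory components.

    Hypothetical helper: it only pattern-matches `ConfigPath = "..."` lines,
    it does not parse TOML.
    """
    paths = re.findall(r'ConfigPath\s*=\s*"([^"]+)"', containerd_config)
    prefixes = Counter()
    for path in paths:
        parts = path.strip("/").split("/")
        prefixes["/" + "/".join(parts[:depth])] += 1
    return prefixes


# Two entries copied from the install log above.
sample = '''
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-snp.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu-snp.toml"
'''

print(dict(config_path_prefixes(sample)))  # {'/opt/kata': 2}
```

On a node, the same function could be fed the contents of the containerd configuration file instead of the pasted sample.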

Pods in the gpu-operator namespace on node sm02:

% k get pods -n gpu-operator -o wide | grep sm02
gpu-operator-1701061580-node-feature-discovery-gc-57ccf778tsg95   1/1     Running       0          36m   10.233.92.141    sm02    <none>           <none>
gpu-operator-1701061580-node-feature-discovery-worker-9mtj9       1/1     Running       0          43m   10.233.92.132    sm02    <none>           <none>
nvidia-kata-manager-bgpfq                                         1/1     Running       0          42m   10.233.92.131    sm02    <none>           <none>
nvidia-sandbox-device-plugin-daemonset-fmmtd                      1/1     Running       0          40m   10.233.92.172    sm02    <none>           <none>
nvidia-sandbox-validator-8pmwr                                    1/1     Running       0          42m   10.233.92.129    sm02    <none>           <none>
nvidia-vfio-manager-ldr69                                         1/1     Running       0          42m   10.233.92.178    sm02    <none>           <none>
sm02:/$ ls /opt/confidential-containers/bin/
containerd-nydus-grpc  nydus-overlayfs
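
The directory listing shows that qemu-system-x86_64 is not in /opt/confidential-containers/bin, which matches the error. To see whether other files referenced by the same configuration are missing too, a small check along these lines may help. It is a sketch under assumptions: the key names `path`, `kernel`, `image`, and `initrd` are the ones I would expect in a Kata configuration file, the helper name `missing_hypervisor_files` is made up, and the matching is a plain line scan, not a TOML parser.

```python
import os
import re


def missing_hypervisor_files(config_text, keys=("path", "kernel", "image", "initrd")):
    """Return (key, path) pairs whose quoted target does not exist on disk.

    Assumption: the relevant settings are simple `key = "value"` lines.
    Commented-out lines (starting with '#') are ignored.
    """
    missing = []
    for line in config_text.splitlines():
        match = re.match(r'\s*(\w+)\s*=\s*"([^"]+)"', line)
        if match and match.group(1) in keys and not os.path.exists(match.group(2)):
            missing.append((match.group(1), match.group(2)))
    return missing


# Illustrative fragment; only the QEMU path is taken from the error message.
sample_config = '''
[hypervisor.qemu]
path = "/opt/confidential-containers/bin/qemu-system-x86_64"
'''

for key, path in missing_hypervisor_files(sample_config):
    print(f"{key}: {path} does not exist")
```

To run it against the real file, read /opt/nvidia-gpu-operator/artifacts/runtimeclasses/kata-qemu-nvidia-gpu/configuration-kata-qemu-nvidia-gpu.toml on the node and pass its contents to the function.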
@zvonkok
Member

zvonkok commented Nov 29, 2023

/cc @zvonkok

@zvonkok
Member

zvonkok commented Nov 29, 2023

Can you show us all the pods? Is the confidential-containers operator running?

@zvonkok
Member

zvonkok commented Nov 29, 2023

If yes, which version did you deploy? v0.7.0?

@tswangdi
Author

tswangdi commented Nov 30, 2023

Thank you for the response.

All pods in confidential-containers-system:

% k get pods -n confidential-containers-system
NAME                                             READY   STATUS    RESTARTS   AGE
cc-operator-controller-manager-64967bfbc-gc58b   2/2     Running   0          21h
cc-operator-daemon-install-l8nj4                 1/1     Running   0          19h
cc-operator-pre-install-daemon-5jskh             1/1     Running   0          19h

Version: v0.8.0

cc-operator-controller-manager-64967bfbc-gc58b log:

2023-11-30T02:30:56Z	INFO	Reconciling CcRuntime in Kubernetes Cluster	{"controller": "ccruntime", "controllerGroup": "confidentialcontainers.org", "controllerKind": "CcRuntime", "CcRuntime": {"name":""}, "namespace": "", "name": "", "reconcileID": "d2e1c0a3-a87f-4d4a-a09e-5d30dd4352df"}
2023-11-30T02:30:56Z	INFO	Reconciling CcRuntime in Kubernetes Cluster	{"controller": "ccruntime", "controllerGroup": "confidentialcontainers.org", "controllerKind": "CcRuntime", "CcRuntime": {"name":"ccruntime-sample"}, "namespace": "", "name": "ccruntime-sample", "reconcileID": "1cd39791-5fcf-46d3-a9eb-2d454e7dbaaa"}
2023-11-30T02:30:58Z	INFO	Reconciling CcRuntime in Kubernetes Cluster	{"controller": "ccruntime", "controllerGroup": "confidentialcontainers.org", "controllerKind": "CcRuntime", "CcRuntime": {"name":""}, "namespace": "", "name": "", "reconcileID": "b75b7456-a3f7-4ea9-9a33-d333fc447da2"}
2023-11-30T02:30:58Z	INFO	Reconciling CcRuntime in Kubernetes Cluster	{"controller": "ccruntime", "controllerGroup": "confidentialcontainers.org", "controllerKind": "CcRuntime", "CcRuntime": {"name":"ccruntime-sample"}, "namespace": "", "name": "ccruntime-sample", "reconcileID": "3d32dff4-f83e-48a6-876b-dd8739a27a03"}
2023-11-30T02:30:59Z	INFO	Reconciling CcRuntime in Kubernetes Cluster	{"controller": "ccruntime", "controllerGroup": "confidentialcontainers.org", "controllerKind": "CcRuntime", "CcRuntime": {"name":""}, "namespace": "", "name": "", "reconcileID": "04b7ae88-3586-4a49-bd8e-8684e2cf88ba"}
2023-11-30T02:30:59Z	INFO	Reconciling CcRuntime in Kubernetes Cluster	{"controller": "ccruntime", "controllerGroup": "confidentialcontainers.org", "controllerKind": "CcRuntime", "CcRuntime": {"name":"ccruntime-sample"}, "namespace": "", "name": "ccruntime-sample", "reconcileID": "bb135b72-b108-42a5-8fa7-5d755fa380c8"}

cc-operator-pre-install-daemon-5jskh log:

nvidia driver modules are not yet loaded, invoking runc directly
INSTALL_COCO_CONTAINERD: false
INSTALL_OFFICIAL_CONTAINERD: false
INSTALL_VFIO_GPU_CONTAINERD: false
INSTALL_NYDUS_SNAPSHOTTER: true
Copying nydus-snapshotter artifacts onto host
Created symlink /etc/systemd/system/containerd.service.requires/nydus-snapshotter.service → /etc/systemd/system/nydus-snapshotter.service.
configure nydus snapshotter for containerd
Create /etc/containerd/config.toml.d
Drop-in the nydus configuration
[proxy_plugins]
  [proxy_plugins.nydus]
    type = "snapshot"
    address = "/run/containerd-nydus/containerd-nydus-grpc.sock"
Restarting containerd
Restarting containerd
node/sm02 labeled
