Multi openstack cloud support with OCCM #2183

Closed
MatthieuFin opened this issue Sep 27, 2024 · 2 comments · Fixed by #2193
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@MatthieuFin
Contributor

/kind feature

Glossary

  • cloud: I will use the term "cloud" to identify an independent openstack cluster; 2 different clouds means 2 different openstack clusters with different endpoints, credentials, region name, keystore ...

Description

I would like to be able to create a k8s cluster spread across different openstack clusters. I already have an underlying network based on VPN tunnels and network interconnections, ready for VM communication between the different clouds.

I am able to create VMs in my different clouds with the help of spec.template.spec.identityRef from OpenStackMachineTemplate, which makes it possible to override the identityRef defined in the parent OpenStackCluster object.

It works well! My issue concerns the integration with OCCM (Openstack Cloud Controller Manager).

The only way to manage multiple clouds with OCCM seems to be to deploy 1 OCCM instance per cloud, setting the env variables OS_CCM_REGIONAL="true" and OS_V1_INSTANCES="true".

This way each OCCM manages its own cloud, identified by its OS region name (one limitation is that 2 different clouds must have different region names, but I don't think that is really an issue). The "feature flag" implementation is here.
With these env variables, OCCM manages VMs whose providerID has the format openstack://region_name/uuid.
A k8s node's spec.providerID is immutable and, in the CAPO use case, is set through kubeadm, so I configure my VMs as follows:

---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: pool-region-one
  annotations:
    controlplane.cluster.x-k8s.io/skip-coredns: ""
    controlplane.cluster.x-k8s.io/skip-kube-proxy: ""
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external
            provider-id: openstack://region-one/'{{ instance_id }}'
...

This way the k8s nodes are provisioned with the right providerID and are correctly recognized by the right OCCM.
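To make the two providerID shapes concrete, here is a minimal Go sketch of how such an ID could be split into region and instance ID. The regular expression and the `parseProviderID` helper are assumptions written for illustration, not code copied from OCCM or CAPO; the only thing taken from this thread is the `openstack://region/InstanceID` format.

```go
package main

import (
	"fmt"
	"regexp"
)

// providerIDPattern is an assumed pattern for illustration: an optional
// region between "openstack://" and the next "/", then the instance ID.
var providerIDPattern = regexp.MustCompile(`^openstack://([^/]*)/([^/]+)$`)

// parseProviderID splits a provider ID into region and instance ID.
// The region is empty for the "openstack:///uuid" form.
func parseProviderID(providerID string) (region, instanceID string, err error) {
	m := providerIDPattern.FindStringSubmatch(providerID)
	if m == nil {
		return "", "", fmt.Errorf("provider ID %q does not match openstack://region/InstanceID", providerID)
	}
	return m[1], m[2], nil
}

func main() {
	// "0a1b2c3d" is a placeholder, not a real server UUID.
	for _, id := range []string{
		"openstack://region-one/0a1b2c3d",
		"openstack:///0a1b2c3d",
		"",
	} {
		region, instanceID, err := parseProviderID(id)
		fmt.Printf("%q -> region=%q instance=%q err=%v\n", id, region, instanceID, err)
	}
}
```

With this sketch, the regional and the region-less forms both parse, and an empty providerID is rejected.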

Missing feature

The issue is now in the CAPO implementation, which hardcodes the providerID in the format openstack:///uuid in its Machine CRD object, with no way to fill the endpoint field with the region name:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
spec:
  providerID: openstack:///uuid
  ...

I explained this scenario in an issue about the use case without kubeadm, where providerID is unset and multiple OCCMs run in the same k8s cluster: kubernetes/cloud-provider-openstack#2590 (comment).
TL;DR: CAPI sees the k8s node with providerID openstack://region_name/uuid, but CAPO sees the machine with providerID openstack:///uuid (with the same uuid), and it stays stuck in this state.
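The mismatch can be illustrated with a small Go sketch: the two strings differ even though they name the same server. `instanceID` and `sameInstance` are hypothetical helpers written for this example; they are not how CAPI or CAPO actually match nodes to machines.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceID returns what follows the last "/" in a provider ID
// (hypothetical helper; it does not validate the scheme).
func instanceID(providerID string) string {
	return providerID[strings.LastIndex(providerID, "/")+1:]
}

// sameInstance reports whether two provider IDs end in the same
// non-empty instance ID, ignoring the region segment.
func sameInstance(a, b string) bool {
	ida, idb := instanceID(a), instanceID(b)
	return ida != "" && ida == idb
}

func main() {
	// "0a1b2c3d" is a placeholder, not a real server UUID.
	nodeID := "openstack://region-one/0a1b2c3d" // form seen on the k8s node
	machineID := "openstack:///0a1b2c3d"        // form written by CAPO

	fmt.Println(nodeID == machineID)             // false: exact match fails
	fmt.Println(sameInstance(nodeID, machineID)) // true: same uuid
}
```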

Code to patch

I think the providerID of the CAPO machine CRD is hardcoded here, but I haven't found a proper way to get the region name here in order to template this instruction.

How to patch it?

  • I can retrieve the identityRef fields, so technically I can read the k8s secret and parse the OS config to get the region name from the configuration.
  • Add an optional field "region_name" to OpenStackMachineSpec, which would be added to the providerID if it is set.
  • I haven't found a better way yet... Maybe @mdbooth or someone could give me a good hint 🤞
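
Whichever option provides the region (the secret or an optional spec field), the templating itself could look like the following Go sketch. `buildProviderID` is a hypothetical helper, not existing CAPO code; it only shows that an empty region falls back to the current openstack:///uuid form.

```go
package main

import "fmt"

// buildProviderID formats a provider ID (hypothetical helper).
// An empty region yields the existing "openstack:///uuid" form,
// so machines without a region keep their current providerID.
func buildProviderID(region, instanceID string) string {
	return fmt.Sprintf("openstack://%s/%s", region, instanceID)
}

func main() {
	// "0a1b2c3d" is a placeholder, not a real server UUID.
	fmt.Println(buildProviderID("", "0a1b2c3d"))
	fmt.Println(buildProviderID("region-one", "0a1b2c3d"))
}
```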
@k8s-ci-robot added the kind/feature label Sep 27, 2024
@EmilienM
Contributor

Besides the fact that your proposed solution is a huge workaround (I almost wrote hack), I like the idea of reading the secret and taking the region from there.

@EmilienM
Contributor

@mdbooth I think I found a related / similar issue with Hypershift & CAPO:

I1010 16:22:26.292702       1 event.go:389] "Event occurred" object="openshift-ingress/router-default" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="UpdateLoadBalancerFailed" message="Error updating load balancer with new hosts [example-d49h2-d5ctv-qgvz6 example-d49h2-test-inplaceupgrade-8krpf example-d49h2-test-machineconfig-6kdqr-gnsm8 example-d49h2-test-machineconfig-k2227-4jkgl example-d49h2-test-replaceupgrade-pb2hn-ttslc] [node names limited, total number of nodes: 5], error: failed to update Security Group for loadbalancer service openshift-ingress/router-default: error getting server ID from the node: ProviderID \"\" didn't match expected format \"openstack://region/InstanceID\""

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_hypershift/4868/pull-ci-openshift-hypershift-main-e2e-openstack-nodepools/1844394303221141504/artifacts/e2e-openstack-nodepools/hypershift-openstack-e2e-execute/artifacts/TestNodePool_HostedCluster0/namespaces/e2e-clusters-n7d44-example-d49h2/core/pods/logs/openstack-cloud-controller-manager-8d9cf5788-jkrfl-cloud-controller-manager.log
