TargetAllocator missing scrape configs from ServiceMonitor/PodMonitor when using v1beta1 spec vs v1alpha1 #3528

Closed
jlcrow opened this issue Dec 9, 2024 · 6 comments
Labels: area:target-allocator, question

Comments

@jlcrow

jlcrow commented Dec 9, 2024

Component(s)

target allocator

What happened?

Description

I originally deployed a v1beta1 spec, and the TargetAllocator's /scrape_configs endpoint is missing all ServiceMonitor and PodMonitor definitions.

Steps to Reproduce

  1. Set up a new operator installation using the Helm chart:
helm upgrade --install opentelemetry-operator open-telemetry/opentelemetry-operator --atomic --timeout 1800s --version 0.75.0 --create-namespace \
  -n otel-operator --set manager.collectorImage.repository=otel/opentelemetry-collector-contrib --set manager.createRbacPermissions=true
  2. Deploy the RBAC and a collector spec with the target allocator enabled:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-targetallocator-role
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  - configmaps
  - namespaces
  verbs: ["get", "list", "watch"]
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs: ["get", "list", "watch"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
- apiGroups:
  - monitoring.coreos.com
  resources:
  - probes
  - servicemonitors
  - podmonitors
  - scrapeconfigs
  verbs: ['*']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-targetallocator-role-binding
subjects:
- kind: ServiceAccount
  name: otel-targetallocator
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: otel-targetallocator-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-prometheus-collector-role
rules:
  - apiGroups:
    - ""
    resources:
    - configmaps
    - ingresses
    - events
    - namespaces
    - endpoints
    - namespaces/status
    - nodes
    - nodes/spec
    - nodes/stats
    - nodes/proxy
    - nodes/metrics      
    - pods
    - pods/status
    - persistentvolumeclaims
    - persistentvolumes
    - replicationcontrollers
    - replicationcontrollers/status
    - resourcequotas
    - services
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - apps
    resources:
    - daemonsets
    - deployments
    - replicasets
    - statefulsets
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - extensions
    resources:
    - daemonsets
    - deployments
    - replicasets
    - ingresses/status
    - ingresses      
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - batch
    resources:
    - jobs
    - cronjobs
    verbs:
    - get
    - list
    - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
    - /metrics
    verbs:
    - get
    - list
    - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-prometheus-collector-cr-binding
subjects:
- kind: ServiceAccount
  name: otel-collector
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: otel-prometheus-collector-role
  apiGroup: rbac.authorization.k8s.io

Collector v1beta1

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: monitoring
spec:
  config:
    processors:
      batch: {}
      tail_sampling:
        policies:
          - name: drop_noisy_traces_url
            type: string_attribute
            string_attribute:
              key: http.url
              values:
                - \/metrics
                - \/health
                - \/livez
                - \/readyz
                - \/prometheus
                - \/actuator*
                - opentelemetry\.proto
                - favicon\.ico              
              enabled_regex_matching: true
              invert_match: true
      k8sattributes:
        extract:
          annotations:
          - from: pod
            key: splunk.com/sourcetype
          - from: namespace
            key: splunk.com/exclude
            tag_name: splunk.com/exclude
          - from: pod
            key: splunk.com/exclude
            tag_name: splunk.com/exclude
          - from: namespace
            key: splunk.com/index
            tag_name: com.splunk.index
          - from: pod
            key: splunk.com/index
            tag_name: com.splunk.index
          labels:
          - key: app
          metadata:
          - k8s.namespace.name
          - k8s.node.name
          - k8s.pod.name
          - k8s.pod.uid
          - container.id
          - container.image.name
          - container.image.tag
        filter:
          node_from_env_var: K8S_NODE_NAME    
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.uid
        - sources:
          - from: resource_attribute
            name: k8s.pod.ip
        - sources:
          - from: resource_attribute
            name: ip
        - sources:
          - from: connection
        - sources:
          - from: resource_attribute
            name: host.name                   
      memory_limiter: 
        check_interval: 5s
        limit_percentage: 90
      resource:
        attributes:
        - action: upsert
          key: gke_cluster
          value: ${CLUSTER_NAME}
        - action: upsert
          key: cluster_name
          value: staging-digital
        - key: cluster
          value: ${CLUSTER_NAME}
          action: upsert        
      resourcedetection:
        detectors:
        - env
        - gcp
        - system
        override: true
        timeout: 10s        
    extensions:
      health_check:
        endpoint: ${MY_POD_IP}:13133
      k8s_observer:
        auth_type: serviceAccount
        node: ${K8S_NODE_NAME}        
    receivers:
      prometheus:
        config:
          global:
            evaluation_interval: 15s
            scrape_interval: 30s
            scrape_timeout: 10s
          scrape_configs:
          - job_name: kubernetes-apiservers
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:           
            - action: keep
              regex: default;kubernetes;https
              source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_service_name
              - __meta_kubernetes_endpoint_port_name
            metric_relabel_configs:          
            - source_labels: [__name__]
              regex: up|go_gc_duration_seconds|go_gc_duration_seconds_count|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes
              action: keep
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
          - job_name: kubernetes-nodes
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            metric_relabel_configs:         
            - source_labels: [__name__]
              regex: up|go_gc_duration_seconds|go_gc_duration_seconds_count|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|kubelet_volume_stats_available_bytes|kubelet_volume_stats_capacity_bytes|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes
              action: keep
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
          - job_name: kubernetes-pods
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:               
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: pod
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: node
            - action: drop
              regex: Pending|Succeeded|Failed
              source_labels:
              - __meta_kubernetes_pod_phase
            metric_relabel_configs:         
            - source_labels: [__name__]
              regex: up|certmanager_certificate_expiration_timestamp_seconds|cortex_alertmanager_notifications_failed_total|cortex_alertmanager_notifications_total|cortex_bucket_blocks_count|cortex_prometheus_notifications_errors_total|cortex_prometheus_notifications_sent_total|cortex_prometheus_rule_evaluation_duration_seconds|cortex_prometheus_rule_evaluation_duration_seconds_count|cortex_prometheus_rule_evaluation_duration_seconds_sum|cortex_prometheus_rule_evaluation_failures_total|cortex_prometheus_rule_evaluations_total|cortex_prometheus_rule_group_iterations_missed_total|cortex_request_duration_seconds_bucket|cortex_request_duration_seconds_count|cortex_request_duration_seconds_sum|db_pool_free|db_pool_max|db_pool_min|db_pool_pending_acquires|db_pool_pending_creates|db_pool_used|db_query_duration_seconds|db_query_duration_seconds_count|environments_total|envoy_cluster_upstream_cx_active|envoy_cluster_upstream_cx_rx_bytes_total|envoy_cluster_upstream_cx_total|envoy_cluster_upstream_cx_tx_bytes_total|envoy_cluster_upstream_rq_retry_success_total|envoy_cluster_upstream_rq_retry_total|exim_panic_total|exim_queue|exim_reject_total|feature_toggle_update_total|feature_toggle_usage_total|feature_toggles_total|gauge_memberlist_health_score|go_gc_duration_seconds|go_gc_duration_seconds_count|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|http_request_duration_milliseconds|http_request_duration_milliseconds_count|istio_request_bytes_bucket|istio_request_duration_milliseconds_bucket|istio_request_duration_milliseconds_sum|istio_requests_total|istio_response_bytes_bucket|istio_tcp_received_bytes_total|istio_tcp_sent_bytes_total|memberlist_client_cluster_members_count|memberlist_client_cluster_node_health_score|memberlist_client_kv_store_count|node_cpu_seconds_total|node_filesystem_free_bytes|node_filesystem_size_bytes|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_MemAvailable_bytes|node_memory_MemFree_bytes|node_memory_MemTotal_bytes|nodejs_external_memory_bytes|nodejs_heap_size_total_bytes|nodejs_heap_size_used_bytes|nodejs_version_info|pilot_proxy_convergence_time_bucket|process_cpu_seconds_total|process_cpu_system_seconds_total|process_cpu_user_seconds_total|process_heap_bytes|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|production_changes_30|production_changes_60|production_changes_90|projects_total|sidecar_injection_success_total|strategies_total|tempo_distributor_queue_length|tempo_distributor_spans_received_total|tempo_ingester_blocks_cleared_total|tempo_ingester_blocks_flushed_total|tempo_ingester_failed_flushes_total|tempo_ingester_flush_duration_seconds_bucket|tempo_ingester_traces_created_total|tempo_memberlist_client_cluster_members_count|tempo_memberlist_client_cluster_node_health_score|tempo_memberlist_client_kv_store_count|tempo_memcache_request_duration_seconds_bucket|tempo_memcache_request_duration_seconds_count|tempo_metrics_generator_spans_discarded_total|tempo_metrics_generator_spans_received_total|tempo_receiver_accepted_spans|tempo_receiver_refused_spans|tempo_request_duration_seconds_bucket|tempo_request_duration_seconds_count|tempodb_backend_request_duration_seconds_bucket|tempodb_backend_request_duration_seconds_count|tempodb_blocklist_length|tempodb_blocklist_poll_duration_seconds_bucket|tempodb_compaction_blocks_total|tempodb_compaction_bytes_written_total|tempodb_compaction_errors_total|tempodb_compaction_objects_combined_total|tempodb_compaction_objects_written_total|tempodb_retention_deleted_total|tempodb_retention_duration_seconds_bucket|tempodb_retention_errors_total|tempodb_retention_marked_for_deletion_total|tempodb_work_queue_length|tempodb_work_queue_max|thanos_objstore_bucket_operation_failures_total|thanos_objstore_bucket_operations_total|users_active_30|users_active_60|users_active_7|users_active_90|users_total
              action: keep
            - action: labeldrop
              regex: chart|heritage|.*operator.*|release
            scrape_interval: 30s
            scrape_timeout: 5s      
          - job_name: kube-state-metrics
            kubernetes_sd_configs:
            - role: endpoints
              selectors:
              - role: endpoints
                label: "app.kubernetes.io/name=kube-state-metrics" 
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: exporter_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: node
            metric_relabel_configs:         
            - source_labels: [__name__]
              regex: up|kube_horizontalpodautoscaler_status_current_replicas|kube_node_info|kube_node_status_allocatable|kube_node_status_capacity|kube_pod_container_info|kube_pod_container_resource_limits|kube_pod_container_resource_requests|kube_pod_container_status_ready|kube_pod_container_status_restarts_total|kube_pod_container_status_running|kube_pod_created|kube_pod_owner|kube_pod_start_time|kube_pod_status_ready_time
              action: keep
            - regex: exporter_namespace
              action: labeldrop
          - job_name: kubernetes-nodes-cadvisor
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            honor_timestamps: true
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
            metric_relabel_configs:           
            - source_labels: [__name__]
              regex: up|container_cpu_system_seconds_total|container_cpu_usage_seconds_total|container_cpu_user_seconds_total|container_fs_usage_bytes|container_last_seen|container_memory_usage_bytes|container_memory_working_set_bytes|container_network_receive_bytes_total|container_network_receive_packets_dropped_total|container_network_receive_packets_total|container_network_transmit_bytes_total|container_network_transmit_packets_dropped_total|container_network_transmit_packets_total|container_spec_cpu_period|container_spec_cpu_quota|machine_cpu_cores|machine_memory_bytes
              action: keep
          - job_name: 'sonarqube'
            basic_auth:
              username: 5ad44f407af2e14108d1de66eb9369e80a8ae3f5
              password: ""
            metrics_path: '/sonarqube/api/prometheus/metrics'
            static_configs:
              - targets:
                  - sonarqube.quality:9000  
      otlp:
        protocols:
          grpc:
            endpoint: ${MY_POD_IP}:4317
            keepalive:
              enforcement_policy:
                min_time: 5s
                permit_without_stream: true
              server_parameters:
                time: 5s
                timeout: 10s          
          http:
            endpoint: ${MY_POD_IP}:4318
      zipkin:
        endpoint: ${MY_POD_IP}:9411     
    exporters:
      prometheusremotewrite:
        endpoint: https://mimir-tools/api/v1/push
        retry_on_failure:
          enabled: true
          initial_interval: 1s
          max_interval: 10s
          max_elapsed_time: 30s
      otlp:
        endpoint: otlp.staging.twmlabs.com:4317
        tls:
          insecure: true
        sending_queue:
          enabled: true
          num_consumers: 10
          queue_size: 5000
    service:
      telemetry:
        metrics:
          address: "${MY_POD_IP}:8888"
          level: basic    
        logs:
          level: "warn"  
      extensions:
      - health_check
      - k8s_observer
      pipelines:
        traces:
          receivers:
          - otlp
          - zipkin
          processors:
          - memory_limiter
          - resourcedetection
          - resource
          - k8sattributes
          - tail_sampling
          - batch        
          exporters:
          - otlp
        metrics:
          receivers:
          - prometheus
          - otlp
          processors:
          - memory_limiter        
          - batch
          exporters:
          - prometheusremotewrite
  env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: CLUSTER_NAME 
    value: "staging-tools"
  mode: statefulset
  podAnnotations:
    sidecar.istio.io/inject: "false"
    prometheus.io/scrape: "true"
    promethios.io/port: "8888"
  priorityClassName: highest-priority
  autoscaler:
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 30
    maxReplicas: 10
    minReplicas: 3
    targetCPUUtilization: 70
    targetMemoryUtilization: 70
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi
  targetAllocator:
    allocationStrategy: consistent-hashing
    enabled: true
    filterStrategy: relabel-config
    observability:
      metrics:
        enableMetrics: false
    prometheusCR:
      enabled: true
      scrapeInterval: 30s
    replicas: 2
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 300m
        memory: 300Mi

The operator should already have a ServiceMonitor; you could also define an additional one (a minimal example is sketched below the listing):

NAMESPACE       NAME                                     AGE
monitoring      fastly-exporter                          99m
otel-operator   opentelemetry-operator-metrics-monitor   18d
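
For illustration, a minimal ServiceMonitor along the lines of the fastly-exporter entry above might look like the following; the selector label and port name are placeholders, not taken from the actual deployment:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fastly-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: fastly-exporter    # placeholder: match your exporter Service's labels
  endpoints:
  - port: metrics             # placeholder: named port that exposes Prometheus metrics
    interval: 30s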

  3. Apply all resources.
  4. Port-forward to the target allocator Service and check the /scrape_configs endpoint.
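
For example (the Service name and port follow the operator's usual <collector-name>-targetallocator convention; adjust both if your setup differs):

kubectl -n monitoring port-forward svc/otel-targetallocator 8080:80
curl -s http://localhost:8080/scrape_configs | jq 'keys'

The endpoint returns a JSON map keyed by job name, so the ServiceMonitor/PodMonitor-derived jobs should show up alongside the static kubernetes-* jobs.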

Expected Result

I should see the scrape configs for the 2 ServiceMonitors deployed

Actual Result

The scrape configs are missing
(screenshot: /scrape_configs response with no ServiceMonitor or PodMonitor jobs)

Downgrading to v1alpha1 works around it: simply change the spec version, add a pipe after config: (v1alpha1 takes the collector config as a string), and re-apply.

(screenshot of the /scrape_configs response after switching to v1alpha1)
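
For reference, a trimmed sketch of the v1alpha1 form of the same resource (the key difference is that config becomes a multi-line string, hence the pipe; most of the config above is omitted here):

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: monitoring
spec:
  mode: statefulset
  targetAllocator:
    allocationStrategy: consistent-hashing
    enabled: true
    filterStrategy: relabel-config
    prometheusCR:
      enabled: true
      scrapeInterval: 30s
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs: []
    exporters:
      prometheusremotewrite:
        endpoint: https://mimir-tools/api/v1/push
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [prometheusremotewrite]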

Kubernetes Version

1.30.5

Operator version

0.75.0

Collector version

0.114.1

Environment information

Environment

OS: GKE Container OS
Compiler (if manually compiled): n/a

Log output

No errors are present in the logs:
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","message":"Starting the OpenTelemetry Operator","opentelemetry-operator":"0.114.1","opentelemetry-collector":"otel/opentelemetry-collector-contrib:0.114.0","opentelemetry-targetallocator":"ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:0.114.1","operator-opamp-bridge":"ghcr.io/open-telemetry/opentelemetry-operator/operator-opamp-bridge:0.114.1","auto-instrumentation-java":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.33.5","auto-instrumentation-nodejs":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.53.0","auto-instrumentation-python":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.48b0","auto-instrumentation-dotnet":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:1.2.0","auto-instrumentation-go":"ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.17.0-alpha","auto-instrumentation-apache-httpd":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.4","auto-instrumentation-nginx":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.4","feature-gates":"operator.collector.default.config,-operator.collector.targetallocatorcr,-operator.golang.flags,operator.observability.prometheus,-operator.sidecarcontainers.native,-operator.targetallocator.fallbackstrategy,-operator.targetallocator.mtls","build-date":"2024-12-05T14:36:38Z","go-version":"go1.22.9","go-arch":"amd64","go-os":"linux","labels-filter":[],"annotations-filter":[],"enable-multi-instrumentation":true,"enable-apache-httpd-instrumentation":true,"enable-dotnet-instrumentation":true,"enable-go-instrumentation":false,"enable-python-instrumentation":true,"enable-nginx-instrumentation":false,"enable-nodejs-instrumentation":true,"enable-java-instrumentation":true,"create-openshift-dashboard":false,"zap-message-key":"message","zap-level-key":"level","zap-time-key":"timestamp","zap-level-format":"uppercase"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"setup","message":"the env var WATCH_NAMESPACE isn't set, watching all namespaces"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"setup","message":"Prometheus CRDs are installed, adding to scheme."}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"setup","message":"Openshift CRDs are not installed, skipping adding to scheme."}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"setup","message":"Cert-Manager is not available to the operator, skipping adding to scheme."}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.builder","message":"Registering a mutating webhook","GVK":"opentelemetry.io/v1beta1, Kind=OpenTelemetryCollector","path":"/mutate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.builder","message":"Registering a validating webhook","GVK":"opentelemetry.io/v1beta1, Kind=OpenTelemetryCollector","path":"/validate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/validate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/convert"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.builder","message":"Conversion webhook enabled","GVK":"opentelemetry.io/v1beta1, Kind=OpenTelemetryCollector"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.builder","message":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.builder","message":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.builder","message":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpAMPBridge","path":"/mutate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.builder","message":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpAMPBridge","path":"/validate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"setup","message":"starting manager"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","message":"starting server","name":"health probe","addr":"[::]:8081"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.webhook","message":"Starting webhook server"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.certwatcher","message":"Updated current TLS certificate"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.webhook","message":"Serving webhook server","host":"","port":9443}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.certwatcher","message":"Starting certificate watcher"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.metrics","message":"Starting metrics server"}
{"level":"INFO","timestamp":"2024-12-06T21:19:32Z","logger":"controller-runtime.metrics","message":"Serving metrics server","bindAddress":"0.0.0.0:8080","secure":false}
I1206 21:19:32.842414       1 leaderelection.go:254] attempting to acquire leader lease otel-operator/9f7554c3.opentelemetry.io...
{"level":"INFO","timestamp":"2024-12-06T21:19:45Z","logger":"controllers.OpenTelemetryCollector","message":"pdb field is unset in Spec, creating default"}
I1206 21:20:18.083661       1 leaderelection.go:268] successfully acquired lease otel-operator/9f7554c3.opentelemetry.io
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","logger":"collector-upgrade","message":"looking for managed instances to upgrade"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","logger":"instrumentation-upgrade","message":"looking for managed Instrumentation instances to upgrade"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1beta1.OpenTelemetryCollector"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ConfigMap"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceAccount"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Service"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Deployment"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.DaemonSet"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.StatefulSet"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Ingress"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v2.HorizontalPodAutoscaler"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.PodDisruptionBudget"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ClusterRoleBinding"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ClusterRole"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceMonitor"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.PodMonitor"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting Controller","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1alpha1.OpAMPBridge"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.ConfigMap"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.ServiceAccount"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.Service"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.Deployment"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","message":"Starting Controller","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge"}
{"level":"ERROR","timestamp":"2024-12-06T21:20:18Z","logger":"operator-metrics-sm","message":"error creating Service Monitor for operator metrics","error":"error getting owner references: no deployments found with the specified label","stacktrace":"github.com/open-telemetry/opentelemetry-operator/internal/operator-metrics.OperatorMetrics.Start\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/internal/operator-metrics/metrics.go:75\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:226"}
{"level":"INFO","timestamp":"2024-12-06T21:20:18Z","logger":"instrumentation-upgrade","message":"no instances to upgrade"}
{"level":"INFO","timestamp":"2024-12-06T21:20:19Z","message":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1}
{"level":"INFO","timestamp":"2024-12-06T21:20:19Z","logger":"controllers.OpenTelemetryCollector","message":"pdb field is unset in Spec, creating default"}
{"level":"INFO","timestamp":"2024-12-06T21:20:19Z","message":"Starting workers","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","worker count":1}
{"level":"INFO","timestamp":"2024-12-06T21:20:20Z","logger":"controllers.OpenTelemetryCollector","message":"pdb field is unset in Spec, creating default"}
{"level":"INFO","timestamp":"2024-12-06T21:20:21Z","logger":"collector-upgrade","message":"instance upgraded","name":"otel","namespace":"monitoring","version":"0.114.0"}
{"level":"INFO","timestamp":"2024-12-06T21:20:22Z","logger":"collector-upgrade","message":"instance upgraded","name":"otel-logging","namespace":"monitoring","version":"0.114.0"}
{"level":"INFO","timestamp":"2024-12-06T21:20:22Z","logger":"controllers.OpenTelemetryCollector","message":"pdb field is unset in Spec, creating default"}
{"level":"INFO","timestamp":"2024-12-06T21:20:24Z","logger":"controllers.OpenTelemetryCollector","message":"pdb field is unset in Spec, creating default"}
{"level":"INFO","timestamp":"2024-12-06T21:20:25Z","logger":"controllers.OpenTelemetryCollector","message":"pdb field is unset in Spec, creating default"}
{"level":"INFO","timestamp":"2024-12-06T21:20:28Z","logger":"controllers.OpenTelemetryCollector","message":"pdb field is unset in Spec, creating default"}

Additional context

I've reproduced this behavior in multiple clusters. I understand that even a v1alpha1 spec is ultimately stored as v1beta1. I've also deployed a v1beta1 spec on top of the v1alpha1 one; the operator treats it as already configured and changes nothing. But every time I start from the beta spec, the target allocator does not work properly: it gets nothing beyond the static scrape configs defined for the collector and never sees the ServiceMonitors or PodMonitors.

jlcrow added the bug and needs triage labels on Dec 9, 2024
@jaronoff97
Contributor

I believe this is because you are not setting the TA's selectors to empty, as we suggest in the upgrade guide. When the operator converts from v1alpha1 to v1beta1, it sets them to the empty selector. We no longer default to the empty selector, and we updated our API docs to reflect that when we made the change. If you try this with the selectors set to empty rather than nil, do you get the same result?
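
Concretely, that would look something like this in the v1beta1 spec above (a sketch based on the field names from the upgrade guide):

spec:
  targetAllocator:
    enabled: true
    prometheusCR:
      enabled: true
      scrapeInterval: 30s
      serviceMonitorSelector: {}   # empty selector ({}), not unset/nil: matches all ServiceMonitors
      podMonitorSelector: {}       # empty selector ({}): matches all PodMonitors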

jaronoff97 added the question and area:target-allocator labels and removed the bug and needs triage labels on Dec 10, 2024
@jlcrow
Author

jlcrow commented Dec 10, 2024

@jaronoff97 Thanks, totally missed that in the docs

jlcrow closed this as completed on Dec 10, 2024
@lenalebt

The blog post from September doesn't mention this either; I'm just now stumbling over it. It's a post from the past, of course, but maybe it still makes sense to mention these changes there?

https://opentelemetry.io/blog/2024/prom-and-otel/

@jaronoff97
Contributor

@lenalebt thanks for sharing! That blog post was written against v1alpha1, which doesn't have this issue (as mentioned above), so it's still accurate. That said, it may be worth linking to the upgrade guide for readers of that post who want to use v1beta1.

@lenalebt

Maybe I'm doing something wrong, or maybe it's just that I'm deploying the newest versions, but I'm currently having trouble with a target allocator that tries to read those CRDs even though I had neither installed the CRDs nor granted access to them, so it failed. I'm using that blog post as a reference.

I don't have it fully working yet, though.

Anyway, this ticket helped me better understand what the problem is :-).

@jaronoff97
Contributor

Good to know! I opened an issue in the community repo, so if you're having trouble following that post, it would be great to share your experience there so we can improve the docs too :D
