Describe the bug
Retina sometimes fails to remove retinaendpoints.retina.sh objects. The stale objects accumulate until etcd reports "etcdserver: mvcc: database space exceeded", at which point cluster operations stop.
To Reproduce
It is difficult to give clear reproduction steps, because the problem occurs intermittently (at least since the last update). Most stale objects accumulate in namespaces where jobs are started by spark-operator. Many of the pods in these namespaces end up with the status Error, ContainerStatusUnknown, or OOMKilled.
The last time I deleted all retinaendpoints.retina.sh objects (2 weeks ago, when there were 10 times more of them than pods), everything was fine for a while. The problem has now recurred; below is an etcd database summary:
As you can see, the number of retina.sh objects is much higher than the number of pods or cilium.io objects, which I believe is an incorrect state.
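For anyone who wants to produce a comparable summary, here is a minimal sketch that counts etcd keys per resource group. It assumes the default `/registry` key prefix, and that the key list comes from something like `etcdctl get /registry --prefix --keys-only` (plus whatever endpoint and certificate flags your cluster needs). The function name `summarize_keys` is mine, not part of Retina or etcd.

```python
from collections import Counter


def summarize_keys(keys, prefix="/registry/"):
    """Count etcd keys by the first path segment after the prefix.

    Assumption: keys look like /registry/pods/<ns>/<name> for core
    resources and /registry/<group>/<plural>/<ns>/<name> for CRDs, so the
    first segment is e.g. "pods", "retina.sh" or "cilium.io".
    """
    counts = Counter()
    for key in keys:
        key = key.strip()
        # --keys-only output contains blank lines; skip them and anything
        # that does not live under the expected prefix.
        if not key.startswith(prefix):
            continue
        segment = key[len(prefix):].split("/", 1)[0]
        if segment:
            counts[segment] += 1
    return counts
```

Feeding the lines of the `etcdctl` output to `summarize_keys` and printing `counts.most_common()` gives a per-group count that can be compared directly against the `pods` entry.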
Expected behavior
The number of retina.sh objects in the etcd database should not significantly exceed the number of Pod objects.
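To check this expectation against a live cluster, the sketch below lists RetinaEndpoint objects that have no matching pod. It rests on an assumption I have not verified against the Retina source: that each RetinaEndpoint shares its namespace and name with the pod it describes. The helper names (`kubectl_items`, `find_orphans`) are hypothetical, and the script only reads from the cluster; it deletes nothing.

```python
import json
import subprocess


def kubectl_items(resource):
    """Return all objects of a resource type across namespaces via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", resource, "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["items"]


def key(obj):
    """Identify an object by (namespace, name)."""
    meta = obj["metadata"]
    return (meta["namespace"], meta["name"])


def find_orphans(endpoints, pods):
    """Return sorted (namespace, name) pairs of endpoints without a pod.

    Assumption: a RetinaEndpoint is named after its pod.
    """
    live = {key(p) for p in pods}
    return sorted(key(e) for e in endpoints if key(e) not in live)
```

Calling `find_orphans(kubectl_items("retinaendpoints.retina.sh"), kubectl_items("pods"))` should return an empty or very short list on a healthy cluster; a list several times longer than the pod count would match what is described above.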
Platform (please complete the following information):
OS: Alma Linux 8
Kubernetes Version: 1.30.6
Host: self-host
Retina Version: v0.0.19