snuba-subscription-consumer-* containers are failing continuously #6137

Open
sree-warrier opened this issue Jul 20, 2024 · 1 comment

sree-warrier commented Jul 20, 2024

Self-Hosted Version

23.11.2

CPU Architecture

x86_64

Docker Version

NA

Docker Compose Version

NA

Steps to Reproduce

The following containers are crashing continuously. Are these services used for alerting? We are a little confused about what these services do.

snuba-subscription-consumer-events
snuba-subscription-consumer-metrics
snuba-subscription-consumer-transactions

Logs:

2024-07-20 15:57:07,088 Initializing Snuba...
2024-07-20 15:57:10,884 Snuba initialization took 3.7952772620010364s
{"module": "builtins", "event": "Checking Clickhouse connections", "severity": "info", "timestamp": "2024-07-20T15:57:10.897290Z"}
2024-07-20 15:57:10,966 New partitions assigned: {Partition(topic=Topic(name='snuba-commit-log'), index=0): 0, Partition(topic=Topic(name='snuba-commit-log'), index=1): 0, Partition(topic=Topic(name='snuba-commit-log'), index=2): 0, Partition(topic=Topic(name='snuba-commit-log'), index=3): 0, Partition(topic=Topic(name='snuba-commit-log'), index=4): 0}
2024-07-20 15:57:10,979 Caught exception, shutting down...
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 294, in run
    self._run_once()
  File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 382, in _run_once
    self.__processing_strategy.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
    self.__next_step.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 275, in submit
    tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 2
2024-07-20 15:57:10,981 Closing <snuba.subscriptions.scheduler_consumer.CommitLogTickConsumer object at 0x7cba41de5f70>...
2024-07-20 15:57:10,983 Partitions to revoke: [Partition(topic=Topic(name='snuba-commit-log'), index=0), Partition(topic=Topic(name='snuba-commit-log'), index=1), Partition(topic=Topic(name='snuba-commit-log'), index=2), Partition(topic=Topic(name='snuba-commit-log'), index=3), Partition(topic=Topic(name='snuba-commit-log'), index=4)]
2024-07-20 15:57:10,983 Partition revocation complete.
2024-07-20 15:57:10,987 Processor terminated
Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 33, in <module>
    sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/subscriptions_scheduler_executor.py", line 153, in subscriptions_scheduler_executor
    processor.run()
  File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 294, in run
    self._run_once()
  File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 382, in _run_once
    self.__processing_strategy.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 240, in submit
    self.__next_step.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 275, in submit
    tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
KeyError: 2
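
A minimal sketch of how a `KeyError: 2` can arise on the failing line in `combined_scheduler_executor.py` (`entity_scheduler[tick.partition]`). It assumes that the per-partition scheduler map is built from a configured partition count that is smaller than the number of partitions the commit log actually has, so a tick from a newer partition has no matching key. `Tick`, `PartitionScheduler`, `build_schedulers` and `submit` are hypothetical stand-ins, not Snuba's real classes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tick:
    # Hypothetical stand-in for a commit-log tick; only the partition index matters here.
    partition: int


class PartitionScheduler:
    # Hypothetical per-partition scheduler; Snuba's real scheduler is not reproduced here.
    def __init__(self, index: int) -> None:
        self.index = index

    def find(self, tick: Tick) -> List[str]:
        # Return a placeholder "task" so the lookup path can be exercised.
        return [f"task-for-partition-{self.index}"]


def build_schedulers(configured_partitions: int) -> Dict[int, PartitionScheduler]:
    # One scheduler per partition index the consumer *believes* exists
    # (assumption: taken from configuration, not from the live topic).
    return {i: PartitionScheduler(i) for i in range(configured_partitions)}


def submit(schedulers: Dict[int, PartitionScheduler], tick: Tick) -> List[str]:
    # Mirrors the failing line: a plain dict lookup keyed by tick.partition,
    # which raises KeyError for any partition index that was never configured.
    return list(schedulers[tick.partition].find(tick))


if __name__ == "__main__":
    schedulers = build_schedulers(1)  # hypothetical: still configured for 1 partition
    print(submit(schedulers, Tick(0)))
    try:
        submit(schedulers, Tick(2))  # the commit log now has partitions 0-4
    except KeyError as exc:
        print(f"KeyError: {exc}")
```

Under this assumption, the consumer fails as soon as it receives a tick from any partition index missing from the map, which would match the crash-on-startup pattern in the logs.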

The alerting system was working fine until we made a few changes to the Kafka partitions; after that, only these 3 containers went down.

  • Initially increased the Kafka partition count for ingest-events and events from 1 to 5 for scale testing
  • Only these 3 services went down, with the error above
  • Followed the steps in the issue 'Number of Errors' alert rules not triggering self-hosted#2067
  • Tried clearing all lags and offsets; it still didn't work
  • Recreated the topics as per the solution in the above issue; it didn't work
  • Deleted all existing alerts and recreated them; alerts now work, but the containers are still in a failed state

We are still unclear on what these services do. Which service is currently serving alerting?

We suspect a partition mismatch (please correct us if this is unrelated), so we have increased the partition count of all topics to 5. Reviewing all topic configs, we see that 3 topics, snuba-commit-log, events-subscription-results and ingest-monitors, have a ReplicationFactor of 3, while all other topics have a ReplicationFactor of 1; all remaining configs are the same.
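
If a partition mismatch is the cause, one quick sanity check is to compare each topic's partition count against the count the subscription consumers are expected to handle. The sketch below assumes the counts have already been collected, for example from `kafka-topics --describe` or from `AdminClient.list_topics()` in the `confluent-kafka` Python client; `find_partition_mismatches` and the example counts are hypothetical, not Snuba tooling or data from this cluster.

```python
from typing import Dict, List


def find_partition_mismatches(topic_partitions: Dict[str, int], expected: int) -> List[str]:
    """Return the names of topics whose partition count differs from `expected`, sorted."""
    return sorted(
        topic for topic, count in topic_partitions.items() if count != expected
    )


if __name__ == "__main__":
    # Hypothetical snapshot; replace with the counts reported by your own brokers.
    counts = {"events": 5, "snuba-commit-log": 5, "ingest-events": 1}
    print(find_partition_mismatches(counts, expected=5))
```

Note that replication factor (3 vs 1) affects durability, not partition indices, so it is unlikely on its own to explain a `KeyError` on a partition number.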

Also, when listing consumer groups, we see that the following have no active members:

Consumer group 'snuba-transactions-subscriptions-consumers' has no active members.
Consumer group 'snuba-events-subscriptions-consumers' has no active members.
Consumer group 'sentry-commit-log-6e1d91f6451a11ef8ad962551908ad8e' has no active members.
Consumer group 'nuba-metrics-subscriptions-consumers' has no active members.
Consumer group 'sentry-commit-log-12e82a30451a11efb933c2a760684d4c' has no active members.

Please let us know if any other information is needed.

Expected Result

NA

Actual Result

NA

Event ID

No response

IanWoodard transferred this issue from getsentry/self-hosted Jul 23, 2024
mcannizz self-assigned this Sep 13, 2024
untitaker (Member) commented:

I think this may be a duplicate of #5855 (comment)
