Error during VisibilityDeleteExecution #6995

Open
steveetm opened this issue Dec 15, 2024 · 1 comment

We are getting an extreme amount of error logs from the Temporal server:

{"level":"error","ts":"2024-12-15T16:21:36.296Z","msg":"Operation failed with an error.","error":"context deadline exceeded","logging-call-at":"visiblity_manager_metrics.go:264","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/persistence/visibility.(*visibilityManagerMetrics).updateErrorMetric\n\t/home/builder/temporal/common/persistence/visibility/visiblity_manager_metrics.go:264\ngo.temporal.io/server/common/persistence/visibility.(*visibilityManagerMetrics).DeleteWorkflowExecution\n\t/home/builder/temporal/common/persistence/visibility/visiblity_manager_metrics.go:128\ngo.temporal.io/server/service/history.(*visibilityQueueTaskExecutor).processDeleteExecution\n\t/home/builder/temporal/service/history/visibility_queue_task_executor.go:494\ngo.temporal.io/server/service/history.(*visibilityQueueTaskExecutor).Execute\n\t/home/builder/temporal/service/history/visibility_queue_task_executor.go:122\ngo.temporal.io/server/service/history/queues.(*executableImpl).Execute\n\t/home/builder/temporal/service/history/queues/executable.go:236\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:223\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:119\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:145\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:120\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:211"}
{"level":"error","ts":"2024-12-15T16:21:36.304Z","msg":"Fail to process task","shard-id":1,"address":"127.0.0.1:7234","component":"visibility-queue-processor","wf-namespace-id":"064f58ee-d88c-4c7c-8b81-77b93c315829","wf-id":"*","wf-run-id":"f4dd4001-fdbd-44d7-aaf1-9c401226e546","queue-task-id":23085605,"queue-task-visibility-timestamp":"2024-12-14T13:07:44.404Z","queue-task-type":"VisibilityDeleteExecution","queue-task":{"NamespaceID":"064f58ee-d88c-4c7c-8b81-77b93c315829","WorkflowID":"*","RunID":"f4dd4001-fdbd-44d7-aaf1-9c401226e546","VisibilityTimestamp":"2024-12-14T13:07:44.404345212Z","TaskID":23085605,"Version":0,"CloseExecutionVisibilityTaskID":9663191,"StartTime":null,"CloseTime":null},"wf-history-event-id":0,"error":"context deadline exceeded","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/builder/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/builder/temporal/service/history/queues/executable.go:347\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:224\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:119\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:145\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:120\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:211"}

Any idea how to investigate and/or recover from this?
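
One way to start investigating is to tally the error lines by task type and error message, to see whether a single failure dominates the flood. Below is a minimal sketch in Python, assuming the server log holds one JSON object per line as in the excerpt above. The function name `tally_errors` and the abbreviated sample lines are illustrative only; they are not part of Temporal.

```python
import json
from collections import Counter


def tally_errors(lines):
    """Count error-level log lines grouped by (queue-task-type, error)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        # Lines logged outside the queue processor carry no task type.
        key = (entry.get("queue-task-type", "(none)"),
               entry.get("error", "(none)"))
        counts[key] += 1
    return counts


# Abbreviated stand-ins for the log lines above (most fields omitted).
sample = [
    '{"level":"error","msg":"Operation failed with an error.",'
    '"error":"context deadline exceeded"}',
    '{"level":"error","msg":"Fail to process task",'
    '"queue-task-type":"VisibilityDeleteExecution",'
    '"error":"context deadline exceeded"}',
    '{"level":"info","msg":"unrelated"}',
    'not json',
]

for (task_type, err), n in tally_errors(sample).most_common():
    print(f"{n:6d}  {task_type}  {err}")
```

On a real log, the same function can be fed an open file handle instead of `sample`.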

Expected Behavior

No errors, and visibility is updated correctly.

Actual Behavior

We get an extreme amount of errors, and past executions are still listed in temporal-ui long after the retention period.
Workflows seem to be running and finishing, and we can see them in temporal-ui.

Steps to Reproduce the Problem

Not sure. We did nothing special, and it was working fine. We changed the MySQL password, temporal-service ran into an access denied error, the service restarted, and these logs have been flooding in since then.

Specifications

  • Version: 1.22.4
@steveetm
Author

After upgrading to the latest version, the issue is not fixed, but we now get a different error:

{"level":"error","ts":"2024-12-16T20:48:10.526Z","msg":"Operation failed with an error.","error":"unable to delete custom search attributes: context deadline exceeded","logging-call-at":"/home/runner/work/docker-builds/docker-builds/temporal/common/persistence/visibility/visiblity_manager_metrics.go:195","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/log/zap_logger.go:155\ngo.temporal.io/server/common/persistence/visibility.(*visibilityManagerMetrics).updateErrorMetric\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/persistence/visibility/visiblity_manager_metrics.go:195\ngo.temporal.io/server/common/persistence/visibility.(*visibilityManagerMetrics).DeleteWorkflowExecution\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/persistence/visibility/visiblity_manager_metrics.go:129\ngo.temporal.io/server/service/history.(*visibilityQueueTaskExecutor).processDeleteExecution\n\t/home/runner/work/docke^Coff/retry.go:64\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/tasks/fifo_scheduler.go:211"}

The number of logs emitted is considerably lower, but there are still 170k rows in visibility tasks and 64k rows in executions_visibility. With a retention period of one day, this is far more than we should have.
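
To put the backlog in perspective, the age of a failing task can be read off a "Fail to process task" log line by subtracting `queue-task-visibility-timestamp` from `ts`. The following is a rough sketch in Python using the two timestamps from the first failed-task log above. The helper names `parse_ts` and `task_lag` are hypothetical, and the result only shows how far behind the log time the task's timestamp is, not why it is failing.

```python
from datetime import datetime, timedelta


def parse_ts(value):
    """Parse an RFC 3339 UTC timestamp such as 2024-12-15T16:21:36.304Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def task_lag(entry):
    """Return log time minus the task's visibility timestamp."""
    return parse_ts(entry["ts"]) - parse_ts(entry["queue-task-visibility-timestamp"])


# Only the two relevant fields of the failed-task log line are kept here.
entry = {
    "ts": "2024-12-15T16:21:36.304Z",
    "queue-task-visibility-timestamp": "2024-12-14T13:07:44.404Z",
}

print(task_lag(entry))  # 1 day, 3:13:51.900000
```

In this example the task was already more than a day behind when the error was logged, which matches the growing row counts.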
