A user in my org contacted me with a job that never ran.
I found an error message like this ...
2024/05/23 21:16:34 failed to process job: failed to check to register runner (target ID: 10ed1fec-041c-4829-ab1c-b7de7ff9e673, job ID: 6bbe1e7e-9b8c-49c7-bbc3-623eee4ca54c): failed to check existing runner in GitHub: failed to get list of runners: failed to list runners: failed to list organization runners: GET https://REDACTED/actions/runners?per_page=100: 503 OrgTenant service unavailable []
I tracked that error down to the call that lists runners for the org (myshoes/pkg/gh/runner.go, line 48 at cbe7eda).
In this particular case the trace starts at starter.go, in the function ProcessJob, where the "Strict" config is true and "checkRegisteredRunner" is called.
The result of this 503 is that deleteInstance is called in ProcessJob.
The overall impact of that error is that the runner is deleted, which led to the job never being worked on.
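To make the failure path concrete, here is a minimal, self-contained sketch of that flow. Only the names ProcessJob, checkRegisteredRunner, deleteInstance, and the "Strict" config come from myshoes; the types and the flaky client below are hypothetical stand-ins, not the actual code.

package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical stand-in for the real myshoes instance type.
type instance struct{ id string }

// Hypothetical stand-in for the runner-registration check.
type runnerChecker func(ctx context.Context) error

func deleteInstance(ctx context.Context, i instance) {
	fmt.Printf("deleting instance %s\n", i.id)
}

// processJob mirrors the shape of ProcessJob with Strict enabled:
// a single transient failure from the runner check goes straight to
// deleteInstance, so the queued job never gets a runner.
func processJob(ctx context.Context, check runnerChecker, i instance, strict bool) error {
	if strict {
		if err := check(ctx); err != nil {
			deleteInstance(ctx, i)
			return fmt.Errorf("failed to check to register runner: %w", err)
		}
	}
	// ... normal startup would continue here ...
	return nil
}

func main() {
	flaky := func(ctx context.Context) error {
		// Stand-in for the GitHub "list organization runners" call returning a 503.
		return errors.New("503 OrgTenant service unavailable")
	}
	err := processJob(context.Background(), flaky, instance{id: "runner-1"}, true)
	fmt.Println(err)
}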
I contacted GitHub Enterprise support and they responded with the following suggestion...
A 503 error may occur when the server is temporarily overwhelmed and requires a moment to stabilize. This situation could be attributed to high traffic, maintenance activities, or a brief interruption.
In your specific case, the appearance of the error message "OrgTenant service unavailable" indicates a temporary disruption with the service responsible for managing organization actions/runners.
When confronted with a 503 error, it is advisable to establish a retry mechanism. It is important not to attempt immediate retries but rather consider implementing an exponential backoff strategy. This approach involves increasing the wait time between each retry to allow the server sufficient time to recover and mitigate potential complications.
I'll add a comment below with how I mitigated this with a code change.
The retry function establishes a backoff timer ...
import (
	"context"
	"time"

	"github.com/cenkalti/backoff/v4" // assuming the v4 module path
)

func GetBackoffTimer(ctx context.Context, maxRetry uint64) backoff.BackOff {
	off := backoff.NewExponentialBackOff()
	off.InitialInterval = 1 * time.Second
	off.Multiplier = 2
	off.MaxElapsedTime = 10 * time.Second
	off.NextBackOff() // burn one; no matter what I do I can't get the initial interval to be one second!?
	b := backoff.WithMaxRetries(backoff.WithContext(off, ctx), maxRetry)
	return b
}
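For completeness, here is a sketch of how that timer can be paired with backoff.Retry from the same library to wrap the runner-listing call. The listRunners parameter is a hypothetical stand-in for the real call in pkg/gh/runner.go, not the actual myshoes function.

// Hypothetical usage: retry the org "list runners" call on transient errors.
func checkRunnersWithRetry(ctx context.Context, listRunners func(context.Context) error) error {
	operation := func() error {
		if err := listRunners(ctx); err != nil {
			// A 503 is transient, so return it and let backoff schedule a retry.
			// A genuinely permanent error could be wrapped with backoff.Permanent(err)
			// to stop retrying early.
			return err
		}
		return nil
	}
	return backoff.Retry(operation, GetBackoffTimer(ctx, 5))
}

With InitialInterval at one second, a multiplier of 2, MaxElapsedTime at ten seconds, and a cap of five retries, a brief OrgTenant blip gets several more chances before ProcessJob gives up and deletes the instance.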