Worker logs show the same jobId for different workflows/jobs, causing runner registration to fail and the job to become orphaned
#3621
Open · NorseGaud opened this issue on Dec 12, 2024 · 1 comment
Describe the bug
We install the latest runner into a macOS VM and register it at the org level so that jobs in repos under the org can use it. This works roughly 99% of the time, but occasionally the actions-runner in the VM doesn't register fully and the job runs indefinitely.
The runner logs show:
[2024-12-12 16:30:47Z INFO ProcessInvokerWrapper] Finished process 729 with exit code 102, and elapsed time 00:00:11.4918642.
[2024-12-12 16:30:47Z INFO JobDispatcher] Worker finished for job a4127350-4039-54ee-311c-0227a78ac3cb. Code: 102
[2024-12-12 16:30:47Z INFO JobDispatcher] finish job request for job a4127350-4039-54ee-311c-0227a78ac3cb with result: Failed
[2024-12-12 16:30:47Z INFO Terminal] WRITE LINE: 2024-12-12 16:30:47Z: Job testJob completed with result: Failed
[2024-12-12 16:30:47Z INFO JobDispatcher] Stop renew job request for job a4127350-4039-54ee-311c-0227a78ac3cb.
[2024-12-12 16:30:47Z INFO JobDispatcher] job renew has been cancelled, stop renew job request 7662.
[2024-12-12 16:30:47Z INFO JobNotification] Entering JobCompleted Notification
[2024-12-12 16:30:47Z INFO JobNotification] Entering EndMonitor
[2024-12-12 16:30:47Z INFO MessageListener] Received job status event. JobState: Online
[2024-12-12 16:30:47Z INFO JobDispatcher] Fire signal for one time used runner.
[2024-12-12 16:30:47Z INFO Runner] Job has finished at backend, the runner will exit since it is running under onetime use mode.
[2024-12-12 16:30:47Z INFO Runner] Stop message queue looping.
[2024-12-12 16:30:47Z WARN GitHubActionsService] GET request to https://broker.actions.githubusercontent.com/message?sessionId=90e53455-5052-4e98-8c94-92d824f3f5f0&status=Busy&runnerVersion=2.321.0&os=macOS&architecture=ARM64&disableUpdate=true has been cancelled.
[2024-12-12 16:30:47Z ERR BrokerServer] Catch exception during request
[2024-12-12 16:30:47Z ERR BrokerServer] System.Threading.Tasks.TaskCanceledException: The operation was canceled.
---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
---> System.Net.Sockets.SocketException (89): Operation canceled
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](CancellationToken cancellationToken, Int32 estimatedSize)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](Memory`1 buffer, CancellationToken cancellationToken)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.AuthenticationHelper.SendWithNtAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean isProxyAuth, HttpConnection connection, HttpConnectionPool connectionPool, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.AuthenticationHelper.SendWithAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean preAuthenticate, Boolean isProxyAuth, Boolean doRequestAuth, HttpConnectionPool pool, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at GitHub.Services.Common.RawHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at GitHub.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
at Sdk.WebApi.WebApi.RawHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
at Sdk.WebApi.WebApi.RawHttpClientBase.SendAsync[T](HttpRequestMessage message, Boolean readErrorBody, Object userState, CancellationToken cancellationToken)
at Sdk.WebApi.WebApi.RawHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Uri requestUri, HttpContent content, IEnumerable`1 queryParameters, Boolean readErrorBody, Object userState, CancellationToken cancellationToken)
at GitHub.Actions.RunService.WebApi.BrokerHttpClient.GetRunnerMessageAsync(Nullable`1 sessionId, String runnerVersion, Nullable`1 status, String os, String architecture, Nullable`1 disableUpdate, CancellationToken cancellationToken)
at GitHub.Runner.Common.BrokerServer.<>c__DisplayClass7_0.<<GetRunnerMessageAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at GitHub.Runner.Common.RunnerService.RetryRequest[T](Func`1 func, CancellationToken cancellationToken, Int32 maxRetryAttemptsCount, Func`2 shouldRetry)
Per https://github.com/orgs/community/discussions/146348#discussioncomment-11546552, I noticed that the actions-runner logs, even for unrelated jobs, all show the same jobId:
It looks like the jobId is unique for a while and then starts being reused. You can see that
a4127350-4039-54ee-311c-0227a78ac3cb
is reused here. The way I configure the runners inside the VMs is as follows. I made sure the token is unique for each runner, as are the name, labels, etc.:
./config.sh --name anklet-vm-314c9bc4-3a12-479a-8945-fc5750c7574a --token XXXXX --url https://github.com/veertuinc --no-default-labels --labels anka-template:d792c6f6-198c-470f-9526-9c998efe7ab4,anka-template-tag:vanilla+port-forward-22+brew-git,unique-id:12300647430-2 --disableupdate --unattended --ephemeral
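For context, the full registration flow around that command looks roughly like this. This is a minimal sketch, not my exact tooling: it assumes a PAT with admin:org scope in GH_PAT, jq installed in the VM, and placeholder values for the runner directory, name, and labels. The registration-token endpoint itself is the documented GitHub REST API call.

#!/usr/bin/env bash
set -euo pipefail

ORG="veertuinc"
VM_NAME="anklet-vm-$(uuidgen | tr '[:upper:]' '[:lower:]')"

# Mint a short-lived org-level registration token via the REST API.
TOKEN=$(curl -sf -X POST \
  -H "Authorization: Bearer ${GH_PAT}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/orgs/${ORG}/actions/runners/registration-token" \
  | jq -r '.token')

# Configure an ephemeral runner; it should unregister itself after one job.
cd /Users/runner/actions-runner   # placeholder path inside the VM
./config.sh \
  --name "${VM_NAME}" \
  --token "${TOKEN}" \
  --url "https://github.com/${ORG}" \
  --no-default-labels \
  --labels "anka,ephemeral" \
  --disableupdate --unattended --ephemeral

./run.sh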
Here are the two jobs that are using the same jobId
Here are the _diags for those runners
anklet-vm-cc309b23-4f58-49b4-9406-b429a43b6a84.zip
anklet-vm-314c9bc4-3a12-479a-8945-fc5750c7574a.zip
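In case it helps with triage, the reuse is straightforward to confirm across the two bundles. A minimal sketch, assuming both zips are extracted side by side:

# Unpack both _diag bundles and search them for the duplicated job GUID.
unzip -o anklet-vm-cc309b23-4f58-49b4-9406-b429a43b6a84.zip -d diag-a
unzip -o anklet-vm-314c9bc4-3a12-479a-8945-fc5750c7574a.zip -d diag-b
grep -rn "a4127350-4039-54ee-311c-0227a78ac3cb" diag-a diag-b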
I'm really unsure where to go with this one. Why is the jobId reused? Is this something I can control, or is this a bug in the actions-runner?
To Reproduce
Run a large number of jobs in the same repo with a self-hosted runner that is ephemeral and registered after a queued workflow_job webhook is received.
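For a rough way to generate that volume, here is a sketch. It assumes a hypothetical workflow file dispatch.yml with a workflow_dispatch trigger targeting these runner labels, and a placeholder repo name; my real jobs are triggered by webhooks rather than manual dispatch, but the effect on the runners should be similar.

# Queue many runs back-to-back so that ephemeral runners register and
# unregister in quick succession, which is when the jobId reuse appears.
for i in $(seq 1 100); do
  gh workflow run dispatch.yml -R veertuinc/some-repo
done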
Runner Version and Platform
v2.321.0
macOS