
Self hosted runner occasionally crashes on "SessionConflictException" after multiple "A session for this runner already exists" #3624

Open
liriID opened this issue Dec 15, 2024 · 1 comment
Labels
bug Something isn't working

Comments


liriID commented Dec 15, 2024

Describe the bug
We are using self-hosted runners deployed on EKS, and we recently started experiencing repeated crashes of our runners. This doesn't seem to be related to any change other than the previously long-running containers being restarted.

Our Dockerfile:

```dockerfile
FROM debian:bookworm

ARG RUNNER_VERSION

ENV GITHUB_PERSONAL_TOKEN=""

RUN apt-get update && // ... install lots of background dependencies and packages

USER github
WORKDIR /home/github

RUN curl -O -L https://github.com/actions/runner/releases/download/v$RUNNER_VERSION/actions-runner-linux-x64-$RUNNER_VERSION.tar.gz && \
    tar xzf ./actions-runner-linux-x64-$RUNNER_VERSION.tar.gz && \
    sudo ./bin/installdependencies.sh

COPY --chown=github:github entrypoint.sh ./entrypoint.sh
RUN sudo chmod u+x ./entrypoint.sh

ENTRYPOINT ["/home/github/entrypoint.sh"]
```

With entrypoint.sh being:

```sh
#!/bin/sh

registration_url="https://api.github.com/orgs/acme/actions/runners/registration-token"
echo "Requesting registration URL at '${registration_url}'"

payload=$(curl --http1.1 -sX POST -H "Authorization: token ${GITHUB_PERSONAL_TOKEN}" "$registration_url")
export RUNNER_TOKEN=$(echo "$payload" | jq .token --raw-output)

./config.sh \
    --name "$(hostname)" \
    --token "${RUNNER_TOKEN}" \
    --url https://github.com/acme \
    --unattended \
    --replace

# Deregister the runner when the container is stopped.
remove() {
    ./config.sh remove --unattended --token "${RUNNER_TOKEN}"
}

trap 'remove; exit 130' INT
trap 'remove; exit 143' TERM

# Forward the container's arguments unchanged ("$@" rather than "$*").
./run.sh "$@" &

wait $!
```
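The script passes whatever `jq` prints straight to `config.sh`. If the token request fails (for example because `GITHUB_PERSONAL_TOKEN` is expired or lacks the needed scope), the API returns an error object, `jq --raw-output` prints `null`, and `config.sh` is called with the literal token `null`. A minimal sketch of a stricter token request, assuming `curl` and `jq` are available as in the image above; the function names `extract_token` and `fetch_registration_token` are hypothetical:

```sh
#!/bin/sh
# Sketch only: stricter registration-token request for the entrypoint.
# Assumes curl and jq are installed and GITHUB_PERSONAL_TOKEN is set.

# Read an API response on stdin and print its .token field.
# jq -e makes the exit status non-zero when .token is null or missing,
# or when no input was received at all.
extract_token() {
    jq -e -r '.token'
}

# Request a registration token for the "acme" organization.
# --fail makes curl exit non-zero (and print no body) on HTTP errors
# such as 401 or 403, so extract_token then fails as well.
fetch_registration_token() {
    curl --http1.1 -sS --fail -X POST \
        -H "Authorization: token ${GITHUB_PERSONAL_TOKEN}" \
        "https://api.github.com/orgs/acme/actions/runners/registration-token" |
        extract_token
}
```

In the entrypoint, `if ! RUNNER_TOKEN=$(fetch_registration_token); then echo "Could not obtain a registration token" >&2; exit 1; fi` could replace the `payload`/`jq` pair, so the container stops with a clear message before `config.sh` runs. The assignment's exit status is that of the command substitution, which is why the `if !` check works.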

To Reproduce
After running for some time, the runner container crashes and restarts. The logs typically look similar to the following:

```
Requesting registration URL at 'https://api.github.com/orgs/acme/actions/runners/registration-token'

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication


√ Connected to GitHub

# Runner Registration




A runner exists with the same name
√ Successfully replaced the runner
√ Runner connection is good

# Runner settings


√ Settings Saved.


√ Connected to GitHub

A session for this runner already exists.
2024-12-14 22:04:06Z: Runner connect error: The actions runner runner-5969d445d5-7vx8v already has an active session.. Retrying until reconnected.

√ Connected to GitHub

A session for this runner already exists.

√ Connected to GitHub

A session for this runner already exists.

√ Connected to GitHub

A session for this runner already exists.

√ Connected to GitHub

A session for this runner already exists.

√ Connected to GitHub

A session for this runner already exists.

√ Connected to GitHub

A session for this runner already exists.

√ Connected to GitHub

A session for this runner already exists.
Stop retry on SessionConflictException after retried for 240 seconds.
Failed to create session. The actions runner runner-5969d445d5-7vx8v already has an active session.
Runner listener exit with Session Conflict error, stop the service, no retry needed.
Exiting runner...
```

Expected behavior
The runner should not crash-loop.

Runner Version and Platform

Runner version 2.320.0, running in a debian:bookworm-based Docker image on EKS version v1.30.6-eks-7f9249a.

Runner and Worker's Diagnostic Logs

We don't have diagnostic logs from the crashed runners because this information is not persisted. If relevant, we could persist it for debugging.
As far as I can see in the running containers, nothing besides INFO logs appears.

@liriID liriID added the bug Something isn't working label Dec 15, 2024
@ChristopherHX
Contributor

Some unofficial hints to improve the reliability of your container:

```sh
# Don't use a runner name that persists across restarts,
# e.g. don't use the hostname alone; append a UUID.
# A crashed runner doesn't end its session.
# The session stays blocked until the Actions service cleans it up.
# This takes more than 5 minutes, and an instant container restart with the same runner name might reset the timer!
# I assume `--replace` doesn't kill a stale session for a given runner name.
./config.sh \
    --name "$(hostname)-$(uuidgen)" \
    --token "${RUNNER_TOKEN}" \
    --url https://github.com/acme \
    --unattended \
    --replace
remove() {
    # You need to regenerate RUNNER_TOKEN if it is about 1 hour old, otherwise you get a permission-denied error.
    # The configure command does have a `--pat` option that lets the runner make the REST API call itself,
    # but caching this token saves rate-limited API calls if it is reused to remove the runner within 1 hour.
    ./config.sh remove --unattended --token "${RUNNER_TOKEN}"
}
```
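The two hints above can be sketched as small helpers for the entrypoint. This is a sketch under assumptions, not an official recipe: `uuidgen` comes from Debian's `uuid-runtime` package, which the image may not include, so the sketch falls back to the Linux kernel's `/proc/sys/kernel/random/uuid`; the 55-minute cutoff and all helper names (`runner_name`, `token_is_stale`, `fetch_remove_token`, `TOKEN_FETCHED_AT`, `TOKEN_MAX_AGE`) are hypothetical; and instead of reusing the registration token, a stale token is replaced with one from the organization `remove-token` REST endpoint.

```sh
#!/bin/sh
# Sketch only: hypothetical helpers for a unique runner name and for
# refreshing the token used by `config.sh remove`.

# Build a runner name that is unique per container start.
# Prefer uuidgen; fall back to the kernel's UUID generator on Linux.
runner_name() {
    if command -v uuidgen >/dev/null 2>&1; then
        suffix=$(uuidgen)
    else
        suffix=$(cat /proc/sys/kernel/random/uuid)
    fi
    printf '%s-%s\n' "$1" "$suffix"
}

# Registration tokens expire after one hour; treat a cached token as
# stale a bit earlier (assumption: 55 minutes = 3300 seconds).
TOKEN_MAX_AGE=3300

# $1 = time the token was fetched, $2 = current time (epoch seconds).
# Succeeds (exit 0) when the token should not be reused.
token_is_stale() {
    [ $(( $2 - $1 )) -ge "$TOKEN_MAX_AGE" ]
}

# Request a fresh removal token for the "acme" organization.
# Fails if the HTTP call fails or the response has no .token.
fetch_remove_token() {
    curl --http1.1 -sS --fail -X POST \
        -H "Authorization: token ${GITHUB_PERSONAL_TOKEN}" \
        "https://api.github.com/orgs/acme/actions/runners/remove-token" |
        jq -e -r '.token'
}

# Deregister the runner, reusing RUNNER_TOKEN only while it is fresh.
# If TOKEN_FETCHED_AT is unset, the token is treated as stale.
remove() {
    token="${RUNNER_TOKEN}"
    if token_is_stale "${TOKEN_FETCHED_AT:-0}" "$(date +%s)"; then
        token=$(fetch_remove_token) || return 1
    fi
    ./config.sh remove --unattended --token "$token"
}
```

With `TOKEN_FETCHED_AT=$(date +%s)` recorded right after the registration token is fetched, `remove` reuses the cached token while it is fresh and only calls the API once it is not. Registering with `--name "$(runner_name "$(hostname)")"` gives each container start its own runner name, so a restarted container should no longer compete with the stale session left behind by the crashed one.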
