Remove reset attack surface #54

Open
wants to merge 6 commits into
base: main

Collaborator

@hlef hlef commented Dec 6, 2024

One current limitation of the network stack reset is that a set of "reset-critical" variables either persists across resets or is used during resets; if these variables are compromised, resets are no longer effective. This is a relevant attack surface: compromising these variables causes a full DoS of the network stack that can only be repaired through a full reboot. This limitation was tracked as part of #31.

In this PR, I intend to entirely eliminate this attack surface.

I go through the list of reset-critical variables and identify those which are not a problem (by construction, they cannot be attacked under our threat model). Thanks to @davidchisnall for discussing this.

This leaves us with two important pieces of data: the socket list, and the threadEntryGuard. I address these through refactoring:

Initially, we intended to address this by adding an internal compartment to protect this state (similarly to microreboots). However, we realized that this PR's approach is a much better option.

I tested these changes assuming a fully compromised socket list. The network stack reset still works like a charm, albeit visibly more slowly. In practice, this should only be an edge case.

hlef added 6 commits December 5, 2024 15:58
If a thread somehow corrupts the socket list, we will lose the ability
to retrieve references to socket locks, ultimately preventing us from
unblocking threads blocked on them.

To handle that situation, ensure that threads block on the socket lock
in steps, checking the network stack reset state at every step through
`LockGuard` "conditions". If a network stack reset is detected in this
way, threads will bail out.

Note that the "condition" lambda is necessary here, since socket locks
are allocated on a caller capability which we cannot heap-free-all (see
recent additions to the `LockGuard` class).

Signed-off-by: Hugo Lefeuvre <[email protected]>
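
For readers unfamiliar with the pattern, here is a minimal sketch of blocking on a lock in steps while re-checking a condition between steps. It is not the network stack's code and does not use the real `LockGuard` API: `std::timed_mutex` stands in for a socket lock, and `resetInProgress` and `lock_in_steps` are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical stand-in for the network stack's reset state.
std::atomic<bool> resetInProgress{false};

// Try to acquire `lock` in bounded steps of length `step`, evaluating
// `keepWaiting` before every step. Returns true with the lock held, or
// false with the lock NOT held once the condition stops holding.
template<typename Condition>
bool lock_in_steps(std::timed_mutex          &lock,
                   std::chrono::milliseconds  step,
                   Condition                  keepWaiting)
{
    assert(step.count() > 0);
    while (keepWaiting())
    {
        if (!lock.try_lock_for(step))
        {
            // Step timed out: loop around and re-check the condition.
            continue;
        }
        // The condition may have changed while we were blocked, so check
        // it once more before handing the lock to the caller.
        if (keepWaiting())
        {
            return true;
        }
        lock.unlock();
        return false;
    }
    return false;
}
```

A caller would pass something like `[] { return !resetInProgress.load(); }` as the condition and bail out when the function returns false. Because the waiter polls the reset state itself, it gets unblocked even if nobody can find the lock to release it.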
This commit addresses a number of issues in the reset when socket lists
are corrupted.

Together with the recent support of steps/conditions when waiting on
socket locks and event queues, this removes the socket list from the set
of reset-critical variables.

Signed-off-by: Hugo Lefeuvre <[email protected]>
Although `currentSocketEpoch` and `userThreadCount` are both reset
critical, they should be impossible to corrupt by construction, unless
control-flow or spatial memory safety is compromised. Document that.

Signed-off-by: Hugo Lefeuvre <[email protected]>
By moving the reset of `threadEntryGuard` to a different place in the
execution flow, we can remove it from the set of reset-critical variables.

The idea here is to only reset `threadEntryGuard` in the case of a
crash, and only if the crash was triggered by a user thread, since
resetting is not needed in the case of a network thread crash (due to
deterministic execution flow, see comment in the code).

Signed-off-by: Hugo Lefeuvre <[email protected]>
This is no longer used; we stopped using it with the
stack-overflow-resilient handler.

Signed-off-by: Hugo Lefeuvre <[email protected]>
Currently the example ignores the failure and tries to re-open the
listening socket, running into an infinite loop. This is not meaningful:
if we cannot close the listening socket, we had better stop, since we
will never be able to re-bind to the server port.

Signed-off-by: Hugo Lefeuvre <[email protected]>
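
As an illustration of the control flow described in this last commit, here is a minimal sketch of a server loop that stops when closing the listening socket fails, rather than trying to re-open it. It is not the example's code: `ListenerOps` and `run_server` are hypothetical names, and the callbacks stand in for the real socket calls.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callbacks standing in for the example's socket operations.
struct ListenerOps
{
    std::function<bool()> open;  // bind and listen; false on failure
    std::function<void()> serve; // handle connections until done
    std::function<bool()> close; // close the listening socket; false on failure
};

// Runs up to `maxRounds` open/serve/close rounds and returns how many
// rounds completed. Stops as soon as the listening socket cannot be closed.
int run_server(const ListenerOps &ops, int maxRounds)
{
    assert(maxRounds >= 0);
    int completed = 0;
    while (completed < maxRounds)
    {
        if (!ops.open())
        {
            break;
        }
        ops.serve();
        if (!ops.close())
        {
            // The server port is still bound, so re-opening would fail on
            // every iteration. Stop instead of looping forever.
            break;
        }
        ++completed;
    }
    return completed;
}
```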
Collaborator Author

hlef commented Dec 6, 2024

(The CI fails because the RTOS core PR has not yet been merged).
