Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some tests terminate in an unusual manner when canceled #1722

Open
EliahKagan opened this issue Dec 13, 2024 Discussed in #1414 · 9 comments
Open

Some tests terminate in an unusual manner when canceled #1722

EliahKagan opened this issue Dec 13, 2024 Discussed in #1414 · 9 comments

Comments

@EliahKagan
Copy link
Member

EliahKagan commented Dec 13, 2024

Discussed in #1414

Note: I've converted the discussion to this issue, as suggested by @Byron, after further investigation suggests it may not be related to cargo nextest or the use of serial_test::serial and thus perhaps instead to the behavior of gitoxide or non-test-specific dependencies. —Eliah

Originally posted by EliahKagan June 22, 2024

As Raymond Chen has explained in STATUS_STACK_BUFFER_OVERRUN doesn’t mean that there was a stack buffer overrun, this condition is misleading and often not nearly as serious as it may seem, because this way of terminating a process on Windows often does not designate any stack overflow, buffer overrun, or any kind of memory-related bug or error. Instead, it is widely used to terminate a program in any kind of critical immediate failure.

Nonetheless, I find it odd that I see this when I cancel gitoxide's test suite with Ctrl+C on Windows, since it does not happen when I cancel other Rust projects' tests when running them in the same way.

ek@Glub MINGW64 ~/source/repos/gitoxide (main)
$ cargo nextest run --all --no-fail-fast
warning: function `evaluate_target_dir` is never used
 --> gix-prompt\tests\prompt.rs:9:8
  |
9 |     fn evaluate_target_dir() -> String {
  |        ^^^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(dead_code)]` on by default

warning: `gix-prompt` (test "prompt") generated 1 warning
   Compiling gix-filter v0.11.2 (C:\Users\ek\source\repos\gitoxide\gix-filter)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 4.67s
    Starting 2355 tests across 127 binaries (9 skipped; run ID: 8406b876-001a-41fa-a054-487ee2cf1f3f, nextest profile: default)
        PASS [   0.095s] gitoxide plumbing::main::tests::clap
        PASS [   0.090s] gitoxide shared::value_parser_tests::rename_fraction
        PASS [   0.085s] gix id::tests::size_of_oid
        PASS [   0.080s] gix open::tests::size_of_options
        PASS [   0.037s] gix remote::connection::fetch::refs::tests::update::unborn_remote_branches_can_be_created_locally_if_they_are_new
        PASS [   0.043s] gix remote::connection::fetch::refs::tests::update::remote_symbolic_refs_with_locally_unavailable_target_result_in_valid_peeled_branches
        PASS [   0.050s] gix remote::connection::fetch::refs::tests::update::remote_symbolic_refs_with_locally_unavailable_target_dont_overwrite_valid_local_branches
        PASS [   0.053s] gix remote::connection::fetch::refs::tests::update::unborn_remote_branches_can_update_local_unborn_branches
        PASS [   0.047s] gix remote::connection::fetch::refs::tests::update::unborn_remote_refs_dont_overwrite_valid_local_refs
        PASS [   0.281s] gix remote::connection::fetch::refs::tests::update::checked_out_branches_in_worktrees_are_rejected_with_additional_information
        PASS [   0.409s] gix::gix clone::blocking_io::fetch_and_checkout_specific_non_existing
        PASS [   0.469s] gix::gix clone::blocking_io::fetch_and_checkout_empty_remote_repo
        PASS [   0.809s] gix::gix clone::blocking_io::fetch_only_without_configuration
        PASS [   1.000s] gix::gix clone::blocking_io::fetch_and_checkout_specific_annotated_tag
        PASS [   1.047s] gix::gix clone::blocking_io::fetch_only_with_configuration
        PASS [   1.107s] gix::gix clone::blocking_io::fetch_and_checkout_specific_ref
        PASS [   1.168s] gix::gix clone::blocking_io::fetch_and_checkout
        PASS [   0.053s] gix::gix clone::clone_and_destination_must_be_empty
        PASS [   0.130s] gix::gix clone::clone_and_early_persist_without_receive
        PASS [   0.128s] gix::gix clone::clone_bare_into_empty_directory_and_early_drop
        PASS [   0.248s] gix::gix clone::blocking_io::from_shallow_prohibited_with_option
        PASS [   0.082s] gix::gix clone::clone_into_empty_directory_and_early_drop
        PASS [   0.069s] gix::gix commit::describe::lightweight_tags_are_sorted_lexicographically
        PASS [   0.068s] gix::gix commit::describe::tags_are_sorted_by_date_and_lexicographically
        PASS [   1.002s] gix::gix clone::blocking_io::fetch_shallow_no_checkout_then_unshallow
        PASS [   0.067s] gix::gix commit::describe::tags_are_sorted_by_priority
        PASS [   0.054s] gix::gix commit::describe::with_dirty_suffix::dirty_suffix_does_not_apply_if_not_dirty
        PASS [   0.068s] gix::gix commit::describe::with_dirty_suffix::dirty_suffix_applies_automatically_if_dirty
        PASS [   0.040s] gix::gix config::tree::branch::merge
        PASS [   0.998s] gix::gix clone::blocking_io::from_non_shallow_by_deepen_exclude_then_deepen_to_unshallow
        PASS [   0.041s] gix::gix config::tree::checkout::workers
        PASS [   0.048s] gix::gix config::tree::core::abbrev
        PASS [   0.039s] gix::gix config::tree::core::autocrlf
        PASS [   0.029s] gix::gix config::tree::core::check_round_trip_encoding
        PASS [   0.042s] gix::gix config::tree::core::check_stat
        PASS [   0.034s] gix::gix config::tree::core::delta_base_cache_limit
   Canceling due to interrupt: 15 tests still running
        PASS [   0.041s] gix::gix config::tree::core::disambiguate
        PASS [   0.033s] gix::gix config::tree::core::log_all_ref_updates
        PASS [   0.537s] gix::gix clone::blocking_io::from_shallow_allowed_by_default
        PASS [   0.040s] gix::gix config::tree::core::eol
        PASS [   0.030s] gix::gix config::tree::core::safecrlf
        PASS [   0.029s] gix::gix config::tree::diff::driver_binary
        PASS [   0.036s] gix::gix config::tree::diff::algorithm
        PASS [   0.051s] gix::gix config::tree::core::timeouts
        PASS [   0.030s] gix::gix config::tree::diff::renames
        PASS [   0.045s] gix::gix config::tree::extensions::object_format
        PASS [   0.037s] gix::gix config::tree::fetch::algorithm
        PASS [   0.031s] gix::gix config::tree::gitoxide::allow::protocol_from_user
        PASS [   0.039s] gix::gix config::tree::fetch::recurse_submodule
        PASS [   0.049s] gix::gix config::tree::gitoxide::author::name_and_email_fallback
        PASS [   0.042s] gix::gix config::tree::gitoxide::commit::author_and_committer_date
        PASS [   0.045s] gix::gix config::tree::gitoxide::committer::name_and_email_fallback
        PASS [   0.036s] gix::gix config::tree::gitoxide::http::connect_timeout
        PASS [   0.028s] gix::gix config::tree::http::extra_header
        PASS [   0.027s] gix::gix config::tree::http::follow_redirects
        PASS [   0.029s] gix::gix config::tree::http::http_version
       ABORT [   1.810s] gix remote::connection::fetch::refs::tests::update::remote_symbolic_refs_can_always_be_set_as_there_is_no_scenario_where_it_could_be_nonexisting_and_rejected
     Message [         ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
       ABORT [   1.765s] gix remote::connection::fetch::refs::tests::update::various_valid_updates
     Message [         ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
       ABORT [   0.801s] gix::gix clone::blocking_io::from_non_shallow_then_deepen_then_deepen_since_to_unshallow
     Message [         ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
       ABORT [   1.901s] gix remote::connection::fetch::refs::tests::update::remote_symbolic_refs_can_be_written_locally_and_point_to_tracking_branch
     Message [         ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
       ABORT [   2.001s] gix remote::connection::fetch::refs::tests::update::non_fast_forward_is_rejected_if_dry_run_is_disabled
     Message [         ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
       ABORT [   2.006s] gix remote::connection::fetch::refs::tests::update::remote_refs_cannot_map_to_local_head
     Message [         ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
       ABORT [   2.067s] gix remote::connection::fetch::refs::tests::update::fast_forwards_are_called_out_even_if_force_is_given
     Message [         ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
       ABORT [   2.062s] gix remote::connection::fetch::refs::tests::update::local_symbolic_refs_can_be_overwritten
     Message [         ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
       ABORT [   2.118s] gix remote::connection::fetch::refs::tests::update::local_direct_refs_are_written_with_symbolic_ones
     Message [         ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
       ABORT [   2.215s] gix remote::connection::fetch::refs::tests::update::non_fast_forward_is_rejected_but_appears_to_be_fast_forward_in_dryrun_mode
     Message [         ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
------------
     Summary [   2.264s] 66/2355 tests run: 56 passed, 10 failed, 0 skipped
error: test run failed

As noted in this Stack Overflow answer and that Microsoft documentation, 0xc0000409 designates a (user mode) STATUS_STACK_BUFFER_OVERRUN.

When this is produced by a call to the __fastfail intrinsic, there is a separate code that provides additional information. I don't know if that's relevant here or how to extract the information. The other error information shown in that test output is os error 1282 which seems to just mean "invalid operation."

For contrast, I have tried running cargo nextest --all --no-fail-fast and pressing Ctrl+C on that same Windows system for three other projects, all of which showed a more intuitive termination condition and none of which showed 0xc0000409.

ripgrep:

       ABORT [   0.047s] globset glob::tests::matchcasei1
     Message [         ] code 0xc000013a: {Application Exit by CTRL+C}
The application terminated as a result of a CTRL+C. (os error 572)

rust-bio:

       ABORT [   0.166s] bio data_structures::interval_tree::array_backed_interval_tree::tests::find_arbitrary
     Message [         ] code 0xc000013a: {Application Exit by CTRL+C}
The application terminated as a result of a CTRL+C. (os error 572)

(With various other interrupted tests, with the same message.)

crossbeam:

       ABORT [   1.046s] crossbeam-channel::after fairness
     Message [         ] code 0xc000013a: {Application Exit by CTRL+C}
The application terminated as a result of a CTRL+C. (os error 572)

(With many other interrupted tests, with the same message.)

"Application Exit by CTRL+C" is an intuitive effect of pressing Ctrl+C to cancel tests, while a request for immediate termination--even if not related to any actual memory errors--is less intuitive.

Full output from all four projects tested can be seen in this gist.

I do not assume this is a bug. But I don't want to assume it isn't, unless this can be explained.

The nextest documentation notes that it uses Windows job objects to terminate process trees "immediately," which I would be taken to think might explain this behavior if it always or usually happened, but does not explain it given that it seems only to happen with gitoxide.

The effect does not appear specific to when tests from any particular crate of gitoxide are running.

Although I believe no signal--in the usual POSIX-related sense--is being used here to terminate the processes. Nonetheless, maybe this has something to do with how gitoxide deals with signals?

I cannot produce this with cargo test commands but I don't know if that means it's related to nextest or instead if just cargo test does not show enough information about termination to reveal it. Likewise, I do not know if this can happen under any non-test circumstances when running applications that use gitoxide crates.

@sunshowers
Copy link

Hey, nextest author here.

The nextest documentation notes that it uses Windows job objects to terminate process trees "immediately," which I would be taken to think might explain this behavior if it always or usually happened, but does not explain it given that it seems only to happen with gitoxide.

In case of a Ctrl-C, Windows sends the signal to all processes running on the console that don't have a disabled handler. See the note next to CTRL_C_EVENT in GenerateConsoleCtrlEvent.

On Unix, nextest puts each process in a process group, and is responsible for forwarding signals to test processes. On Windows, nextest does not put processes in Windows's own notion of process group -- not out of principle, but just because I'd like someone with Windows expertise to enumerate and outline all the issues to look for. Test processes can do many weird things, so you have to be quite careful while making changes.

See nextest's signal handling docs for more info. These details have been the same for a long time, and haven't changed for 0.9.86.


When nextest receives a Ctrl-C, all of the other tests are likely to have received Ctrl-C at the same time as well, and are likely to have immediately exited. Nextest does try to call TerminateJobObject on all tests, but this is generally a no-op. It is possible, though, that with gitoxide the tests don't exit immediately on Ctrl-C -- but rather have a Ctrl-C handler, and are terminated by TerminateJobObject instead. That might explain the strange errors.

@sunshowers
Copy link

sunshowers commented Dec 14, 2024

And it suggests a possible improvement in nextest -- maybe on a ctrl-c, nextest should wait for processes to exit by themselves (for the grace-period setting, default 10s) before calling TerminateJobObject on them.

@Byron
Copy link
Member

Byron commented Dec 14, 2024

Thanks for collecting everything we figured out thus far, @EliahKagan , and great to see @sunshowers chiming in, too!

Although I believe no signal--in the usual POSIX-related sense--is being used here to terminate the processes. Nonetheless, maybe this has something to do with how gitoxide deals with signals?

If that would happen, then it's a bug. For all I know, signal handlers are never installed automatically, nor are they dealt with in any way in plumbing crates. gix (CLI) will call install the provided gix-tempfile signal handlers to assure tempfiles are cleaned up similarly to how Git does it, but that's not under test here.

What's interesting is the gix::gix clone::blocking_io::fetch* family of tests which are a little bit special. The cloning happens from a local path, which means a git process will be spawned which acts as a server speaking the git protocol. When a signal arrives, I assume it would reach the spawned process, and maybe it will respond in a special way or has handlers installed. And it's very likely that Git will have handlers installed as that is necessary for the tempfile/lockfile machinery.

While all that is happening, gix will definitely be reading from a pipe, maybe that's also special while a signal arrives.

@sunshowers
Copy link

The cloning happens from a local path, which means a git process will be spawned which acts as a server speaking the git protocol. When a signal arrives, I assume it would reach the spawned process, and maybe it will respond in a special way or has handlers installed.

This sounds right absent something special. Ctrl-C would go to all processes in the console.

And it's very likely that Git will have handlers installed as that is necessary for the tempfile/lockfile machinery.

So what might be happening, assuming the remote git installs a Ctrl-C handler is:

  • Nextest and Git both receive the Ctrl-C.
  • The remote git starts running its Ctrl-C handler.
  • But meanwhile, nextest immediately calls TerminateJobObject on the test and all subprocesses, which means the git process gets killed before the signal handler has a chance to complete.

@sunshowers
Copy link

sunshowers commented Dec 14, 2024

I found this code in git: https://github.com/git/git/blob/2ccc89b0c16c51561da90d21cfbb4b58cc877bf6/compat/mingw.c#L681

Unfortunately, Git is not really a native Windows application, so internally Ctrl-C gets translated via MinGW into SIGINT. This translation layer might be acting up as well.

What I'm going to do is -- I'm going to put up a PR to change nextest, such that it waits for the grace period in case of Ctrl-C. Would love it if you gave that PR a shot! See comment below.

@sunshowers
Copy link

(Like most Rust devtools, nextest is a full-fledged Windows application. There is no attempt to emulate Unix on Windows -- many of the same types are used, but with #[cfg(unix)] and #[cfg(windows)] enum variants as appropriate.)

@sunshowers
Copy link

sunshowers commented Dec 14, 2024

Oh no, I was wrong. In reality, on receiving Ctrl-C, nextest doesn't do anything at all. It just lets tests exit by themselves. See:

https://github.com/nextest-rs/nextest/blob/208d6e527098cf347f731975f3606249ee1cc791/nextest-runner/src/runner/windows.rs#L86-L90

I actually think this might be wrong -- nextest should probably wait for the grace period and then call TerminateJobObject. But that wouldn't impact this issue.

My guess right now is that this is a git/mingw issue. But in that case, I'm not quite sure why that code would be reflected in the exit status of the test process.

It might be worth installing a Ctrl-C handler in these tests, that just waits a few seconds before terminating or something.

edit: Ctrl-C grace periods are implemented in nextest-rs/nextest@66f4a99, which is part of nextest main. It wouldn't have any impact on this issue.

@Byron
Copy link
Member

Byron commented Dec 15, 2024

It might be worth installing a Ctrl-C handler in these tests, that just waits a few seconds before terminating or something.

Even though I don't know why exactly it happens, I don't think a systemic issue should be patched up on the side of the tests.

I am willing to live with it, knowing that the gix (CLI) program doesn't exhibit this behaviour, despite or because of installing signal handlers. It might be interesting to see what happens if these aren't installed - would it fail just like the nextest runs do while cloning a local resource, or is there yet another difference? If it works better, we might have an indication that nextest could do something differently to fix this, and if it's the same we would know it's just the way of the world.

@sunshowers
Copy link

Even though I don't know why exactly it happens, I don't think a systemic issue should be patched up on the side of the tests.

Ah I meant just for debugging. Agreed that that wouldn't be the right long term fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants