Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/logging: Fix data race on log_prefix #10568

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dsciebu
Copy link
Contributor

@dsciebu dsciebu commented Nov 21, 2024

In ofi_get_core_info, which is supposed to be thread safe ("Multiple threads may call fi_getinfo simultaneously, without any requirement for serialization."), a global variable 'log_prefix' is modified, which may lead to a data race. Changing the variable to a thread local one, fixes that problem.

Signed-off-by: Dariusz Sciebura [email protected]

@shefty
Copy link
Member

shefty commented Nov 22, 2024

Please change the commit message heading to: core/logging: Fix race..., and describe what the race is in the commit message.

@dsciebu dsciebu changed the title Fix datarace by making the common resource log_prefix thread local core/logging: Fix datarace by making the common resource log_prefix thread local Nov 27, 2024
@dsciebu dsciebu changed the title core/logging: Fix datarace by making the common resource log_prefix thread local core/logging: Fix datarace on log_prefix Nov 27, 2024
@dsciebu dsciebu changed the title core/logging: Fix datarace on log_prefix core/logging: Fix data race on log_prefix Nov 27, 2024
@dsciebu
Copy link
Contributor Author

dsciebu commented Nov 27, 2024

Please change the commit message heading to: core/logging: Fix race..., and describe what the race is in the commit message.

Done!

@shefty
Copy link
Member

shefty commented Nov 27, 2024

@dsciebu - You updated the github comment and title (as seen on the web page), but not commit message (as seen by git log). Both are useful, but we need the details in the commit message, as that is what gets added to git and visible via git log.

The changes themselves look good -- thanks!

In ofi_get_core_info, which is supposed to be thread safe ("Multiple
threads may call fi_getinfo simultaneously, without any requirement for
serialization."), a global variable 'log_prefix' is modified, which may
lead to a data race. Changing the variable to a thread local one, fixes
that problem.

Signed-off-by: Dariusz Sciebura <[email protected]>
@dsciebu dsciebu force-pushed the dariuszs/thread_local branch from 846c64d to 6935b70 Compare November 28, 2024 06:21
@dsciebu
Copy link
Contributor Author

dsciebu commented Nov 28, 2024

@dsciebu - You updated the github comment and title (as seen on the web page), but not commit message (as seen by git log). Both are useful, but we need the details in the commit message, as that is what gets added to git and visible via git log.

The changes themselves look good -- thanks!

I am used to the scenario, where the commits are squashed after merging and PR's title/comment is used to prepare the new commit. Anyway, I refreshed the commit name/desc as you wished.

@dsciebu
Copy link
Contributor Author

dsciebu commented Dec 17, 2024

Hey guy, friendly reminder - can you rereview and merge my change?

@piotrchmiel
Copy link
Contributor

bot:aws:retest

@piotrchmiel
Copy link
Contributor

cc: @shefty

@aingerson
Copy link
Contributor

@dsciebu Looks like _Thread_local doesn't play nicely with Windows. We will have to figure out what the comparable Windows solution is. Looks like you have to modify it in the caller which is annoying. See https://learn.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage
Most likely you'll have to modify the osd.h files for linux and windows and create a middle ground between the two where the declaration is _Thread_local for linux and empty for Windows and then adhere to the Windows use flow but have that empty for Linux. Very annoying... Though maybe there's a simpler solution. I haven't done a ton of digging on it.
Let me know if you want assistance with getting a solution going!

@shefty
Copy link
Member

shefty commented Dec 17, 2024

It looks like _Thread_local is defined in C11, but then deprecated in C23. thread_local is defined for C23. I found one reference where MS uses __declspec(thread).

So... it looks like there needs to be some sort of abstraction for this, possibly set based on the platform, but maybe also the compiler?

@aingerson
Copy link
Contributor

@shefty Yeah... this is going to get messy. Woof

@piotrchmiel
Copy link
Contributor

piotrchmiel commented Dec 18, 2024

@shefty @aingerson
Do you have a specific place where a series of #ifdefs based on the operating system, compiler, or language standard could be placed for thread_local ? If not, would it be a good idea to create a header file, such as ofi_thread_local.h, in the folder https://github.com/ofiwg/libfabric/blob/e5fe96ed96df94f5dca21f24662cd60bfee9af51/include? If think that something like that should work:

#ifndef OFI_THREAD_LOCAL_H
#define OFI_THREAD_LOCAL_H

#ifdef __cplusplus
extern "C" {
#endif

#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 202311L)
    // C23 and above: use thread_local directly
    #define OFI_THREAD_LOCAL thread_local
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201102L) && defined(_Thread_local)
    // C11: use _Thread_local
    #define OFI_THREAD_LOCAL _Thread_local
#elif defined(__GNUC__) || defined(__INTEL_COMPILER) || defined(__SUNPRO_CC) || defined(__IBMCPP__) || defined(__clang__)
    // GCC/Clang/Intel/SunPro/IBM compilers
    #define OFI_THREAD_LOCAL __thread
#elif defined(_MSC_VER)
    // Microsoft Visual C++ compiler
    #define OFI_THREAD_LOCAL __declspec(thread)
#else
    // Fallback: Unsupported compiler
    #error "Thread-local storage is not supported on this platform"
#endif

#ifdef __cplusplus
}
#endif

#endif // OFI_THREAD_LOCAL_H

@aingerson
Copy link
Contributor

@piotrchmiel You'll need to separate the Windows and Unix definitions and put them in their appropriate osd.h headers (see the include/ directory and the unix/ and windows/). But then something similar should work. Depending on the availability of the option, I would remove the #error and replace it with a warning/empty define (or if you want to get spicy, then a software implementation for it)

@shefty
Copy link
Member

shefty commented Dec 18, 2024

The race impacts debug log messages only. I would simply live with the race by default. Definitely don't error

@aingerson
Copy link
Contributor

I would at minimum add a compile warning/message and comment about it. Yes, this issue only applies to debug log messages, but anyone could see and call the thread local implementation for something else and think it will protect their code and be very disappointed :(

@dsciebu
Copy link
Contributor Author

dsciebu commented Dec 19, 2024

I would at minimum add a compile warning/message and comment about it. Yes, this issue only applies to debug log messages, but anyone could see and call the thread local implementation for something else and think it will protect their code and be very disappointed :(

Besides that: letting the data race impact debug mode leads to covering other problems when run with sanitizers (tsan catches data race here and stops - it's then useless for the real problem examination).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants