Remove is_shutdown flag from processors. And fix logger::emit() to check for the flag before emit. #2462

Open
wants to merge 4 commits into
base: main

lalitb
Member

@lalitb lalitb commented Dec 20, 2024

Changes

To discuss the change suggested by @cijothomas here - #2381 (comment)

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@@ -268,6 +268,14 @@ impl opentelemetry::logs::Logger for Logger {

/// Emit a `LogRecord`.
fn emit(&self, mut record: Self::LogRecord) {
if self.provider.inner.is_shutdown.load(Ordering::Relaxed) {
Member

The issue with this is that it can affect throughput due to the contention introduced here. (Logs so far have no contention when using etw/user_events.)
Can you check the stress test before and after?

I am unsure of a solution. Maybe don't check shutdown anywhere except in stdout-like non-prod processors, and rely on mechanisms like the export client, etw, etc. returning errors.
What harm can it cause? 🤔

Member Author

Oh yes, it doesn’t make sense to introduce contention in the hot path, even if the contention is at the atomic level.

Contributor

@utpilla utpilla Dec 21, 2024

Since we are only reading the atomic variable, and its value does not change for most of the application's lifetime, it most likely will not have any visible effect on throughput.

We should be able to confirm that with the stress test.

Member Author

Yes, the effect of an uncontested atomic read would be too small to notice in the stress test. It costs slightly more than a normal bool check, but much less than the cost incurred under contention. I added a benchmark for logger::emit() in this PR, which shows an added latency of 1-2 ns:
main branch:

logger_emit             time:   [37.916 ns 37.977 ns 38.077 ns]
                        change: [-0.2733% -0.0648% +0.1328%] (p = 0.58 > 0.05)
                        No change in performance detected.

PR branch:

logger_emit             time:   [38.941 ns 39.027 ns 39.172 ns]
                        change: [+2.6756% +2.9861% +3.3292%] (p = 0.00 < 0.05)
                        Performance has regressed.

Member Author

@lalitb lalitb Dec 21, 2024

> I am unsure of a solution. Maybe don't check shutdown anywhere except in stdout-like non-prod processors, and rely on mechanisms like the export client, etw, etc. returning errors.
> What harm can it cause?

A custom exporter that can be connected to the reentrant and simple processors would need to handle shutdown properly in that case. As of now, etw and user_events don't do anything on shutdown, so they will continue to emit even after the user invokes shutdown.

Contributor

> Yes, the effect of an uncontested atomic read would be too small to notice in the stress test.

@lalitb Just to confirm, you agree that the change in this PR (reading is_shutdown) is not introducing any contention, right?

Member Author

@lalitb lalitb Dec 22, 2024

Yes, for an uncontested relaxed atomic read of an AtomicBool, the cost is close to that of a regular bool read. I observed an added latency of 1-2 ns from this check, which I believe is acceptable. Just to add, we have this check for spans too, when they are ended.
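
For readers following the thread, the pattern under discussion can be sketched roughly as below. This is a minimal illustration, not the SDK's actual code: `SketchLogger` and `SharedState` are hypothetical stand-ins for the real `Logger` and provider internals, and `emit` returns a `bool` only so the behavior is observable. The point is that the hot path only reads the flag with `Ordering::Relaxed`, while the single write happens once at shutdown.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the provider state shared by all loggers.
struct SharedState {
    is_shutdown: AtomicBool,
}

/// Hypothetical logger; the real `Logger` holds a provider handle instead.
#[derive(Clone)]
struct SketchLogger {
    state: Arc<SharedState>,
}

impl SketchLogger {
    fn new() -> Self {
        SketchLogger {
            state: Arc::new(SharedState {
                is_shutdown: AtomicBool::new(false),
            }),
        }
    }

    /// Returns `true` if the record would be forwarded to processors,
    /// `false` if it is dropped because shutdown was already requested.
    fn emit(&self, _record: &str) -> bool {
        // Read-only check on the hot path: no lock and no write to the flag.
        if self.state.is_shutdown.load(Ordering::Relaxed) {
            return false;
        }
        // Forwarding to processors would happen here in a real logger.
        true
    }

    /// Flips the flag once; later `emit` calls observe it and drop records.
    fn shutdown(&self) {
        self.state.is_shutdown.store(true, Ordering::Relaxed);
    }
}
```

Calling `emit` before `shutdown` returns `true`, and calling it afterwards on the same thread returns `false`. Across threads, a relaxed load may observe the flag slightly late, which seems acceptable for a best-effort "stop emitting" signal.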

@@ -201,15 +187,6 @@ impl Debug for BatchLogProcessor {

impl LogProcessor for BatchLogProcessor {
fn emit(&self, record: &mut LogRecord, instrumentation: &InstrumentationScope) {
Member

If emit is called after shutdown, the channel would already be closed even if we don't do the is_shutdown check, so isn't the error it returns good enough?

Member

something like

  1. Don't check for is_shutdown.
  2. Just export as usual.
  3. Since the channel is closed, it'll error out.
  4. Log that error.

No contention/perf cost for the normal path. If logs are still emitted after shutdown, it clearly indicates an issue with how the user is managing lifetimes.
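
The four steps above can be sketched as follows, using a bounded `std::sync::mpsc` channel. This is an illustration under assumptions, not the actual `BatchLogProcessor`: `SketchProcessor` is a hypothetical type, records are plain strings, and shutdown is simulated by dropping the receiving end of the channel.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical processor that forwards records over a bounded channel.
struct SketchProcessor {
    sender: SyncSender<String>,
}

impl SketchProcessor {
    /// Returns the processor and the receiving end a worker would drain.
    fn new(capacity: usize) -> (Self, Receiver<String>) {
        let (sender, receiver) = sync_channel(capacity);
        (SketchProcessor { sender }, receiver)
    }

    /// No shutdown flag is consulted; a closed channel surfaces as an error
    /// that the caller can log.
    fn emit(&self, record: &str) -> Result<(), String> {
        match self.sender.try_send(record.to_owned()) {
            Ok(()) => Ok(()),
            Err(TrySendError::Full(_)) => {
                Err("queue full, record dropped".to_owned())
            }
            Err(TrySendError::Disconnected(_)) => {
                Err("channel closed, emit called after shutdown".to_owned())
            }
        }
    }
}
```

Once the receiver has been dropped, `try_send` fails with `TrySendError::Disconnected`, so the normal path pays nothing extra and only the post-shutdown path produces an error to log.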


codecov bot commented Dec 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.9%. Comparing base (6209c06) to head (5028be3).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2462   +/-   ##
=====================================
  Coverage   76.9%   76.9%           
=====================================
  Files        123     123           
  Lines      22581   22548   -33     
=====================================
- Hits       17379   17359   -20     
+ Misses      5202    5189   -13     


3 participants