Expose metric for log export failure #6709 #6779
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6779      +/-   ##
============================================
+ Coverage     90.10%   90.11%   +0.01%
- Complexity     6541     6542       +1
============================================
  Files           728      728
  Lines         19695    19703       +8
  Branches       1935     1935
============================================
+ Hits          17746    17756      +10
+ Misses         1349     1347       -2
  Partials        600      600
@@ -197,6 +199,12 @@ private Worker(
"The number of logs processed by the BatchLogRecordProcessor. "
+ "[dropped=true if they were dropped due to high throughput]")
.build();
logsExportFailureCounter =
meter
.counterBuilder("logsExportFailure")
While this seems like a small change, I'm reluctant to make it because there have been some attempts to standardize the SDKs' internal telemetry (e.g. OTEP#238).
The problem with continuing the pattern of these current metrics is that the structure doesn't conform to our semantic convention recommendations.
- The unit is wrong: it should probably be {export} instead of 1
- The metric name doesn't include a namespace
- The attributes don't have a namespace
Extending the instrumentation extends bad patterns. Fixing the bad patterns exposes our users to breaking changes, only to have more later if / when semantic conventions emerge. So we appear to be stuck. I'll bring it up at next week's Java SIG to see if we can reach any conclusion.
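To make the three points above concrete, here is a minimal sketch that checks a metric definition against them. It uses only the JDK; the class name, the method, and the "conforming" names are hypothetical, and "has a namespace" is approximated as "contains a dot", which is a simplification of the semantic convention recommendations. The `dropped` attribute key is taken from the existing metric description in the diff; the attributes of the proposed counter are not shown there.

```java
import java.util.ArrayList;
import java.util.List;

public class MetricConventionCheck {

  // Returns one message per recommendation the definition violates.
  public static List<String> check(String name, String unit, List<String> attributeKeys) {
    List<String> problems = new ArrayList<>();
    // Simplification: treat "contains a dot" as "has a namespace".
    if (!name.contains(".")) {
      problems.add("metric name has no namespace: " + name);
    }
    // Counts of things use an annotation such as {export} rather than "1".
    if (!(unit.startsWith("{") && unit.endsWith("}"))) {
      problems.add("unit should be an annotation like {export}, got: " + unit);
    }
    for (String key : attributeKeys) {
      if (!key.contains(".")) {
        problems.add("attribute has no namespace: " + key);
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    // Shape of the current pattern: un-namespaced name, unit "1", attribute "dropped".
    System.out.println(check("logsExportFailure", "1", List.of("dropped")));
    // Hypothetical conforming shape; these names are illustrative only.
    System.out.println(
        check("example.sdk.log.export.failures", "{export}", List.of("example.sdk.dropped")));
  }
}
```

The first call reports all three problems and the second reports none. Which namespace to use is exactly the open question here, so the names in the second call should not be read as a proposal.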
@jack-berg I agree that the current pattern (for the existing metrics as well as any proposed in the future) doesn't conform to the semantic convention recommendations, such as namespaced metric names and well-defined units.
So we are stuck between extending the existing instrumentation, with its bad semantics, and rectifying it to follow the recommendations. Do let us know how the discussion goes, as this applies in general, not just here.
@@ -197,6 +199,12 @@ private Worker(
"The number of logs processed by the BatchLogRecordProcessor. "
+ "[dropped=true if they were dropped due to high throughput]")
.build();
logsExportFailureCounter =
meter
.counterBuilder("logsExportFailure")
The OTLP exporters already have dedicated metrics to track failures: https://github.com/open-telemetry/opentelemetry-java/blob/main/exporters/common/src/main/java/io/opentelemetry/exporter/internal/ExporterMetrics.java
Would these serve your needs?
@jack-berg Thanks. I think this does address the requirement. I had tried to find out whether something like this already exists for exporters in general, since this is a generic need for any kind of exporter, not just BatchLogExporter.
I enabled 'OTEL_EXPORTER_METRICS_ENABLED' and got this output. Let me check with the original reporter of the issue.
ScopeMetrics #2
ScopeMetrics SchemaURL:
InstrumentationScope io.opentelemetry.exporters.otlp-grpc
Metric #0
Descriptor:
-> Name: otlp.exporter.exported
-> Description:
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
-> success: Bool(false)
-> type: Str(log)
StartTimestamp: 2024-10-14 10:06:54.40763 +0000 UTC
Timestamp: 2024-10-14 10:10:54.418291 +0000 UTC
Value: 9
Metric #1
Descriptor:
-> Name: otlp.exporter.seen
-> Description:
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
-> type: Str(log)
StartTimestamp: 2024-10-14 10:06:54.40763 +0000 UTC
Timestamp: 2024-10-14 10:10:54.418291 +0000 UTC
Value: 9
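As a rough illustration of how these two counters could be turned into a failure signal, the sketch below divides the failed count by the seen count. It assumes that `otlp.exporter.seen` counts records handed to the exporter and that `otlp.exporter.exported` with `success=false` counts those that failed to export; the class and method names are hypothetical, and the inputs are the values from the output above.

```java
public class ExportFailureRatio {

  // Fraction of records handed to the exporter that failed to export.
  // Returns 0.0 when nothing has been seen yet, to avoid dividing by zero.
  public static double failureRatio(long seen, long failed) {
    if (seen <= 0) {
      return 0.0;
    }
    return (double) failed / seen;
  }

  public static void main(String[] args) {
    long seen = 9;   // otlp.exporter.seen{type=log}
    long failed = 9; // otlp.exporter.exported{type=log, success=false}
    System.out.println(failureRatio(seen, failed)); // prints 1.0
  }
}
```

Both counters are cumulative, so the ratio is only meaningful when the two values come from the same collection interval, as they do in the output above.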
These metrics should be enabled by default if using autoconfigure. Note that if you are not using autoconfigure, you need to order the initialization carefully so that the configured meter provider can be passed to the OTLP exporters for spans and logs, allowing them to collect internal telemetry.
I'm not sure what OTEL_EXPORTER_METRICS_ENABLED is a reference to. It's not a property that's used in this repository.
My bad, I didn't back up the entire collector log and mistakenly assumed that these metrics need to be enabled. They are present by default.
I ran it locally; the OTel collector log shows the metric as follows: