Support receiving duplicate spans keeping only the latest #3790

Closed
adelel1 opened this issue Jun 30, 2022 · 1 comment

adelel1 commented Jun 30, 2022

Requirement - what kind of business use case are you trying to solve?

I am using the Jaeger backend with an OpenTelemetry Collector that sends spans to the backend.

I represent each operation with a span. Operations can call other operations on different machines, to an arbitrary depth. When all operations complete, I can see a nice tree of spans. However, operations can sometimes get stuck, and I would like to use OpenTelemetry to pinpoint where the stuck operation is. When an operation is stuck, its span is never completed; the parent span therefore is not completed either, and so on up to the root span, which is also not complete.

If I could see in-progress spans, I could quickly pinpoint the stuck operation that is preventing the root operation from completing.

Problem - what in Jaeger blocks you from solving the requirement?

I can write my own OpenTelemetry span processor that sends each span from the processor's onStart() method. When these reach the backend, they would be the in-progress spans.
In the span processor I can also send the same span from the onEnd() method. When these reach the backend, they would be the completed spans.
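
To make the idea concrete, here is a minimal, library-free sketch of a processor that exports a span twice, once when it starts and once when it ends. It does not use the OpenTelemetry SDK: `SpanSnapshot`, `DualEmitProcessor`, and the `export` callable are hypothetical names chosen for illustration, and the method signatures are simplified.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class SpanSnapshot:
    """Immutable copy of a span at export time (hypothetical type)."""
    trace_id: str
    span_id: str
    name: str
    start_ns: int
    end_ns: Optional[int]  # None while the operation is still running


class DualEmitProcessor:
    """Exports one snapshot when a span starts and another when it ends."""

    def __init__(self, export: Callable[[SpanSnapshot], None]):
        # `export` stands in for whatever ships a snapshot to the backend.
        self._export = export

    def on_start(self, trace_id: str, span_id: str, name: str, start_ns: int) -> SpanSnapshot:
        snapshot = SpanSnapshot(trace_id, span_id, name, start_ns, end_ns=None)
        self._export(snapshot)  # the in-progress version
        return snapshot

    def on_end(self, started: SpanSnapshot, end_ns: int) -> SpanSnapshot:
        # Same trace_id and span_id as the first export; only end_ns changes.
        finished = replace(started, end_ns=end_ns)
        self._export(finished)  # the completed version
        return finished


sent = []  # collects exported snapshots in place of a real backend
processor = DualEmitProcessor(sent.append)
started = processor.on_start("trace-1", "span-a", "load-order", start_ns=1_000)
processor.on_end(started, end_ns=5_000)
```

After this runs, `sent` holds two snapshots with the same span ID, which is the situation the backend has to deal with.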

If I implement this, the Jaeger UI shows two spans, and the second one carries a warning message: "duplicate span IDs; skipping clock skew adjustment"

It is treating the second span, which I send from onEnd(), as a duplicate (which it is).

The impact is that I cannot use Jaeger and OpenTelemetry to pinpoint problem operations across machines when some spans have not completed.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Could you allow duplicate spans, such that if a new span arrives at the backend and is a duplicate of one already received, the stored span is cleared and replaced with the new one? Timestamps would need to be checked in case spans are received out of order, i.e. only the latest span should be stored.
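
A rough sketch of the "keep only the latest" behaviour I have in mind, written as an in-memory store. This is not how Jaeger storage works today; `SpanRecord`, `LatestSpanStore`, and the `emitted_ns` field (a client-side timestamp of when that version of the span was exported) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SpanRecord:
    trace_id: str
    span_id: str
    emitted_ns: int        # hypothetical: when the client exported this version
    end_ns: Optional[int]  # None for an in-progress version


class LatestSpanStore:
    """Keeps a single version per (trace_id, span_id): the most recently emitted."""

    def __init__(self) -> None:
        self._spans: Dict[Tuple[str, str], SpanRecord] = {}

    def upsert(self, incoming: SpanRecord) -> bool:
        """Store `incoming` unless a newer version is already held.

        Returns True if the span was stored, False if it was dropped as stale.
        """
        key = (incoming.trace_id, incoming.span_id)
        current = self._spans.get(key)
        if current is not None and incoming.emitted_ns < current.emitted_ns:
            return False  # arrived out of order; keep the newer version
        self._spans[key] = incoming  # replaces the whole record
        return True

    def get(self, trace_id: str, span_id: str) -> Optional[SpanRecord]:
        return self._spans.get((trace_id, span_id))


store = LatestSpanStore()
completed = SpanRecord("trace-1", "span-a", emitted_ns=5_000, end_ns=5_000)
in_progress = SpanRecord("trace-1", "span-a", emitted_ns=1_000, end_ns=None)

# The completed version happens to arrive first; the late in-progress one is dropped.
stored_first = store.upsert(completed)
stored_second = store.upsert(in_progress)
```

Comparing on an emit timestamp rather than arrival order is what makes the out-of-order case safe: the late in-progress version cannot overwrite the completed one.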

I would also want to add an attribute to the in-progress span (e.g. 'in-progress') and, in the completed span, remove it and replace it with a 'completed' attribute. I would then want to search for the 'in-progress' tag within a specific trace ID, so that I can see the problem spans for that trace.
When the completed span comes in, it should replace everything in the existing span, including attributes.
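
As an illustration of the search I would like to run, the sketch below filters the spans of one trace by an in-progress tag and then narrows the result to the spans that have no in-progress children, which are the likely stuck operations. The tag key `span.state`, the `StoredSpan` type, and both function names are hypothetical; this is not a Jaeger query API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredSpan:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    tags: Dict[str, str] = field(default_factory=dict)


def in_progress_spans(spans: List[StoredSpan], trace_id: str) -> List[StoredSpan]:
    """All spans of the trace whose (assumed) state tag is 'in-progress'."""
    return [
        s for s in spans
        if s.trace_id == trace_id and s.tags.get("span.state") == "in-progress"
    ]


def stuck_candidates(spans: List[StoredSpan], trace_id: str) -> List[StoredSpan]:
    """In-progress spans that have no in-progress child of their own."""
    open_spans = in_progress_spans(spans, trace_id)
    parents_of_open = {s.parent_id for s in open_spans}
    return [s for s in open_spans if s.span_id not in parents_of_open]


spans = [
    StoredSpan("t1", "root", None, {"span.state": "in-progress"}),
    StoredSpan("t1", "a", "root", {"span.state": "in-progress"}),
    StoredSpan("t1", "b", "a", {"span.state": "in-progress"}),
    StoredSpan("t1", "c", "root", {"span.state": "completed"}),
    StoredSpan("t2", "x", None, {"span.state": "in-progress"}),
]

open_ids = [s.span_id for s in in_progress_spans(spans, "t1")]
stuck_ids = [s.span_id for s in stuck_candidates(spans, "t1")]
```

Here `root`, `a`, and `b` are all still open, but only `b` has no open child, so it is the one to investigate.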

Glad to hear any other potential solutions to this use case.

Any open questions to address

For ref see: open-telemetry/opentelemetry-java#4133

@yurishkuro
Member

Closing as a duplicate of #729
