Support long running multi tenant workflows #4133
Comments
So, at least 2 of these things are definitely already possible.
This is already possible with the existing APIs.
This can be done with a custom SpanProcessor and SpanExporter implementation. The collector side is definitely outside the scope of this project. If you need in-flight span support from the collector, please open up an issue in the specification for it.
I don't exactly understand what this means, but you can manage your Context instances however you like. Is there a specific feature you're looking for here from the context APIs?
In a single run, the job may process several in-flight records, checking whether each has reached a terminal state or whether new events can be recorded. I was hoping we could switch the current span based on the record currently being processed (sorry if my use of "context" there caused some confusion). This would let us record events such as queuing delays, and extra attributes such as any errors observed in the service provider delegate.
Thanks for the clarification on the other two requests! I recently started exploring OpenTelemetry. It's an awesome project and is changing the way I think about observability.
I think this could be done pretty easily with some sort of Map of record -> span, but it's not something that the core APIs would probably support out of the box, unless it solves some sort of general problem (and went through the specification process to be added to the official APIs).
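The Map of record -> span idea can be sketched in plain Java. This is only an illustration: `RecordSpanRegistry` and its method names are hypothetical and not part of any OpenTelemetry API, and the span type is left generic so the sketch needs no dependency. With the real API the type parameter `S` would presumably be `io.opentelemetry.api.trace.Span` and the finisher `Span::end`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: tracks one open span per in-flight record. */
final class RecordSpanRegistry<K, S> {
  private final Map<K, S> open = new ConcurrentHashMap<>();

  /** Returns the span already tracked for the record, or starts and tracks a new one. */
  S getOrStart(K recordId, Supplier<S> starter) {
    return open.computeIfAbsent(recordId, id -> starter.get());
  }

  /** Looks up the span for a record without creating one. */
  Optional<S> find(K recordId) {
    return Optional.ofNullable(open.get(recordId));
  }

  /** Stops tracking the record and hands its span to the finisher; false if none was tracked. */
  boolean finish(K recordId, Consumer<S> finisher) {
    S span = open.remove(recordId);
    if (span == null) {
      return false;
    }
    finisher.accept(span);
    return true;
  }

  /** Number of records that currently have an open span. */
  int inFlight() {
    return open.size();
  }
}
```

Each job run would call `getOrStart` for the record it is processing, add events to the returned span, and call `finish` once the record reaches a terminal state. Note that this map lives in memory only, so it does not survive a process restart between runs.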
A common pattern is to attach the Context to a record if it goes through multiple stages. Maybe something along these lines:

pipeline.register(stage1, event -> {
  // Start a span for this record and keep its Context on the event.
  Context context = Context.current().with(tracer.spanBuilder("stage1").startSpan());
  event.setAttribute(Context.class, context);
  try (Scope ignored = context.makeCurrent()) {
    logic();
  }
});
pipeline.register(stage2, event -> {
  // Restore the Context that the first stage saved on the event.
  Context context = event.getAttribute(Context.class);
  try (Scope ignored = context.makeCurrent()) {
    logic();
  }
});

This is just a hypothetical event processing API, but it shows a pattern that is very common in most of them: being able to add arbitrary objects to an event or record. So you would just set the Context at the beginning and read it in each stage of processing. Maybe this provides some pointers for what you are trying to achieve.
For us that would mean serializing the context, storing it in the DB along with the record state, and then deserializing it and making it current. Thanks! That's a very nice idea.
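One compact way to persist the context next to the record state is the W3C `traceparent` string (`version-traceid-spanid-flags`). The sketch below is a plain-Java illustration of that format only; `TraceParent` and `TraceParentCodec` are hypothetical names, not OpenTelemetry classes. With the real SDK, `W3CTraceContextPropagator` can inject into and extract from a map-like carrier, which is likely the better choice since it also handles `tracestate`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value type for the parts of a stored traceparent string. */
final class TraceParent {
  final String traceId;
  final String spanId;
  final boolean sampled;

  TraceParent(String traceId, String spanId, boolean sampled) {
    this.traceId = traceId;
    this.spanId = spanId;
    this.sampled = sampled;
  }
}

/** Hypothetical codec for version 00 of the W3C traceparent format. */
final class TraceParentCodec {
  private static final Pattern FORMAT =
      Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

  /** Builds the string to store in the record's database row. */
  static String format(String traceId, String spanId, boolean sampled) {
    String header = "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    if (!parse(header).isPresent()) {
      throw new IllegalArgumentException("invalid trace or span id: " + header);
    }
    return header;
  }

  /** Parses a stored string; empty if it is malformed or uses all-zero ids. */
  static Optional<TraceParent> parse(String stored) {
    if (stored == null) {
      return Optional.empty();
    }
    Matcher m = FORMAT.matcher(stored);
    if (!m.matches()) {
      return Optional.empty();
    }
    String traceId = m.group(1);
    String spanId = m.group(2);
    if (traceId.matches("0+") || spanId.matches("0+")) {
      return Optional.empty(); // all-zero ids are invalid per the spec
    }
    boolean sampled = (Integer.parseInt(m.group(3), 16) & 1) == 1;
    return Optional.of(new TraceParent(traceId, spanId, sampled));
  }
}
```

A job run would store the formatted string in a column beside the record state, then parse it on the next run to rebuild the parent span context before starting new spans.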
In general, workflow frameworks like Step Functions and SWF, which provide this kind of cross-task view in their UIs, can be too heavyweight and expensive in some cases. For scenarios where our workflows are long running but are just a linked list of steps, there could be an argument to provide the below out of the box:
For the second request, the approach suggested by @anuraaga makes a lot of sense and is easy to adopt.
I think this is the issue in the spec related to exporting in-progress spans: open-telemetry/opentelemetry-specification#373. The SDK itself is ready for this; as @jkwatson mentions, a SpanProcessor that exports on both onStart and onEnd would do the export. The backend would need to render the onStart data in a reasonable way and support mutation of that span, which would happen with onEnd. I'm not sure whether this is common among backends.
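To make the backend requirement concrete: a store that receives the same span twice (once on start, once on end) has to upsert by span ID instead of appending a second entry. The following is a minimal plain-Java sketch of that behavior. `SpanRow` and `InFlightSpanStore` are hypothetical names, and this is not how any particular backend is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stored form of a span; endNanos stays null while the span is in flight. */
final class SpanRow {
  final String spanId;
  final String name;
  final long startNanos;
  Long endNanos;

  SpanRow(String spanId, String name, long startNanos) {
    this.spanId = spanId;
    this.name = name;
    this.startNanos = startNanos;
  }

  boolean inFlight() {
    return endNanos == null;
  }
}

/** Hypothetical backend store that upserts spans by id instead of appending duplicates. */
final class InFlightSpanStore {
  private final Map<String, SpanRow> rows = new LinkedHashMap<>();

  /** Records the snapshot exported when the span starts. */
  void onStart(String spanId, String name, long startNanos) {
    rows.putIfAbsent(spanId, new SpanRow(spanId, name, startNanos));
  }

  /** Completes the existing row, or creates one if the start snapshot never arrived. */
  void onEnd(String spanId, String name, long startNanos, long endNanos) {
    SpanRow row = rows.computeIfAbsent(spanId, id -> new SpanRow(id, name, startNanos));
    row.endNanos = endNanos;
  }

  SpanRow get(String spanId) {
    return rows.get(spanId);
  }

  int size() {
    return rows.size();
  }
}
```

A UI built on such a store could render rows where `inFlight()` is true as open-ended bars and redraw them once the end snapshot arrives.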
Gotcha. I just thought of another gap, even in the existing workflow systems. While Step Functions (the one I've worked with) provides great insight at the per-workflow level, it still treats the step response as unstructured data, so we're unable to gain insights across workflows. In multi-tenant systems where each request results in a workflow for provisioning, deployments, and other long-running operations, it's not easy to measure things across workflows. For our restricted use case (each step is homogeneous and does the exact same thing on different inputs), I'm thinking of leveraging OpenTelemetry metrics with attributes. Since our collector publishes to a backend that also supports easy querying, I'm thinking of adopting this for now.
Does anyone know of any backends that support this? I know Jaeger doesn't. It will display both spans and add a warning to the second one, stating that it's a duplicate.
Is your feature request related to a problem? Please describe.
We manage deployments as a series of stages, with each stage delegated to a service provider interface. Our processing layer is Vert.x and our backend is PostgreSQL. We are exploring our visibility options for an in-flight deployment workflow. We evaluated OpenTelemetry and ran into a few open questions:
Describe the solution you'd like
Describe alternatives you've considered
Since each task within our deployment workflow is homogeneous, metrics with attributes will be a great fit. But for a general-purpose workflow management system, we'd be better off with a native tracing system that allows users to monitor an in-flight trace in the backend.
Additional context
Add any other context or screenshots about the feature request here.