RFC: far fewer types (but comparable portability) #40
  func xyz(goCtx context.Context, ...) {
      ...
-     sp, goCtx := opentracing.JoinTrace("span_name", goCtx).AddToGoContext(goCtx)
+     goCtx, sp := opentracing.ContextWithSpan(
FWIW, I was just tinkering with whether Span really needed that AddToGoContext helper method... the change here is not related to the main event in this PR.
PropagateSpanAsBinary(
    sp Span,
) (
    traceContextID []byte,
to bury the notion of "trace context" completely, we may want to refer to this argument as spanID
oh yeah, that was just an oversight. for sure.
requiredAttrs? Because if the intent is for these not to be dropped, but also to be fixed length, at least zipkin expects the same few attrs propagated: trace id, span id, parent id, and sampled. While we can argue about whether parent id can be sent, in reality the collectors make no attempt to look up a parent id when it is missing, and a missing parent id means the span is assumed to be a root span. If we don't propagate sampled, we can royally muck up span trees, as downstream might choose not to retain when the parent does, or vice versa.
Since we describe this as "core identifying information", implementations should be able to decide what that is. Ex. "user-id" certainly isn't, and even if some may not like what zipkin has as core propagation tags, they are crucial for bug prevention.
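To make the "fixed length" point concrete, here is a rough sketch of how the four zipkin-style fields named above could be packed into a fixed-size byte slice. The coreFields struct, the field widths, the big-endian layout, and the use of 0 to mean "no parent" are all assumptions for illustration; this is not zipkin's actual wire format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// coreFields is a hypothetical container for the fields discussed above.
type coreFields struct {
	TraceID  uint64
	SpanID   uint64
	ParentID uint64 // assumption: 0 means "no parent", i.e. a root span
	Sampled  bool
}

// encodedLen is three 8-byte ids plus one byte for the sampled flag.
const encodedLen = 8 + 8 + 8 + 1

// encode packs f into exactly encodedLen bytes.
func encode(f coreFields) []byte {
	buf := make([]byte, encodedLen)
	binary.BigEndian.PutUint64(buf[0:8], f.TraceID)
	binary.BigEndian.PutUint64(buf[8:16], f.SpanID)
	binary.BigEndian.PutUint64(buf[16:24], f.ParentID)
	if f.Sampled {
		buf[24] = 1
	}
	return buf
}

// decode is the inverse of encode; it rejects input of any other length.
func decode(buf []byte) (coreFields, error) {
	if len(buf) != encodedLen {
		return coreFields{}, errors.New("unexpected length for core fields")
	}
	return coreFields{
		TraceID:  binary.BigEndian.Uint64(buf[0:8]),
		SpanID:   binary.BigEndian.Uint64(buf[8:16]),
		ParentID: binary.BigEndian.Uint64(buf[16:24]),
		Sampled:  buf[24] == 1,
	}, nil
}

func main() {
	in := coreFields{TraceID: 1, SpanID: 2, Sampled: true}
	out, err := decode(encode(in))
	fmt.Println(out == in, err)
}
```

Because the sampled flag travels with the ids, a downstream process can follow the upstream retention decision instead of making its own, which is the span-tree bug described above.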
@adriancole @yurishkuro it's more than requiredAttrs... there are other "required" parts of a Span that don't get sent in-band during propagation (the operation name, the timing info, etc).
Maybe we should use language consistent with whatever we end up calling JoinTraceFrom(Binary|Text). It is borderline tautological, but the truest way to describe this field is that it's the minimum amount of state needed to join back to the trace on the other side of the propagation boundary.
(I went with
    contextSnapshot []byte,
    traceAttrs []byte,
for now... happy to take suggestions!)
@bensigelman good job! I like this direction, makes the API simpler to use, and solves the "contains" vs. "identified by" conundrum. There is one edge case that now won't work (the "debug" trace), but it wasn't very elegant in the previous version either, so we can think about it separately. Maybe once we extend the API to allow externally provided timestamps, the debug use case will be solved there as well. +1
@yurishkuro, would the debug case be satisfied by a tag map made available at Span initialization time? Agreed that such a thing could probably be folded into the forthcoming API for after-the-fact span recording...
It could be, but as we've seen it also makes for an ugly api (in some languages), and the use case for pre-construction tags is pretty weak, so I wouldn't rush there.
  // Make sure that global trace tag propagation works.
- span.TraceContext().SetTraceAttribute("User", os.Getenv("USER"))
+ span.SetTraceAttribute("User", os.Getenv("USER"))
don't mean to distract the issue, but I will 📦
I really like Tag Sets where propagation is simply a property of a tag. The "Trace" part of the name "TraceAttribute" has resulted in heaps of discussion, notably as no-one can guess by its name that it is a propagation tag, but also because of the common "is it a part of the trace?" question. Ex. in zipkin, trace id, id, parent id, and sampled decision are "trace attributes", i.e. only one field has anything to do with the trace.
I really would rather we be able to deconflate this whole thing by saying Span.setPropagationTag("User", os.Getenv("USER")), or worse but still better Span.setPropagationAttribute("User", os.Getenv("USER")). I have consistently found TraceAttribute an unnecessarily distracting term.
Yes. "TraceAttribute" implies something that applies to the root, or to all parent Spans as well as child Spans.
Another suggestion for naming:
Span.setCascadedTag("User", os.Getenv("USER"))
It's an attribute that child Spans will also inherit, regardless of network/process boundaries.
I find that the whole "Distributed Context Propagation" concept, at the API level, has grown into an unneeded (psychological) schema.
To the user of the API it can be simplified to just an attribute that cascades, even over network/process boundaries.
( The "Distributed Context Propagation" schema is still valuable in Specification though. )
I, too, am unnerved by the naming around "Trace Attributes"... if we pursue the basic model of this PR, something like SetCascadedTag is worth considering seriously (since both the cascaded and "non-cascaded" tags apply to the same object now: the Span).
@adriancole @michaelsembwever @yurishkuro thoughts on something like the following three methods on Span?
SetTag(string, BasicType)
SetCascadingTag(string, BasicType)
Tag(string) --> string
Some problems right off the bat:
- If "cascading tags" (trace attributes) are to be used as HTTP headers without bizarro escaping, we run headlong into the lengthy/complex caveats about casing, hyphens, etc. And it seems like the key restrictions for "plain" tags and cascading tags should be identical.
- What to do if there's a "cascading" tag and a user calls SetTag with the same key? Who wins?
👍 -- Attribute vs Tag was a strange terminology distinction when I first started getting into this project; nothing unusual but the kind of naming that software is unfortunately rife with. So I think it can be clearer.
However, I'm not sure folding Attributes into Span Tags (via CascadingTag or similar) is going to solve the problem. A Tag is data added to a span, whereas an Attribute is data added to a Context. Based on that Attributes feel like they are not part of Span creation. We do want to propagate them, but they are not Span-centric.
It makes sense to think of the relevant part of DCP as propagating a Span when you have a zipkin-like model where client and server report data into the same Span, but when considering more generalized tracing systems, Context is the thing that is propagated, and Spans are created to report information on execution as the Context passes through. We can add Span info to the Context before propagating to inform the structure of the tree and Spans, but that doesn't make the Context a Span.
I realize this is arguably a scope question currently being debated in #33 as well, but clarity is the goal of this line of discussion and I think it will be clearer if we don't conflate these two concepts.
(/me files a motion to decouple this important, unfinished discussion from the current PR)
I like that we are calling this a propagation api vs a codec.. that's a better choice. There are some implementation glitches that this doesn't cause, but maybe just highlights better. For example, span operationName often lives in code that doesn't have the responsibility of reading headers or thrift envelopes. Also, I've made some comments that we should really say "core" attributes/propagated tags vs assuming they will be identifiers. Here are the core attributes in zipkin. Note that optional below doesn't mean it is optional to propagate. It means that it can be null.
https://github.com/twitter/finagle/blob/develop/finagle-thrift/src/main/thrift/tracing.thrift#L246 It might be the case that we called the above core fields "Id" because finagle calls them this. Personally, I think this is a very confusing choice, as the word ID doesn't in any way convey transport of flags.
Another late comment. It is probably acknowledged, but this api is what binds us or overlaps with other context apis. Ex. in finagle
Suggest we just contain the zipkin debug flag to finagle as it is rarely
implemented outside it. I don't mind opening an issue to help change to a
simpler policy of always reading the sampled flag.
debug is very complex compared to sampled=0 or 1, and only really impacts
the collection tier, which we assume is down-sampling.
I made notes about this
opentracing/opentracing.io#34 (comment)
Thanks, all, for the comments so far. Here are the important outstanding issues as I see them... in no particular order:
One other nice side-effect of this proposal: the hacks where we returned a child context and tags for the forthcoming Span are gone; even better, that Span does not even need to represent out-of-band identity fields like
Finally: nobody has leapt to the defense of TraceContext at this level of the API... seems like it's a goner.
Per opentracing/opentracing.io#24, I cleaned up the incredibly gross parts of this (i.e., I recognize that we are not done talking about
I'll add a few comments inline...
        panic(err)
    }

    // Handle the attributes.
FYI, I haven't tested this yet and so it probably has some subtle bug. I'll write a unittest if we agree that trace attrs are here to stay. :)
// ProcessIdentifier is a thin interface that guarantees all implementors
// represent a ProcessName and accepts arbitrary process-level tag assignment
// (e.g., build numbers, platforms, hostnames, etc).
type ProcessIdentifier interface {
I removed this since a tracing impl can add this if it feels like it, but ProcessIdentifier has no bearing on RawSpan or its direct dependencies, so it can be omitted here.
(this is now rebased against #39)
Per #40 (comment), I think this is basically ready for merge unless someone wants to object. There are still some open issues, but they were not created by this PR (the trace attribute questions and whether we can consolidate span-creation mechanisms).
I think the biggest change here, as you point out, is the removal of We currently have this definition of a
Our definition of
Curious if anyone else shares this feeling (or if I'm an outlier)? (PS -- I think this is pretty relevant to the discussion on #33 as well.)
Well, there is
But my larger point is that there's still a type in this revised API that concerns itself with propagation. It's just not an encapsulated member of the Span type.
@dkuebric do you want to veto this or are you more "thinking out loud"? Are there other dissenters to the core idea of removing TraceContext from the API? If not, does someone want to LGTM this?
It LGTM - other discussions can continue, but this doesn't make things worse imo, yet leaves fewer things to learn/worry about.
RFC: far fewer types (but comparable portability)
This is not for merging as-is – the standardtracer package is a mess, for example – but is definitely worth looking at and thinking about.
The goal here was to experiment with whether some of the formality around TraceContext could be removed from the API entirely. In my eyes, TraceContext (or SpanContext if we were to call it that -- whatever) is a worthwhile concept for an OpenTracing implementation, but I am not so sure it needs to leak out as an abstraction at the top level (as it has to date).
Also, this should be obvious: if we decide to go with this approach, it will affect all languages. I'm starting with Go because (a) Go makes it easy to think about interfaces without getting bogged down in boilerplate, and (b) there are working examples I could compile against as a sanity check.
I'll make some other comments inline.
Some people who might care about this: @yurishkuro @adriancole @michaelsembwever @slimsag @dankosaur @bcronin