RFC: far fewer types (but comparable portability) #40

bhs · 2016-01-24T04:48:02Z

This is not for merging as-is – the standardtracer package is a mess, for example – but is definitely worth looking at and thinking about.

The goal here was to experiment with whether some of the formality around TraceContext could be removed from the API entirely. In my eyes, TraceContext (or SpanContext if we were to call it that -- whatever) is a worthwhile concept for an OpenTracing implementation, but I am not so sure it needs to leak out as an abstraction at the top level (as it has to date).

Also, this should be obvious: if we decide to go with this approach, it will affect all languages. I'm starting with Go because (a) Go makes it easy to think about interfaces without getting bogged down in boilerplate, and (b) there are working examples I could compile against as a sanity check.

I'll make some other comments inline.

Some people who might care about this: @yurishkuro @adriancole @michaelsembwever @slimsag @dankosaur @bcronin

bhs · 2016-01-24T04:55:08Z

README.md

    func xyz(goCtx context.Context, ...) {
        ...
-        sp, goCtx := opentracing.JoinTrace("span_name", goCtx).AddToGoContext(goCtx)
+        goCtx, sp := opentracing.ContextWithSpan(


FWIW, I was just tinkering with whether Span really needed that AddToGoContext helper method... the change here is not related to the main event in this PR.

yurishkuro · 2016-01-24T05:29:17Z

spanpropagator.go

+	PropagateSpanAsBinary(
+		sp Span,
+	) (
+		traceContextID []byte,


to bury the notion of "trace context" completely, we may want to refer to this argument as spanID

oh yeah, that was just an oversight. for sure.

requiredAttrs? bc if the intent is for these not to be dropped, but also fixed length, at least zipkin expects the same few attrs propagated. trace id, span id, parent id, and sampled. While we can argue about whether parent id can be sent, in reality the collectors make no attempt to look up a parent id when missing, and lack of parent id assumes the span is a root span. if we don't propagate sampled, we can royally muck up span trees as downstream might choose to not retain when the parent does, or visa versa.

Since we describe this as "core identifying information", implementations should be able to decide what that is. Ex. "user-id" certainly isn't, and even if some may not like what zipkin has as core propagation tags, they are crucial for bug prevention.

@adriancole @yurishkuro it's more than requiredAttrs... there are other "required" parts of a Span that don't get sent in-band during propagation (the operation name, the timing info, etc).

Maybe we should use language consistent with whatever we end up calling JoinTraceFrom(Binary|Text). It is borderline tautological, but the truest way to describe this field is that it's the minimum amount of state needed to join back to the trace on the other side of the propagation boundary.

(I went with

contextSnapshot []byte, traceAttrs []byte,

for now... happy to take suggestions!

yurishkuro · 2016-01-24T05:40:26Z

@bensigelman good job! I like this direction, makes the API simpler to use, and solves the "contains" vs. "identified by" conundrum.

There is one edge case that now won't work (the "debug" trace), but it wasn't very elegant in the previous version either, so we can think about it separately. Maybe once we extend the API to allow externally provided timestamps, the debug use case will be solved there as well.

+1

bhs · 2016-01-24T05:45:43Z

@yurishkuro, would the debug case be satisfied by a tag map made available at Span initialization time? Agreed that such a thing could probably be folded in to the forthcoming API for after-the-fact span recording...

yurishkuro · 2016-01-24T06:08:46Z

would the debug case be satisfied by a tag map made available at Span initialization time?

It could be, but as we've seen it also makes for an ugly api (in some languages), and the use case for pre-construction tags is pretty weak, so I wouldn't rush there.

codefromthecrypt · 2016-01-24T08:09:52Z

examples/dapperish.go

 		// Make sure that global trace tag propagation works.
-		span.TraceContext().SetTraceAttribute("User", os.Getenv("USER"))
+		span.SetTraceAttribute("User", os.Getenv("USER"))


don't mean to distract the issue, but I will 📦

I really like Tag Sets where propagation is simply a property of a tag. The "Trace" part of the name "TraceAttribute" has resulted in heaps of discussion, notably as no-one can guess by its name that it a propagation tag, but also the common "is it a part of the trace?" Ex. in zipkin trace id, id, parent id, and sampled decision are "trace attributes", ie only one field has anything to do with the trace.

I really would rather we be able to deconflate this whole thing by saying Span.setPropagationTag("User", os.Getenv("USER")), or worse but still better Span.setPropagationAttribute("User", os.Getenv("USER")). I have consistently found TraceAttribute an unnecessarily distracting term.

Yes. "TraceAttribute" implies something that applies to the root, or to all parent Spans as well as child Spans.

Another suggestion for naming:

Span.setCascadedTag("User", os.Getenv("USER"))

It's an attribute that child tags will also inherit, regardless of network/process boundaries.

I find the whole "Distributed Context Propagation" at the API level, has grown into an unneeded (psychological) schema.

To the user of the API it can be simplified just as an attribute the casades, even over the network/process boundaries.

( The "Distributed Context Propagation" schema is still valuable in Specification though. )

I, too, am unnerved by the naming around "Trace Attributes"... if we pursue the basic model of this PR, something like SetCascadedTag is worth considering seriously (since both the cascaded and "non-cascaded" tags apply to the same object now: the Span).

@adriancole @michaelsembwever @yurishkuro thoughts on something like the following three methods on Span?

SetTag(string, BasicType) SetCascadingTag(string, BasicType) Tag(string) --> string

Some problems right off the bat:

If "cascading tags" (trace attributes) are to be used as HTTP headers without bizarro escaping, we run headlong into the lengthy/complex caveats about casing, hyphens, etc. And it seems like the key restrictions for "plain" tags and cascading tags should be identical.

What to do if there's a "cascading" tag and a user calls SetTag with the same key? Who wins?

👍 -- Attribute vs Tag was a strange terminology distinction when I first started getting into this project; nothing unusual but the kind of naming that software is unfortunately rife with. So I think it can be clearer.

However, I'm not sure folding Attributes into Span Tags (via CascadingTag or similar) is going to solve the problem. A Tag is data added to a span, whereas an Attribute is data added to a Context. Based on that Attributes feel like they are not part of Span creation. We do want to propagate them, but they are not Span-centric.

It makes sense to think of the relevant part of DCP as propagating a Span when you have a zipkin-like model where client and server report data into the same Span, but when considering more generalized tracing systems, Context is the thing that is propagated, and Spans are created to report information on execution as the Context passes through. We can add Span info to the Context before propagating to inform the structure of the tree and Spans, but that doesn't make the Context a Span.

I realize this is arguably scope question currently being debated in #33 as well, but clarity is the goal of this line of discussion and I think it will be clearer if we don't conflate these two concepts.

(/me files a motion to decouple this important, unfinished discussion from the current PR)

codefromthecrypt · 2016-01-24T08:43:32Z

I like that we are calling this a propagation api vs a codec.. that's a better choice. There are some implementation glitches that this doesn't cause, but maybe just highlights better.

For example, span operationName often lives in code that doesn't have the responsibility of reading headers or thrift envelopes.

Also, I've made some comments that we should really say "core" attributes/propagated tags vs assuming they will be identifiers. Here are the core attributes in zipkin. Note that optional below doesn't mean it is optional to propagate. It means that it can be null.

  1: i64 trace_id,
  2: i64  span_id,
  3: optional i64 parent_span_id,
  5: optional bool sampled

https://github.com/twitter/finagle/blob/develop/finagle-thrift/src/main/thrift/tracing.thrift#L246

It might be the case that we called the above core fields "Id" as finagle calls it this. Personally, I think this is a very confusing choice of as the word ID doesn't in any way convey transport of flags.

TraceId(
  _traceId: Option[SpanId],
  _parentId: Option[SpanId],
  spanId: SpanId,
  _sampled: Option[Boolean],
  flags: Flags)

codefromthecrypt · 2016-01-24T09:01:13Z

Another late comment. It is probably acknowledged, but this api is what binds us or overlaps with other context apis. Ex. in finagle HttpContext.read is used to get a context which can also be used for ancillary things to tracing, such as deadlines. Maybe use-cases doc needs to enumerate that we don't expect to replace context apis with this, rather interact with them.

codefromthecrypt · 2016-01-24T09:33:35Z

Suggest we just contain the zipkin debug flag to finagle as it is rarely implemented outside it. I don't mind opening an issue to help change to a simpler policy of always reading the sampled flag. debug is very complex compared to sampled=0 or 1, and only really impacts the collection tier, which we assume is down-sampling. I made notes about this opentracing/opentracing.io#34 (comment)

bhs · 2016-01-25T05:26:52Z

Thanks, all, for the comments so far. Here are the important outstanding issues as I see them... in no particular order:

what to call and/or change about SetTraceAttribute (leave as-is? rename? combine with tags? what about the key restrictions?)
how to refer to the parts of a Span that need to get sent in-band during propagation
how we should allow people to set operation names after Spans have already started
whether/how to change method names around starting and joining traces

One other nice side-effect of this proposal: the hacks where we returned a child context and tags for the forthcoming Span are gone; even better, that Span does not even need to represent out-of-band identity fields like parent_id in the Span Tags.

Finally: nobody has leapt to the defense of TraceContext at this level of the API... seems like it's a goner.

bhs · 2016-01-25T23:53:44Z

Per opentracing/opentracing.io#24, I cleaned up the incredibly gross parts of this (i.e., standardtracer), fixed some comments, fixed tests, and would now like to merge this.

I recognize that we are not done talking about SetTraceAttribute, but I also think that can be resolved in parallel.

I'll add a few comments inline...

bhs · 2016-01-25T23:58:56Z

standardtracer/spanpropagator.go

+		panic(err)
+	}
+
+	// Handle the attributes.


FYI, I haven't tested this yet and so it probably has some subtle bug. I'll write a unittest if we agree that trace attrs are here to stay. :)

bhs · 2016-01-26T00:01:41Z

standardtracer/recorder.go

-// ProcessIdentifier is a thin interface that guarantees all implementors
-// represent a ProcessName and accepts arbitrary process-level tag assignment
-// (e.g., build numbers, platforms, hostnames, etc).
-type ProcessIdentifier interface {


I removed this since a tracing impl can add this if it feels like it, but ProcessIdentifier has no bearing on RawSpan or its direct dependencies so thus can be omitted here.

bhs · 2016-01-26T05:05:12Z

(this is now rebased against #39)

bhs · 2016-01-26T22:05:09Z

Per #40 (comment) , I think this is basically ready for merge unless someone wants to object. There are still some open issues, but they were not created by this PR (the trace attribute questions and whether we can consolidate span-creation mechanisms).

dkuebric · 2016-01-26T23:11:33Z

I think the biggest change here, as you point out, is the removal of TraceContext as an external interface. It's been seen as a stumbling block, but I think removing it introduces a strange confusion as well--a Span is a reportable observation representing execution within my application, but also I need to propagate it in order to report something somewhere else?

We currently have this definition of a Span, which seems good and clear:

A Span represents a logical unit of work in the system that has a start time and a duration.

Our definition of TraceContext is currently pretty complex, so I won't quote it here, but I think it boils down to:

A TraceContext provides identifying information that is associated with each Span reported, tying the entire trace together. It is propagated throughout the request, allowing it to carry not only Span identifying information, but also other Attributes added by an OT user.

Curious if anyone else shares this feeling (or if I'm an outlier)?

(PS -- I think this is pretty relevant to the discussion on #33 as well.)

bhs · 2016-01-26T23:29:19Z

Well, there is SpanPropagator which is responsible for getting spans across process boundaries... so in a sense it takes the place of TraceContext and the Codec and TraceContextSource pieces, all of which were only practically useful as (a) a place to stash the baggage / trace attrs, and (b) encapsulate the parts of a Span that propagate over the wire.

But my larger point is that there's still a type in this revised API that concerns itself with propagation. It's just not an encapsulated member of the Span type.

bhs · 2016-01-27T00:46:21Z

@dkuebric do you want to veto this or are you more "thinking out loud"?

Are there other dissenters to the core idea of removing TraceContext from the API?

If not, does someone want to LGTM this?

yurishkuro · 2016-01-27T02:25:34Z

It LGTM - other discussions can continue, but this doesn't make things worse imo, yet leaves less things to learn/worry about.

RFC: far fewer types (but comparable portability)

bhs reviewed Jan 24, 2016
View reviewed changes

bhs mentioned this pull request Jan 24, 2016

In discrete steps, Simplify the API opentracing/opentracing-java#7

Closed

yurishkuro reviewed Jan 24, 2016
View reviewed changes

codefromthecrypt reviewed Jan 24, 2016
View reviewed changes

codefromthecrypt mentioned this pull request Jan 24, 2016

Add a page illustrating common instrumentation use cases opentracing/opentracing.io#34

Merged

bhs mentioned this pull request Jan 25, 2016

consolidate the myriad interfaces into Tracer opentracing/opentracing.io#24

Closed

bhs reviewed Jan 25, 2016
View reviewed changes

bhs reviewed Jan 26, 2016
View reviewed changes

Ben Sigelman added 10 commits January 25, 2016 20:58

compiles and runs (but grossness everywhere in standardtracer)

7ba7985

one fewer interface since encode/decode weren't symmetric anyway

41f5970

better comments

1c71173

more cleanups

347c361

add SetOperationName(); remove an unneeded method;

56cca52

cleanups and file renames

cc02d3f

the ProcessIdentifier wasn't really needed; removed it.

eb53dfe

fix tests

27effd4

rename param

048371a

resolve merge conflicts post-#39

b8b5353

bhs force-pushed the fewer_types branch from 632ce54 to b8b5353 Compare January 26, 2016 05:04

minor comment fix

ca5c92c

bhs mentioned this pull request Jan 26, 2016

Merge Tracer.join_trace into Tracer.start_trace opentracing/opentracing-python#13

Closed

bhs mentioned this pull request Jan 27, 2016

remove TraceContext from the docs, too opentracing/opentracing.io#42

Merged

bhs pushed a commit that referenced this pull request Jan 27, 2016

Merge pull request #40 from opentracing/fewer_types

a33c13c

RFC: far fewer types (but comparable portability)

bhs merged commit a33c13c into master Jan 27, 2016

bhs deleted the fewer_types branch January 27, 2016 06:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: far fewer types (but comparable portability) #40

RFC: far fewer types (but comparable portability) #40

bhs commented Jan 24, 2016

bhs Jan 24, 2016

yurishkuro Jan 24, 2016

bhs Jan 24, 2016

codefromthecrypt Jan 24, 2016

bhs Jan 25, 2016

bhs Jan 25, 2016

yurishkuro commented Jan 24, 2016

bhs commented Jan 24, 2016

yurishkuro commented Jan 24, 2016

codefromthecrypt Jan 24, 2016

michaelsembwever Jan 24, 2016

bhs Jan 25, 2016

dkuebric Jan 25, 2016

bhs Jan 25, 2016

codefromthecrypt commented Jan 24, 2016

codefromthecrypt commented Jan 24, 2016

codefromthecrypt commented Jan 24, 2016 via email

bhs commented Jan 25, 2016

bhs commented Jan 25, 2016

bhs Jan 25, 2016

bhs Jan 26, 2016

bhs commented Jan 26, 2016

bhs commented Jan 26, 2016

dkuebric commented Jan 26, 2016

bhs commented Jan 26, 2016

bhs commented Jan 27, 2016

yurishkuro commented Jan 27, 2016

RFC: far fewer types (but comparable portability) #40

RFC: far fewer types (but comparable portability) #40

Conversation

bhs commented Jan 24, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yurishkuro commented Jan 24, 2016

bhs commented Jan 24, 2016

yurishkuro commented Jan 24, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codefromthecrypt commented Jan 24, 2016

codefromthecrypt commented Jan 24, 2016

codefromthecrypt commented Jan 24, 2016 via email

bhs commented Jan 25, 2016

bhs commented Jan 25, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bhs commented Jan 26, 2016

bhs commented Jan 26, 2016

dkuebric commented Jan 26, 2016

bhs commented Jan 26, 2016

bhs commented Jan 27, 2016

yurishkuro commented Jan 27, 2016