Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

otelhttptrace: handle missing getconn hook without panic #5187

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tonistiigi
Copy link
Contributor

We have many reports that end() gets called without the span being defined in start() and causes a panic.

Ref moby/buildkit#4377
Ref docker/buildx#2232

I have not not fully debugged in what condition this happens in stdlib (I assume some keepalive pool case) but I don't see any other possible explanation for these panic cases.

Note that there are other httptrace hooks that also use .root without validation. I don't atm. have any proof that these could be called without GetConn() being called first as well so didn't add extra validation to these.

@tonistiigi tonistiigi requested a review from a team February 27, 2024 06:07
Copy link

codecov bot commented Feb 27, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 64.4%. Comparing base (669ca60) to head (ded1cd2).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #5187     +/-   ##
=======================================
- Coverage   64.5%   64.4%   -0.1%     
=======================================
  Files        200     200             
  Lines      12558   12560      +2     
=======================================
- Hits        8103    8101      -2     
- Misses      4219    4221      +2     
- Partials     236     238      +2     
Files Coverage Δ
...on/net/http/httptrace/otelhttptrace/clienttrace.go 79.0% <0.0%> (-0.8%) ⬇️

... and 1 file with indirect coverage changes

@dmathieu
Copy link
Member

This change only handles missing span in end() if useSpans is false.
It seems like that could also happen with the use of sub spans as well.

Also, I know this is hard to reproduce, and this package doesn't really have any tests at the moment, but would it be possible to write a test?

@tonistiigi
Copy link
Contributor Author

This change only handles missing span in end() if useSpans is false.
It seems like that could also happen with the use of sub spans as well.

What place/variable do you have in mind?

Also, I know this is hard to reproduce, and this package doesn't really have any tests at the moment, but would it be possible to write a test?

These calls are made from stdlib and I don't know the condition when the specific call order appears.

@pellared
Copy link
Member

These calls are made from stdlib and I don't know the condition when the specific call order appears.

I think the main usage of httptrace.ClientTrace is here: https://github.com/golang/go/blob/master/src/net/http/transport.go

@alessandrozucca-p
Copy link

alessandrozucca-p commented Jun 21, 2024

We started seeing this as well after introducing the option otelhttptrace.WithoutSubSpans()

The original implementation was taking locks at the beginning on the function:

func (ct *clientTracer) end(hook string, err error, attrs ...kv.KeyValue) {
ct.mtx.Lock()

defer ct.mtx.Unlock()

However since the introduction of if !ct.useSpans { the locks are not taken immediately.

It made me wonder, could this have cause some race condition?

@@ -226,6 +226,10 @@ func (ct *clientTracer) start(hook, spanName string, attrs ...attribute.KeyValue

func (ct *clientTracer) end(hook string, err error, attrs ...attribute.KeyValue) {
if !ct.useSpans {
// sometimes end may be called without previous start
if ct.root == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be tested?

Copy link

@xocasdashdash xocasdashdash Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as @tonistiigi mentioned, they've gotten some reports, there's very little consistency on when this happens. We've seen very similar problems with clients very rarely blowing up with this same error.

I'm assuming this is going to be very hard to test because of this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A unit test doesn't have to be a full reproduction. It can call end() with conditions that reproduce this case, just to ensure the code path doesn't fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created a new pull request #5965 which adds unit test for end() func

@dmathieu
Copy link
Member

This will need a changelog entry.

We have many reports that end() gets called without the
span being defined in start() and causes a panic.

Signed-off-by: Tonis Tiigi <[email protected]>
@tonistiigi tonistiigi force-pushed the httptrace-end-panic branch from d12420f to ded1cd2 Compare June 25, 2024 23:52
@tonistiigi
Copy link
Contributor Author

Added changelog

@krajorama
Copy link

Hi, just ran into moby/buildkit#4377 . Any update on this PR? I understand only unit test is missing?

dmathieu added a commit that referenced this pull request Sep 12, 2024
Fork of
#5187
updated with main branch and tests, this PR adds nil dereference check
for clientTracer.root in `end()` when span events are used instead of
sub spans

---------

Signed-off-by: Tonis Tiigi <[email protected]>
Co-authored-by: Tonis Tiigi <[email protected]>
Co-authored-by: Damien Mathieu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants