Timeout used by park_timeout seems way too short, causes heavy CPU use #5
I agree. The goal is to switch to adaptive backoff spinning, but I'd prefer not to implement it myself. I'm half-waiting for mvdnes/spin-rs#29 to be fixed so I can simply use that. Also, thanks :)
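For readers unfamiliar with the term, here is a minimal sketch of adaptive backoff spinning, illustrative only and not bus's actual code: spin on the CPU for very short waits, then yield the time slice, then fall back to parking with a timeout. The thresholds and names are made up.

```rust
use std::thread;
use std::time::Duration;

// Sketch of adaptive backoff: spin, then yield, then park (not bus's implementation).
fn wait_until<F: Fn() -> bool>(ready: F) {
    let mut attempts = 0u32;
    loop {
        if ready() {
            return;
        }
        if attempts < 10 {
            std::hint::spin_loop();          // stay on-CPU for very short waits
        } else if attempts < 20 {
            thread::yield_now();             // give up the time slice
        } else {
            thread::park_timeout(Duration::from_millis(1)); // sleep until unparked or timeout
        }
        attempts += 1;
    }
}

fn main() {
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;
    let flag = Arc::new(AtomicBool::new(false));
    let setter = Arc::clone(&flag);
    thread::spawn(move || setter.store(true, Ordering::Release));
    wait_until(|| flag.load(Ordering::Acquire));
}
```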
Is this really fixable by backoff spinning? I mean, spinning burns CPU instead of properly giving up the thread's time slice. I think the proper fix would be to use some sort of CondVar or other real parking/unparking support.
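For comparison, the kind of "real" parking/unparking this comment suggests looks roughly like the following with std's Condvar. This is only a sketch of the idea; it requires a Mutex around the shared state, which bus deliberately avoids.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

fn main() {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let pair2 = Arc::clone(&pair);

    // Producer: set the flag and wake the waiter; no polling, no busy CPU.
    thread::spawn(move || {
        let (lock, cvar) = &*pair2;
        *lock.lock().unwrap() = true;
        cvar.notify_one();
    });

    // Consumer: sleeps inside wait() until notified.
    let (lock, cvar) = &*pair;
    let mut ready = lock.lock().unwrap();
    while !*ready {
        ready = cvar.wait(ready).unwrap();
    }
}
```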
Given that there aren't any locks, …
This may fix the concerns raised in #5.
@hutch can you give it a try with the new version?
I think I have the high CPU usage problem with 1.1.0.
@BrunoQC Try 1.1.1?
It seems better. The average is about 15% and it peaks at about 30%. With 1.1.0 I had peaks at about 70%. Do you think the CPU usage could be even lower?
Well, it depends on what you're doing, really. If you're sending and receiving in a tight loop, I'd expect it to hit 100%. If you're doing other compute-heavy work, I'd expect the overhead of bus to be marginal.
Using bus 1.1.3 on a Raspberry Pi, each thread doing a recv() uses 15-35% CPU with the parking time set to 1 µs. If I set it to 1 ms, I have trouble measuring the usage because it's so low.
@xobs I'm confused -- the default parking time in bus is 100µs?
@jonhoo You're right, I set it to 100ms instead of 1ms. With the default 100µs value, a single instance of waiting in BusReader.read() burns 16% according to "top -H". If I add more BusReaders then the program's usage goes up with each instance. The accidental high latency in my own patch is actually fine for my purposes here, but I'll check what happens when I set it to 1 ms instead of 100 ms.
With the default of 100 µs, the output of "top -H" with three listeners (in a thread called B-Hook) shows that each thread is being assigned one of the four cores on this system; the bus is idle, so they are burning a core each while simply polling for data. If I correct my patch and set the poll time to 1 ms, each instance of bus.recv() is still the busiest thing on the system, but it's much better now. I'd like to be able to increase the value to 10 ms or even up to 100 ms, but 100 µs is definitely using too much power.
So, the reason I'm hesitant to raise the timeout is that, as the comment above alludes to, the timeout dictates how long it may take a thread to notice that a free slot has opened up. Or, to put it differently, with a timeout of 100ms, a reader or writer may, under particular racy conditions, remain blocked on a read or write for 100ms after it should have unblocked and continued. That's a long time for performance-critical systems... I guess one thing to do here would be to make the timeout configurable per bus instance.
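To make the trade-off concrete, here is a hedged sketch of why the park timeout bounds both idle CPU and worst-case wakeup latency. This is not bus's internals; `try_recv` and the `park_us` knob are made up for illustration.

```rust
use std::thread;
use std::time::Duration;

// Illustrative only: the park timeout caps how stale a waiting reader can get.
// `park_us` stands in for a per-bus-instance configurable timeout.
fn recv_poll<F: Fn() -> Option<u64>>(try_recv: F, park_us: u64) -> u64 {
    loop {
        if let Some(v) = try_recv() {
            return v;
        }
        // If the wakeup is missed (e.g. the slot frees up before this thread is
        // registered for an unpark), the free slot is only noticed once the
        // timeout expires: larger values mean less idle CPU but more worst-case latency.
        thread::park_timeout(Duration::from_micros(park_us));
    }
}
```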
I have a small test program (implementing a component of a real application) using Bus that was idling at about 100-110% CPU utilization (1+ cores of 8), which is excessive.
When I changed the timeouts used by the two calls to park_timeout to be 1ms (Duration::from_millis(1)) rather than the 1µs currently in the official release, the CPU utilization under a pretty realistic but on-the-heavy-side real-world load (not just idle, and including some computation) fell to about 3.5%. I think this 3.5% CPU utilization is pretty impressive. Nice library you've got there :-)
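For reference, the change described amounts to something like the following, under the assumption that the polling loops call park_timeout directly; this is a sketch, not the actual bus source.

```rust
use std::thread;
use std::time::Duration;

// Sketch of the idle-polling wait described above (not bus's real code).
fn idle_wait(patched: bool) {
    if patched {
        // Local patch: park for 1 ms, cutting idle wakeups by roughly 1000x.
        thread::park_timeout(Duration::from_millis(1));
    } else {
        // Official release at the time: 1 µs parks, so an idle reader re-checks
        // essentially as fast as the OS will let it, burning a core.
        thread::park_timeout(Duration::from_micros(1));
    }
}
```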