This gathers points related to performance and functionality for benchmarking.
- UChicago dev instance: `FileNotFoundError` when running over local files (reproducer)
- UChicago prod instance: `KilledWorker` exception at scale (reproducer: default notebook with `N_FILES_MAX_PER_SAMPLE >= 1000`)
- scaling beyond ~50 workers at UNL
- `dask.distributed` scaling behavior (reproducer below): differs per site; occasionally there are long tails where the last few tasks are not picked up for a long time, and sometimes workers process only a few tasks while tasks are still remaining
- basket size can matter a lot for `uproot`: 10-100 kB per basket is good (could try re-merging with `hadd -O`); the small input files have `.num_baskets == 1`, while files merged 10 -> 1 have 10 baskets (see the `uproot` sketch after this list)
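A minimal sketch for inspecting basket counts and sizes with `uproot`; the file path is a placeholder and the tree name `events` is an assumption about the notebook's inputs:

```python
import uproot

# Placeholder: one of the input files used by the notebook.
path = "root://some.site//path/to/input_file.root"

with uproot.open(path) as f:
    tree = f["events"]  # tree name used in the notebook (assumption)
    for name, branch in tree.items():
        n = branch.num_baskets
        if n == 0:
            continue
        # Average uncompressed bytes per basket; 10-100 kB per basket is the
        # sweet spot noted above. If baskets are tiny, re-merging the inputs
        # (e.g. with hadd -O) can increase the basket size.
        avg_kb = branch.uncompressed_bytes / n / 1000
        print(f"{name}: {n} basket(s), ~{avg_kb:.1f} kB/basket (uncompressed)")
```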
Event rate measurement: what to use as reference, events in input or events passing selection (a small sketch follows the breakdown below)? For reference (10 input files per process):
- by file size, W+jets is ~31% of all events available, ttbar nominal is ~40%, and the four ttbar variations are ~17% together
- by number of events (948 M events total), the breakdown is 46% W+jets, 30% ttbar nominal, 12% ttbar variations, 11% single top t-channel
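To make the two candidate references explicit, a minimal sketch: `time_taken` and `metrics` are the quantities used in the benchmarking cell below, while `n_selected` is a hypothetical count that would have to be summed from the output histograms or a cutflow.

```python
# Rate with respect to all input events (what the cell below reports):
rate_input = metrics["entries"] / time_taken      # events read per second

# Rate with respect to events passing the selection; n_selected is a placeholder
# that would need to be summed from the output histograms / cutflow.
rate_selected = n_selected / time_taken           # selected events per second

print(f"input-event rate:    {rate_input / 1000:.3f} kHz")
print(f"selected-event rate: {rate_selected / 1000:.3f} kHz")
```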
some benchmarking calculations for the usual notebook:
print(f"\nexecution took {time_taken:.2f} seconds")
print(f"event rate / worker: {metrics['entries'] /NUM_WORKERS/time_taken/1000:.3f} kHz (including overhead, so pessimistic estimate)")
print(f"data read: {metrics['bytesread']/1000**3:.3f} GB")
print(f"events processed: {metrics['entries']/1000**2:.3f} M")
print(f"processtime: {metrics['processtime']:.3f} s (?!)")
print(f"processtime per worker: {metrics['processtime']/NUM_WORKERS:.3f} s (should be similar to real runtime, will be lower if opening files etc. is a significant contribution)")
print(f"processtime per chunk: {metrics['processtime']/metrics['chunks']:.3f} s")
print(metrics)
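For context, a minimal sketch of how `time_taken` and `metrics` can be obtained with the coffea 0.7 `Runner`; the scheduler address, fileset, processor class, tree name, and chunk size are placeholders, not the exact notebook configuration:

```python
import time
from coffea import processor
from coffea.nanoevents import NanoAODSchema
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder scheduler address
NUM_WORKERS = len(client.scheduler_info()["workers"])

run = processor.Runner(
    executor=processor.DaskExecutor(client=client),
    schema=NanoAODSchema,   # schema used here as an assumption
    savemetrics=True,       # makes run(...) also return the metrics dict
    chunksize=100_000,      # placeholder chunk size
)

# fileset ({dataset: [files]}) and MyProcessor are placeholders defined elsewhere.
t0 = time.monotonic()
out, metrics = run(fileset, treename="events", processor_instance=MyProcessor())
time_taken = time.monotonic() - t0  # feeds the print statements above
```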
`bytesread` bug: the `bytesread` reported in the metrics varies depending on the file source and disagrees with pure `uproot` (scikit-hep/coffea#717); a cross-check sketch follows below.
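A minimal sketch of how the coffea `bytesread` metric could be cross-checked against pure `uproot`, assuming uproot's per-source request counters (`num_requests`, `num_requested_bytes`); the file path, branch list, and tree name are placeholders:

```python
import uproot

# Placeholders: one of the input files and the branches the processor actually reads.
path = "root://some.site//path/to/input_file.root"
branches = ["jet_pt", "jet_eta", "jet_phi"]

with uproot.open(path) as events:
    tree = events["events"]   # tree name used in the notebook (assumption)
    tree.arrays(branches)     # read the same columns as the processor

    # Counters kept by uproot's file source; compare with metrics["bytesread"].
    source = events.file.source
    print(f"requests:        {source.num_requests}")
    print(f"bytes requested: {source.num_requested_bytes / 1000**2:.3f} MB")
```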
things to add to notebook (-> #85)

points to follow up on (thanks @nsmith-!):
- `py-spy` for profiling: https://github.com/benfred/py-spy

if there is time:
- related tooling: `rootreadspeed`: https://github.com/root-project/root/blob/master/tree/readspeed/README.md
- `dask.distributed` setup for simple scaling tests (see the sketch below)
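A minimal sketch of what such a scaling test could look like, using only `dask.distributed` (the scheduler address, task count, and sleep duration are placeholders): it submits many identical trivial tasks and writes a performance report, which makes long tails and idle workers easy to spot.

```python
import time
from dask.distributed import Client, performance_report, wait

client = Client("tcp://scheduler:8786")  # placeholder scheduler address
n_tasks = 2_000                          # placeholder task count

def trivial_task(i, duration=0.5):
    """Stand-in for a chunk of real work with a known, uniform duration."""
    time.sleep(duration)
    return i

with performance_report(filename="dask-scaling-report.html"):
    start = time.monotonic()
    futures = client.map(trivial_task, range(n_tasks))
    wait(futures)  # long tails show up as a slow final stretch here
    elapsed = time.monotonic() - start

n_workers = len(client.scheduler_info()["workers"])
print(f"{n_tasks} tasks on {n_workers} workers in {elapsed:.1f} s "
      f"({n_tasks / elapsed:.1f} tasks/s)")
```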