Slurm Cluster CPU Affinity #1358
Comments
It's hard to say without additional information, and I would suggest asking your cluster's admin; to me it seems this is related to the "more memory" situation you're describing and the other changes in the second srun command. If, knowing the above, you'd still like to try launching a Dask-CUDA cluster, you could try to disable setting affinity by commenting out the relevant lines.
Would a cluster still work without these lines? I was under the impression setting affinity was necessary.
It will but may be slow. The primary purpose of setting CPU affinity in the context of Dask-CUDA is to ensure workers are running on the closest CPU(s) to each GPU, thus avoiding additional hops that will slow down the application.
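For context, the affinity setting under discussion amounts to asking NVML which CPUs sit closest to a given GPU and pinning the worker process to that set. A minimal sketch of the idea with `pynvml` and `os.sched_setaffinity` (a simplified illustration, not dask-cuda's exact code):

```python
# Minimal sketch of GPU<->CPU affinity pinning via NVML (illustrative only).
import math
import os

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first visible GPU

# NVML reports the ideal CPU set as an array of 64-bit bitmask words.
n_words = math.ceil(os.cpu_count() / 64)
bitmask = pynvml.nvmlDeviceGetCpuAffinity(handle, n_words)

# Unpack the bitmask words into a set of CPU ids.
cpus = {
    word_idx * 64 + bit
    for word_idx, word in enumerate(bitmask)
    for bit in range(64)
    if word & (1 << bit)
}
print("CPUs closest to GPU 0:", sorted(cpus))

# This is effectively the pinning step: restrict the current process to the
# CPUs closest to its GPU. If NVML reports an empty set (as in the
# high-memory Slurm allocation reported in this issue), there is nothing
# valid to pin to.
if cpus:
    os.sched_setaffinity(0, cpus)
```

Dask-CUDA applies the equivalent of that last step once per worker, so each worker ends up pinned to the CPUs nearest its own GPU.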
@pentschev What has your experience been with dask-cuda setting this affinity vs. not? What sort/magnitude of slowdown might you see if this is not set? Would an example like that from #1351, where there is a CPU-GPU transfer, reveal performance differences?
And then as a follow-up: if I were to see no difference, in contrast to the improvement you may have seen from setting this affinity, what would that mean?
My topology:
This is the kind of problem for which there's no good rule of thumb, as there are simply too many variables involved. Everything depends on the topology, the type of compute and memory access patterns, as well as system load, PCIe bandwidth, etc. The best approach is to do what you did and measure it. I'm not surprised by a 30% slowdown; it will most likely be noticeable in the majority of cases, in particular when more than one NUMA node is involved, which is the case for you.
Thanks @pentschev!
I think the conclusion here is that there is no concrete bug to take action on. If so, @ilan-gold, please go ahead and close.
Hello All,
This is more of a "seeking advice" than a bug, although who knows. So anyone with experience in this area would be welcome to chime in! The TL;DR is that requesting large amounts of memory on a Slurm cluster causes the CPU affinity (and NUMA affinity) reported for the GPUs to be incorrect.
When running

```
srun --pty -c 10 -p gpu_p --qos gpu_long --nice=0 --exclusive --gres=gpu:2 -t 06:00:00 bash
```

`nvidia-smi topo -m` gives the "correct" NUMA/CPU affinity (as needed by dask-cuda in the linked lines), but when we request more memory, it doesn't work, i.e.,
```
srun --pty -c 10 -p gpu_p --qos gpu_reservation --nice=0 --mem 200G --gres=gpu:2 --reservation=test_supergpu05 -t 06:00:00 bash
```

followed by `nvidia-smi topo -m` shows the NUMA affinity as N/A, and the CPU affinity is not correct either.
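An independent way to check which NUMA node the Linux kernel associates with each GPU's PCI device is to read it from sysfs. The sketch below is an illustration (not from the original report); it uses `pynvml` only to look up the PCI bus IDs and assumes a Linux system:

```python
# Illustrative check: read the NUMA node of each GPU's PCI device from sysfs.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    busid = pynvml.nvmlDeviceGetPciInfo(handle).busId
    if isinstance(busid, bytes):
        busid = busid.decode()
    # sysfs uses a 4-digit PCI domain and lowercase hex, e.g. 0000:3b:00.0
    sysfs_id = busid[-12:].lower()
    with open(f"/sys/bus/pci/devices/{sysfs_id}/numa_node") as f:
        print(f"GPU {i} ({sysfs_id}) NUMA node:", f.read().strip())
```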
So launching a Dask-CUDA cluster then loses one (or both) of the workers, because the CPU affinity is wrong for the higher-memory configuration, and an error is raised for the lost worker(s). A minimal example of such a launch is sketched below.
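The exact snippet isn't shown above; as an assumed reconstruction, a minimal Dask-CUDA launch of the kind described looks roughly like this:

```python
# An assumed minimal reconstruction of the cluster launch described above,
# not a verbatim copy of the reporter's snippet.
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    # LocalCUDACluster starts one worker per visible GPU and, as part of
    # worker startup, applies the CPU affinity derived from NVML; on the
    # high-memory allocation that step is where workers are lost.
    cluster = LocalCUDACluster()
    client = Client(cluster)
    print(client)
```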
The section of the dask-cuda codebase that relies on this configuration can also be extracted into a self-contained script relying on `pynvml`. (`pynvml` relies on the same NVML library that `nvidia-smi` uses under the hood, which is why I posted that output first.) In any case, such a script gives incorrect results on the higher-memory allocation, where both `print` statements result in empty lists, whereas on the first `srun` I wrote, the output matches that of `nvidia-smi topo -m` for the CPU affinity (a sketch of such a check follows below).

Thanks for any advice!
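The original script is not reproduced above; the following is a sketch of that kind of self-contained check, assuming it derived the per-GPU CPU affinity from `pynvml` roughly the way dask-cuda does (the `unpack_bitmask` helper here is illustrative, not dask-cuda's):

```python
# Sketch of a self-contained affinity check (an approximation of the script
# described above, not the original). It shows the CPUs the Slurm allocation
# granted to the job and the CPUs NVML considers closest to each GPU.
import math
import os

import pynvml


def unpack_bitmask(words, bits_per_word=64):
    """Convert NVML's array of bitmask words into a sorted list of CPU ids."""
    return sorted(
        idx * bits_per_word + bit
        for idx, word in enumerate(words)
        for bit in range(bits_per_word)
        if word & (1 << bit)
    )


pynvml.nvmlInit()
n_words = math.ceil(os.cpu_count() / 64)

# CPUs the job was actually given by Slurm (the cgroup cpuset).
print("CPUs granted to this job:", sorted(os.sched_getaffinity(0)))

# CPUs NVML considers closest to each GPU; with --gres=gpu:2 this prints two
# lists, which per the report come back empty on the high-memory allocation.
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    affinity = pynvml.nvmlDeviceGetCpuAffinity(handle, n_words)
    print(f"GPU {i} CPU affinity:", unpack_bitmask(affinity))
```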