[FEA] Make selector choose appropriate CUDA 12.x versions based on dependencies #471
Comments
Possibly related: #470
#470 fixes the compatible major versions of CUDA for the TensorFlow GPU conda-forge package. It does not impact minor version compatibility. What part of this is dependent on RAPIDS supporting CUDA 12.2? I was able to solve this environment and got a CUDA 12 build of PyTorch from conda-forge: `mamba create -n rapids-23.12 -c rapidsai -c conda-forge -c nvidia rapids=23.12 python=3.10 cuda-version=12.0 pytorch`. I don't think we can offer official compatibility between RAPIDS /
Last I tested it, this environment worked, but we can't offer support for a configuration with CUDA from a mixed set of channels. At some point in the future we are hoping to make the CUDA distributions on the
I agree that this isn't addressable until the nvidia and conda-forge CTK packages are aligned. We should consider how the selector ought to work once that day comes, though. To @MatthiasKohl's point, the pytorch channel is the officially supported medium (by both NVIDIA and PyTorch) for installing the package, so IMHO once the two are aligned we would probably want to encourage installation of PyTorch from the pytorch channel unless and until we see a similar level of support for the conda-forge package as NVIDIA is now providing for the CTK on conda-forge.
It might work as desired, but I don't think it should.
Big relevant news here: pytorch/pytorch#138506
There has not been any substantial effort / progress to become compatible with DLFWs since this was last discussed. |
RAPIDS is often used in conjunction with PyTorch and TensorFlow. Wouldn't it instead make sense to support the conda-forge feedstocks, since they are community driven and pull requests can be made against them? The compatibility changes discussed here can be made for RAPIDS going forward, now that the conda-forge channel is how PyTorch will be distributed on conda.
This does make sense, but it definitely requires support from Cliff Woolley and org, so I'd recommend reaching out to them to see what they can support. This will likely take a long time, especially if we are to support conda-forge officially, so while this effort is going on, I'd still recommend removing the selector.
Is your feature request related to a problem? Please describe.
Once RAPIDS adds support for CUDA 12.2, it will be possible to install conda packages of PyTorch along with RAPIDS from conda. Currently this is not possible because PyTorch supports 12.1 and will likely bump straight to 12.3 for their next set of packages. Since the CUDA 12 lineup of RAPIDS packages is going to leverage CUDA Enhanced Compatibility (CEC) to support arbitrary CUDA minor versions, we will no longer need users to pin a specific minor version for RAPIDS, but dependencies like PyTorch will likely continue to require one.
Describe the solution you'd like
We should update the release selector to include a range of CUDA minor versions and have it automatically select supported ones based on the user's choice of packages to include in their environment.
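The selection logic described above amounts to intersecting the cuda-version ranges of the chosen packages. A minimal sketch, where the package names and version ranges are purely illustrative placeholders (not real package metadata), might look like:

```python
# Hypothetical supported cuda-version ranges per package; real values would
# come from package metadata. Ranges below are made up for illustration.
SUPPORTED = {
    "rapids": ("12.0", "12.5"),      # assumed: any minor via CEC
    "pytorch": ("12.1", "12.1"),     # assumed: pinned to a single minor
    "tensorflow": ("12.3", "12.3"),  # assumed: pinned to a different minor
}

def parse(v):
    """Split '12.1' into a comparable (major, minor) tuple."""
    major, minor = v.split(".")
    return int(major), int(minor)

def pick_cuda_version(packages):
    """Intersect each package's [lo, hi] cuda-version range and return the
    highest version in the intersection, or None if the ranges conflict."""
    lo = max(parse(SUPPORTED[p][0]) for p in packages)
    hi = min(parse(SUPPORTED[p][1]) for p in packages)
    if lo > hi:
        return None  # no CUDA minor version satisfies all packages
    return f"{hi[0]}.{hi[1]}"

print(pick_cuda_version(["rapids"]))                            # "12.5"
print(pick_cuda_version(["rapids", "pytorch"]))                 # "12.1"
print(pick_cuda_version(["rapids", "pytorch", "tensorflow"]))   # None
```

The selector UI would then emit `cuda-version=<result>` in the generated command, or flag the combination as unsatisfiable when the intersection is empty.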
Additional context
For libraries like PyTorch, we will also need to consider what channel the package will be installed from. Officially supported PyTorch builds come from the `pytorch` channel, not `conda-forge`, so unless/until that changes we will need to ensure that our install command accounts for that correctly.