[FEA]: Introduce Python module with CCCL headers #3201
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
python/cuda_cccl/setup.py
Outdated
project_path = os.path.abspath(os.path.dirname(__file__))
cccl_path = os.path.abspath(os.path.join(project_path, "..", ".."))
cccl_headers = [["cub", "cub"], ["libcudacxx", "include"], ["thrust", "thrust"]]
ver = "0.1.2.8.0"
I think we need to use the CCCL version here, not the CCCL Python modules' version. We should also not hard-code it, but instead read it from CMakeLists.txt, which is the source of truth AFAIK; for that, setuptools might not do the job. @vyasr might have a simple example of how this can be done with scikit-build-core.
Ack. I added this as a bullet to the PR description.
Check out the dynamic metadata section, specifically the Regex tab.
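For reference, here is a rough plain-Python equivalent of that regex idea, which the current setup.py could use until the build is migrated. It assumes the CCCL version appears somewhere in the repository's top-level CMakeLists.txt in a `VERSION X.Y.Z` form; the exact file layout and pattern are assumptions, not something verified in this PR.

```python
# Hypothetical sketch: derive the version from the top-level CMakeLists.txt
# instead of hard-coding it in setup.py. The location of the file and the
# shape of the version declaration (e.g. "... VERSION 2.8.0 ...") are
# assumptions; adjust the regex to whatever CMakeLists.txt actually contains.
import os
import re

project_path = os.path.abspath(os.path.dirname(__file__))
cmakelists_path = os.path.abspath(
    os.path.join(project_path, "..", "..", "CMakeLists.txt")
)

with open(cmakelists_path, encoding="utf-8") as f:
    match = re.search(r"VERSION\s+(\d+\.\d+\.\d+)", f.read())

if match is None:
    raise RuntimeError(f"Could not find a version string in {cmakelists_path}")

ver = match.group(1)  # e.g. "2.8.0", used as the wheel's version
```

scikit-build-core's regex metadata provider expresses the same idea declaratively in pyproject.toml instead of in setup.py.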
You would need to rewrite everything here to use CMake instead of setuptools. Depending on what this module is trying to do, that may or may not be beneficial. Do you need to run compilation of cuda_cccl/cooperative/parallel against CCCL headers? In that case it is almost certainly worthwhile; I wouldn't want to orchestrate that compilation using setuptools.
> Do you need to run compilation of cuda_cccl/cooperative/parallel against CCCL headers?

- `cuda_cccl` would just be `nvidia-cuda-cccl-cuXX` containing the headers, but owned/maintained by the CCCL team for faster release cycles (think of it as `cccl` vs `cuda-cccl` on conda-forge); a possible header-lookup helper is sketched after this list
- `cuda_cooperative` JIT-compiles CCCL headers at run time, so for all purposes the headers can be thought of as shared libraries; no AOT compilation is needed
- `cuda_parallel` is the most interesting case, because it does need to build the CCCL C shared library and include it in the wheel, but I dunno if building it requires NVCC + CCCL headers, or GCC/MSVC alone is enough
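Purely as illustration of the headers-only idea (not something in this PR): such a wheel could expose its bundled include directory through a small helper, in the spirit of numpy.get_include(). The module layout and directory names below are assumptions.

```python
# Illustrative sketch only: how a headers-only cuda_cccl package might let
# JIT users locate the bundled CCCL headers. The layout (an "include/"
# subdirectory next to this module) is an assumption, not the PR's design.
import os


def get_include() -> str:
    """Return the directory containing the bundled CCCL headers."""
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), "include")


if __name__ == "__main__":
    # A downstream JIT pipeline would pass this as "-I<path>" to the compiler.
    print(get_include())
```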
> but I dunno if building it requires NVCC + CCCL headers, or GCC/MSVC alone is enough
Based on adding `-DCMAKE_VERBOSE_MAKEFILE=ON` and looking at the output of `pip install --verbose ./cuda_parallel[test]`, `nvcc` is required for compiling `cccl/c/parallel/src/for.cu` and `reduce.cu`:
cd /home/coder/cccl/python/cuda_parallel/build/temp.linux-x86_64-cpython-312/c/parallel && /usr/bin/sccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -DCCCL_C_EXPERIMENTAL=1 -DNVRTC_GET_TYPE_NAME=1 -D_CCCL_NO_SYSTEM_HEADER -Dcccl_c_parallel_EXPORTS --options-file CMakeFiles/cccl.c.parallel.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++20 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" -Xcompiler=-fPIC -Xcudafe=--display_error_number -Wno-deprecated-gpu-targets -Xcudafe=--promote_warnings -Wreorder -Xcompiler=-Werror -Xcompiler=-Wall -Xcompiler=-Wextra -Xcompiler=-Wreorder -Xcompiler=-Winit-self -Xcompiler=-Woverloaded-virtual -Xcompiler=-Wcast-qual -Xcompiler=-Wpointer-arith -Xcompiler=-Wvla -Xcompiler=-Wno-gnu-line-marker -Xcompiler=-Wno-gnu-zero-variadic-macro-arguments -Xcompiler=-Wno-unused-function -Xcompiler=-Wno-noexcept-type -MD -MT c/parallel/CMakeFiles/cccl.c.parallel.dir/src/for.cu.o -MF CMakeFiles/cccl.c.parallel.dir/src/for.cu.o.d -x cu -c /home/coder/cccl/c/parallel/src/for.cu -o CMakeFiles/cccl.c.parallel.dir/src/for.cu.o
cd /home/coder/cccl/python/cuda_parallel/build/temp.linux-x86_64-cpython-312/c/parallel && /usr/bin/sccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -DCCCL_C_EXPERIMENTAL=1 -DNVRTC_GET_TYPE_NAME=1 -D_CCCL_NO_SYSTEM_HEADER -Dcccl_c_parallel_EXPORTS --options-file CMakeFiles/cccl.c.parallel.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++20 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" -Xcompiler=-fPIC -Xcudafe=--display_error_number -Wno-deprecated-gpu-targets -Xcudafe=--promote_warnings -Wreorder -Xcompiler=-Werror -Xcompiler=-Wall -Xcompiler=-Wextra -Xcompiler=-Wreorder -Xcompiler=-Winit-self -Xcompiler=-Woverloaded-virtual -Xcompiler=-Wcast-qual -Xcompiler=-Wpointer-arith -Xcompiler=-Wvla -Xcompiler=-Wno-gnu-line-marker -Xcompiler=-Wno-gnu-zero-variadic-macro-arguments -Xcompiler=-Wno-unused-function -Xcompiler=-Wno-noexcept-type -MD -MT c/parallel/CMakeFiles/cccl.c.parallel.dir/src/reduce.cu.o -MF CMakeFiles/cccl.c.parallel.dir/src/reduce.cu.o.d -x cu -c /home/coder/cccl/c/parallel/src/reduce.cu -o CMakeFiles/cccl.c.parallel.dir/src/reduce.cu.o
I skimmed over the code and I am actually confused, because my impression is that the kernel compilation is still done at run time (JIT), and that the host logic can just be handled by a host compiler. @gevtushenko, IIRC you built the prototype; any reason we have to use `.cu` files here and compile them with NVCC?
Commit 2913ae0 adopts the established _version.py handling.
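For context, the "established `_version.py` handling" generally looks something like the sketch below: the version lives only in `_version.py`, and setup.py reads it without importing the not-yet-installed package. The exact filename and paths here are assumptions; the repo's actual layout may differ.

```python
# General single-source version pattern (paths are assumptions). The file
# cuda/cccl/_version.py would contain only:
#
#     __version__ = "0.1.2.8.0"
#
# and setup.py loads it without importing the package itself:
import os

project_path = os.path.abspath(os.path.dirname(__file__))
version_ns = {}
with open(os.path.join(project_path, "cuda", "cccl", "_version.py")) as f:
    exec(f.read(), version_ns)  # defines "__version__" inside version_ns

ver = version_ns["__version__"]
```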
Q: In what way is it not working?
It is getting a non-existent path here:

At HEAD, `cuda_parallel/cuda/_include` exists in the source directory (it is …)
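A minimal sketch of the direction this points in, under assumed names: resolving the `_include` directory relative to the imported package rather than the current working directory, so the path exists both in the source tree and in an installed wheel. The import name `cuda.parallel` and the sibling `cuda/_include` layout are assumptions based on the paths mentioned above.

```python
# Sketch under assumed names/layout: locate the bundled headers relative to
# the imported package instead of the current working directory. Assumes the
# wheel installs "cuda/parallel/..." with the headers in a sibling "cuda/_include".
import os

import cuda.parallel  # assumed import name of the cuda_parallel package


def get_include_path() -> str:
    pkg_dir = os.path.dirname(os.path.abspath(cuda.parallel.__file__))  # .../cuda/parallel
    include_dir = os.path.join(os.path.dirname(pkg_dir), "_include")    # .../cuda/_include
    if not os.path.isdir(include_dir):
        raise RuntimeError(f"CCCL headers not found at {include_dir}")
    return include_dir
```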
On August 30, 2024, @leofang wrote: …

Leo: Do you still recommend that we replace …? I'm asking because that'll take this PR in a very different direction (I think).
Logging an observation (JIC it's useful to reference this later).

TL;DR: @gevtushenko, could this explain your "only works 50% of the time" experience?

With CCCL HEAD (I have it @ d6253b5), the current working directory is …

The output is:

Similarly for cuda_parallel:

Same output as above.

Now with this PR (@ daab580): TL;DR same problem (this had me really confused TBH).

Output:

Small summary:
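One cheap way to tell which copy is actually being imported in experiments like this (installed wheel vs. the source tree picked up because of the current working directory) is to print each module's origin. The import names below are assumed to match the packages discussed above.

```python
# Diagnostic sketch: print where each package was imported from. If the path
# points into the source checkout rather than site-packages, the current
# working directory (or an editable install) is shadowing the built wheel.
import cuda.cooperative
import cuda.parallel

print("cuda.cooperative ->", cuda.cooperative.__file__)
print("cuda.parallel    ->", cuda.parallel.__file__)
```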
Commit ef9d5f4 makes the … I wouldn't be surprised if this isn't the right way of doing it, but it does work in one pass.
… cuda._include to find the include path.
Commit bc116dc fixes the …
… (they are equivalent to the default functions)
It turns out what I discovered the hard way was actually a known issue: Lines 23 to 27 in d6253b5.
/ok to test
🟩 CI finished in 58m 34s: Pass: 100%/176 | Total: 1d 00h | Avg: 8m 22s | Max: 44m 12s | Hits: 99%/22510

|   | Project |
|---|---|
| +/- | CCCL Infrastructure |
|   | libcu++ |
|   | CUB |
|   | Thrust |
|   | CUDA Experimental |
| +/- | python |
|   | CCCL C Parallel Library |
|   | Catch2Helper |

Modifications in project or dependencies?

|   | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 176)

| # | Runner |
|---|---|
| 125 | linux-amd64-cpu16 |
| 25 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
This is ready for review now. It's by no means fancy or ambitious, but I believe it's a meaningful conservative step in the right direction.
🟩 CI finished in 1h 49m: Pass: 100%/176 | Total: 1d 00h | Avg: 8m 24s | Max: 50m 56s | Hits: 99%/22510

|   | Project |
|---|---|
| +/- | CCCL Infrastructure |
|   | libcu++ |
|   | CUB |
|   | Thrust |
|   | CUDA Experimental |
| +/- | python |
|   | CCCL C Parallel Library |
|   | Catch2Helper |

Modifications in project or dependencies?

|   | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 176)

| # | Runner |
|---|---|
| 125 | linux-amd64-cpu16 |
| 25 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
Description
closes #2281