Rebuild for cuda for ppc64le and aarch64 #62

regro-cf-autotick-bot
Contributor

This PR has been triggered in an effort to update cuda_112_ppc64le_aarch64.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself, don't close this PR until your rebuild has been merged.

If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/autotick-bot/actions/runs/3857618680, please use this URL for debugging.

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jakirkham
Member

Please leave this open. We are investigating an associated migrator issue upstream ( regro/cf-scripts#1586 )

@jakirkham
Member

Keeping for the migration per this comment ( regro/cf-scripts#1586 (comment) )

@jakirkham jakirkham mentioned this pull request Jan 6, 2023
@jakirkham
Member

Don't think the arch builds will complete within the Travis CI time limits. In fact, CI already indicates this is an issue. Given this, there are a few potential options:

  1. Switch to emulated builds
  2. Try to optimize the builds (to shorten their runtime)
  3. Cross-compile
  4. Manually build and upload

With 1, emulated builds are a lot slower than native builds. They run on Azure so have a longer time limit. However given the slower nature, they may still not complete in time. We could try it though.

With 2, there are probably some optimizations that could help. However we already skip the AVX2 builds when not on x86_64, so we are already avoiding one of the more expensive steps. Had attempted using Ninja previously, but the build times didn't change much and we ran into what look like bugs in the build config. In any event, given the x86_64 CUDA builds on native architecture already take a few hours, expect we won't be able to optimize enough to solve this situation with this alone.

With 3, we don't currently have the infrastructure to handle cross-compiling with CUDA. There is some initial work on this and we may be able to use this in the future here, but probably not in the near term.

So in the near term, 4 (manually building and uploading per CFEP 3) seems like the most practical option.

Thoughts?

@jakirkham
Member

cc @bdice @Ethyling

@bdice

bdice commented Jan 7, 2023

For the purposes of RAPIDS and raft in particular (which depends on faiss), we are working to remove the faiss dependency, so we are not likely to need ARM + CUDA builds anymore a year from now. Building a single version 1.7.2 might be sufficient. With that in mind, I would be fine with manually building if that is easier than battling to make CI work (which will continue to be an ongoing struggle for later versions, I suspect).

I don't know how to enable emulated builds in conda-forge (option 1), but if we did need a CI solution, that sounds the most plausible.

@jakirkham
Member

Based on other discussion, it sounds like we may want to update this logic as well

# the following are all the x86-relevant gpu arches; for building aarch64-packages, add: 53, 62, 72
ARCHES=(52 60 61 70)
if [ $(version2int $cuda_compiler_version) -ge $(version2int "11.1") ]; then
    # Ampere support for GeForce 30 (sm_86) needs cuda >= 11.1
    LATEST_ARCH=86
    # ARCHES does not contain LATEST_ARCH; see usage below
    ARCHES=( "${ARCHES[@]}" 75 80 )
elif [ $(version2int $cuda_compiler_version) -ge $(version2int "11.0") ]; then
    # Ampere support for A100 (sm_80) needs cuda >= 11.0
    LATEST_ARCH=80
    ARCHES=( "${ARCHES[@]}" 75 )
fi
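
A rough sketch of what that update could look like for aarch64, not a tested change to the recipe. It assumes the target is available as a conda-style platform string such as `linux-aarch64`, and it uses a hypothetical stand-in for `version2int`, since the real helper's definition is not shown here. `select_arches` is an illustrative name as well.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the recipe's version2int helper
# (assumption: "MAJOR.MINOR" maps to MAJOR*1000 + MINOR, e.g. 11.2 -> 11002).
version2int() {
    local major minor
    IFS=. read -r major minor _ <<< "$1"
    echo $(( 10#$major * 1000 + 10#${minor:-0} ))
}

# Print the space-separated arch list (excluding the latest arch, as in the
# snippet above) for a given CUDA version and target platform.
select_arches() {
    local cuda_version="$1" platform="$2"
    local arches=(52 60 61 70)

    # The build-script comment says to add 53, 62, 72 for aarch64 packages.
    if [[ "$platform" == "linux-aarch64" ]]; then
        arches+=(53 62 72)
    fi

    local v
    v=$(version2int "$cuda_version")
    if (( v >= $(version2int "11.1") )); then
        arches+=(75 80)
    elif (( v >= $(version2int "11.0") )); then
        arches+=(75)
    fi

    echo "${arches[*]}"
}
```

With this sketch, `select_arches 11.2 linux-aarch64` prints `52 60 61 70 53 62 72 75 80`, while `select_arches 11.2 linux-64` prints `52 60 61 70 75 80`.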

@bdice

bdice commented Jan 7, 2023

> Based on other discussion, it sounds like we may want to update this logic as well

Good call. We'd want to enable arch 90 for CUDA >=11.8 on both x86_64 and aarch64. See rapids-cmake for reference.
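
As a minimal sketch of that extra branch (an illustration only, not the recipe's actual code): the `version2int` below is a hypothetical stand-in for the recipe's helper, and `latest_arch` is a made-up function name.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the recipe's version2int helper
# (assumption: "MAJOR.MINOR" maps to MAJOR*1000 + MINOR).
version2int() {
    local major minor
    IFS=. read -r major minor _ <<< "$1"
    echo $(( 10#$major * 1000 + 10#${minor:-0} ))
}

# Print the newest GPU arch to target for a given CUDA version.
# Prints nothing for CUDA < 11.0, mirroring the existing logic,
# which sets no LATEST_ARCH in that case.
latest_arch() {
    local v
    v=$(version2int "$1")
    if (( v >= $(version2int "11.8") )); then
        echo 90   # Hopper (sm_90), per the suggestion above
    elif (( v >= $(version2int "11.1") )); then
        echo 86   # Ampere GeForce 30 (sm_86)
    elif (( v >= $(version2int "11.0") )); then
        echo 80   # Ampere A100 (sm_80)
    fi
}
```

In the 11.8+ branch the list of non-latest arches would presumably also need to gain 86, since the existing logic keeps `LATEST_ARCH` out of `ARCHES`.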

@jakirkham
Member

It is worth pointing out that conda-forge is handling CUDA compatibility differently than RAPIDS. For the most part packages here are built against CUDA 11.2 and then allowed to run on CUDA 11.2+.

This differs from RAPIDS where the latest CUDA is always built against and then packages can install with earlier CUDA versions.

The conda-forge approach may change in the future to use cuda-compat ( conda-forge/staged-recipes#21382 (comment) ). Though that will take some time to implement.

@h-vetinari
Member

h-vetinari commented Jan 7, 2023

Cross-compiling CUDA would be the best IMO, but this has been stuck for a couple months.

conda-forge/conda-forge-ci-setup-feedstock#210

@h-vetinari
Member

OK, the windows failures are not from the CMake version. I checked that 3.25 was used in the last successful run.

CMake Warning at D:/bld/faiss-split_1684916312750/_build_env/Library/share/cmake-3.25/Modules/CMakeDetermineCUDACompiler.cmake:15 (message):
  Visual Studio does not support specifying CUDAHOSTCXX or
  CMAKE_CUDA_HOST_COMPILER.  Using the C++ compiler provided by Visual
  Studio.
Call Stack (most recent call first):
  CMakeLists.txt:28 (enable_language)


CMake Error at D:/bld/faiss-split_1684916312750/_build_env/Library/share/cmake-3.25/Modules/CMakeDetermineCompilerId.cmake:491 (message):
  No CUDA toolset found.

@jakirkham @adibbley @bdice
Did anything regarding the windows setup for CUDA 11 change in the last ~4 months? If so (or even if not), would you know how to fix the above?

@jakirkham
Member

Maybe we need to add CUDA 11.8 to conda-forge-ci-setup ( similar to how CUDA 11.7 was handled: conda-forge/conda-forge-ci-setup-feedstock#199 )?

@h-vetinari h-vetinari force-pushed the rebuild-cuda_112_ppc64le_aarch64-0-1_h15cace branch from 6f6d6fa to f7385ea Compare May 28, 2023 07:00
@h-vetinari h-vetinari mentioned this pull request May 31, 2023
@jakirkham
Member

Have seen some odd behavior with vs2019 and CUDA ( conda-forge/cupy-feedstock#199 (comment) ). Switching to vs2017 has helped. Not sure if that would be the issue here

@jakirkham
Member

@conda-forge-admin, please re-render

@jakirkham
Member

It appears the Windows builds are now passing! 🎉

Looks like there is a different error on CI


CMake Error at /home/conda/feedstock_root/build_artifacts/faiss-split_1692238597149/_build_env/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Python (missing: Python_INCLUDE_DIRS Python_LIBRARIES
  Python_NumPy_INCLUDE_DIRS Development NumPy Development.Module
  Development.Embed)

@jakirkham
Member

@conda-forge-admin, please re-render

Comment on lines +1 to +2
azure:
  free_disk_space: true
Member

As one of the CI builds (also attached log) ran out of space, try cleaning up the images first

@jakirkham
Member

Added some more CMake parameters and that fixed some of the issues. However finding NumPy is still running into issues on CI (also attached log):

CMake Error at /home/conda/feedstock_root/build_artifacts/faiss-split_1692404167043/_build_env/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Python (missing: Development NumPy Development.Module
  Development.Embed) (found version "3.11.4")

Not exactly sure what we are still missing
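
One thing that might be worth trying is passing CMake's FindPython hint variables explicitly. This is only a guess at a next step, since the parameters already added are not listed in this thread. `Python_EXECUTABLE`, `Python_INCLUDE_DIR` and `Python_NumPy_INCLUDE_DIR` are documented FindPython hints; the helper name `python_cmake_hints` and the way the paths are obtained are assumptions for illustration.

```shell
#!/usr/bin/env bash

# Print one -D flag per line for CMake's FindPython hints.
# Arguments: <python executable> <Python include dir> <NumPy include dir>
# The paths could, for example, come from the target interpreter:
#   python -c 'import sysconfig; print(sysconfig.get_paths()["include"])'
#   python -c 'import numpy; print(numpy.get_include())'
python_cmake_hints() {
    local python_exe="$1" python_include="$2" numpy_include="$3"
    printf '%s\n' \
        "-DPython_EXECUTABLE=${python_exe}" \
        "-DPython_INCLUDE_DIR=${python_include}" \
        "-DPython_NumPy_INCLUDE_DIR=${numpy_include}"
}
```

The output can be collected with `mapfile -t hints < <(python_cmake_hints ...)` and passed to the configure step as `cmake "${hints[@]}" ...`. Whether this resolves the missing `Development` and `NumPy` components here is untested.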

@jakirkham
Member

@conda-forge-admin, please re-render
