Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip pyNVML to support Tegra devices #402

Open
wants to merge 3 commits into
base: branch-0.17
Choose a base branch
from

Conversation

pentschev
Copy link
Member

Fixes #400

@pentschev pentschev requested a review from a team as a code owner September 25, 2020 21:05
@pentschev
Copy link
Member Author

cc @JasonAtNvidia

@codecov-commenter
Copy link

codecov-commenter commented Sep 25, 2020

Codecov Report

Merging #402 into branch-0.16 will decrease coverage by 2.65%.
The diff coverage is 69.23%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.16     #402      +/-   ##
===============================================
- Coverage        59.65%   56.99%   -2.66%     
===============================================
  Files               17       19       +2     
  Lines             1331     1451     +120     
===============================================
+ Hits               794      827      +33     
- Misses             537      624      +87     
Impacted Files Coverage Δ
dask_cuda/utils.py 85.31% <69.23%> (-2.00%) ⬇️
dask_cuda/explicit_comms/dataframe_merge.py 90.38% <0.00%> (-0.45%) ⬇️
dask_cuda/explicit_comms/__init__.py 100.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cupy_map_overlap.py 0.00% <0.00%> (ø)
dask_cuda/explicit_comms/dataframe_shuffle.py 95.52% <0.00%> (ø)
dask_cuda/device_host_file.py 98.64% <0.00%> (+0.03%) ⬆️
dask_cuda/cli/dask_cuda_worker.py 96.77% <0.00%> (+0.05%) ⬆️
dask_cuda/initialize.py 92.59% <0.00%> (+0.28%) ⬆️
dask_cuda/cuda_worker.py 72.28% <0.00%> (+0.86%) ⬆️
dask_cuda/local_cuda_cluster.py 82.95% <0.00%> (+0.93%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 71fad3a...0c9f29a. Read the comment docs.

@jakirkham
Copy link
Member

Would this make sense to do generally or have we found issues with this approach in some cases? Asking as we now know of 2 cases where NVML is not an option Tegra and WSL.

@pentschev
Copy link
Member Author

Would this make sense to do generally or have we found issues with this approach in some cases? Asking as we now know of 2 cases where NVML is not an option Tegra and WSL.

Using Numba to do that implies creating a context, and that can be tricky in certain situations where we need the context creation to be delayed. We should evaluate context creation on a case-by-case basis as this has proven problematic in several instances in the past.

TBH, I don't know if things are 100% correct for Tegra, but it's very difficult to evaluate much further without being able to test. Long story short: I hope this PR will provide some support to Tegra but we can't make any guarantees without extensive testing, same applies for WSL or any other platforms we may want to support in the future.

@jakirkham
Copy link
Member

Using Numba to do that implies creating a context, and that can be tricky in certain situations where we need the context creation to be delayed. We should evaluate context creation on a case-by-case basis as this has proven problematic in several instances in the past.

Won't it already be tricky to initialize the context? AIUI we wanted this for multi-GPU devices.

TBH, I don't know if things are 100% correct for Tegra, but it's very difficult to evaluate much further without being able to test. Long story short: I hope this PR will provide some support to Tegra but we can't make any guarantees without extensive testing, same applies for WSL or any other platforms we may want to support in the future.

Fair enough I think we have to rely on folks being pretty engaged to keep things working. This is also why I'm wondering if we can standardize on a common code path that we are able to test in other cases at least.

@pentschev
Copy link
Member Author

On a discussion with @JasonAtNvidia and @jakirkham , we decided to move this to 0.17. One of the limitations we found is getting GPU information, such as memory, without NVML. That requires Numba, and since we replace its memory manager with RMM's and RMM doesn't implement such functionality, we get errors:

============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.1.1, py-1.9.0, pluggy-0.13.1 -- /mnt/data/miniforge3/envs/dasktest/bin/python
cachedir: .pytest_cache
rootdir: /home/nvidia/Documents/dask-cuda
collecting ... collected 998 items / 1 error / 1 skipped / 996 selected
==================================== ERRORS ====================================
________________ ERROR collecting dask_cuda/tests/test_spill.py ________________
dask_cuda/tests/test_spill.py:18: in <module>
    if utils.get_device_total_memory() < 1e10:
/mnt/data/miniforge3/envs/dasktest/lib/python3.7/site-packages/dask_cuda-0.16.0a0+95.g04dcbb6.dirty-py3.7.egg/dask_cuda/utils.py:175: in get_device_total_memory
    return numba.cuda.current_context().get_memory_info()[1]
/mnt/data/miniforge3/envs/dasktest/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py:1040: in get_memory_info
    return self.memory_manager.get_memory_info()
/mnt/data/miniforge3/envs/dasktest/lib/python3.7/site-packages/rmm/rmm.py:214: in get_memory_info
    raise NotImplementedError()
E   NotImplementedError
=============================== warnings summary ===============================
../../../../mnt/data/miniforge3/envs/dasktest/lib/python3.7/site-packages/pandas/util/__init__.py:12
  /mnt/data/miniforge3/envs/dasktest/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
    import pandas.util.testing
dask_cuda/tests/test_local_cuda_cluster.py:76
  /home/nvidia/Documents/dask-cuda/dask_cuda/tests/test_local_cuda_cluster.py:76: PytestUnknownMarkWarning: Unknown pytest.mark.asyncio - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/mark.html
    @pytest.mark.asyncio
dask_cuda/tests/test_local_cuda_cluster.py:89
  /home/nvidia/Documents/dask-cuda/dask_cuda/tests/test_local_cuda_cluster.py:89: PytestUnknownMarkWarning: Unknown pytest.mark.asyncio - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/mark.html
    @pytest.mark.asyncio
-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
ERROR dask_cuda/tests/test_spill.py - NotImplementedError
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=================== 1 skipped, 3 warnings, 1 error in 4.28s ====================

@pentschev pentschev changed the base branch from branch-0.16 to branch-0.17 October 7, 2020 19:06
@pentschev
Copy link
Member Author

Won't it already be tricky to initialize the context? AIUI we wanted this for multi-GPU devices.

Even single-GPU devices can benefit from these changes for Tegra clusters.

Fair enough I think we have to rely on folks being pretty engaged to keep things working. This is also why I'm wondering if we can standardize on a common code path that we are able to test in other cases at least.

It seems we'll soon begin testing more with Tegra, so that will help us here. To be frank, I don't want to move away from NVML only because of Tegra, NVML is much better in various ways (e.g., no need to create a CUDA context) and supports things like gathering CPU affinity for each GPU that other tools don't, so I'd rather have two code paths.

@pentschev
Copy link
Member Author

It seems we'll have some Tegra devices in CI soon, we can then push this PR forward.

@pentschev pentschev added 2 - In Progress Currently a work in progress feature request New feature or request non-breaking Non-breaking change 0 - Blocked Cannot progress due to external reasons and removed 2 - In Progress Currently a work in progress labels Jan 8, 2021
@github-actions
Copy link

This PR has been marked stale due to no recent activity in the past 30d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be marked rotten if there is no activity in the next 60d.

@quasiben
Copy link
Member

I believe we are still waiting on Tegra devices

@pentschev
Copy link
Member Author

@mike-wendt @raydouglass based on the discussion from Ops Demo Tuesday, this is a PR we would like to test with Tegra devices when we have it available in CI.

@mike-wendt
Copy link
Contributor

@mike-wendt @raydouglass based on the discussion from Ops Demo Tuesday, this is a PR we would like to test with Tegra devices when we have it available in CI.

@pentschev @quasiben as an update @Ethyling and I are working on getting the L4T images integrated on Monday. Once we get that done we'll have a platform to easily test these builds for CUDA 10.2 as that's the only current CUDA version supported on L4T/AGX.

@pentschev
Copy link
Member Author

Sounds great, thanks @mike-wendt and let us know if/how we can assist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0 - Blocked Cannot progress due to external reasons feature request New feature or request non-breaking Non-breaking change
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

pyNVML won't work on a Jetson, is there a workaround
5 participants