Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split ucx_api.pyx into multiple files #711

Merged
merged 2 commits into from
Apr 23, 2021

Conversation

pentschev
Copy link
Member

This PR split ucx_api.pyx into multiple .pyx files, one for each UCX* class, and some others for utils, typedefs and transfer functions. The benefit of doing this is that we could later upstream each part on its own to prevent really large PRs for UCX folks to review, besides, of course, avoiding a single-file with over two thousand lines that is getting hard to navigate.

@pentschev
Copy link
Member Author

@madsbk @quasiben @jakirkham do you have opinions on doing this? Particularly, I find it useful regardless of the upstreaming effort.

@MattBBaker
Copy link
Contributor

I wanted to +1 this. I am working on a RMA patch and the 2200 line file problem is going to get worse.

@pentschev
Copy link
Member Author

@MattBBaker please hold on with your PR then. 🤣

@MattBBaker
Copy link
Contributor

Nah, I'll just pre split my work. Though on that note, the transfer.pyx file is still pretty big. In UCX upstream things are usually split tags/AM/stream/RMA.

@pentschev
Copy link
Member Author

Though on that note, the transfer.pyx file is still pretty big. In UCX upstream things are usually split tags/AM/stream/RMA.

Yes, I thought the same, this is mostly to get the ball running and get opinions, if there are no objections I'll do that. UCXWorker is also quite big, I may have to break it down as well.

@madsbk
Copy link
Member

madsbk commented Apr 23, 2021

I think this is a very good idea! but I suggest that we make header files (.pyd) for each .pyx file. This will clean up the interface between them and speedup the build process.

UCXWorker is also quite big, I may have to break it down as well.

I am not sure it is a good idea to split up a class.

@pentschev
Copy link
Member Author

I think this is a very good idea! but I suggest that we make header files (.pyd) for each .pyx file. This will clean up the interface between them and speedup the build process.

That would need us to create a Cython Extension for each .pyd leading to a .so file for each. I'm not against doing that but it would then be a breaking change w.r.t. the API, in other words, we would need to from ucp._libs.ucx_worker import UCXWorker instead of from ucp._libs.ucx_api import UCXWorker, for example. To prevent that, we need to include "file.pyx", as done here.

I am not sure it is a good idea to split up a class.

I didn't mean splitting up the class itself, but maybe the functions, such as the AM receive callback ones.

@pentschev
Copy link
Member Author

That would need us to create a Cython Extension for each .pyd leading to a .so file for each. I'm not against doing that but it would then be a breaking change w.r.t. the API, in other words, we would need to from ucp._libs.ucx_worker import UCXWorker instead of from ucp._libs.ucx_api import UCXWorker, for example. To prevent that, we need to include "file.pyx", as done here.

Another way this can be resolved is to create a ucx_api subdirectory, then have all .pyx/.pxd in there an add from .ucx_class import UCXClass in __init__.py. It's only unclear to me if there are any downsides of having multiple .so files, possibly binary size will increase.

@madsbk
Copy link
Member

madsbk commented Apr 23, 2021

That would need us to create a Cython Extension for each .pyd leading to a .so file for each. I'm not against doing that but it would then be a breaking change w.r.t. the API, in other words, we would need to from ucp._libs.ucx_worker import UCXWorker instead of from ucp._libs.ucx_api import UCXWorker, for example. To prevent that, we need to include "file.pyx", as done here.

Fair enough, let's start with your current proposal. But if it is possible to make ucp/_libs/typedefs.pyx a .pyd file, I think that would be good.

@pentschev
Copy link
Member Author

Another way this can be resolved is to create a ucx_api subdirectory, then have all .pyx/.pxd in there an add from .ucx_class import UCXClass in __init__.py. It's only unclear to me if there are any downsides of having multiple .so files, possibly binary size will increase.

For completeness here, if we choose to have the idea above, this is how things look like:

$ ls -l ucx_api_v2/
total 9521
-rw-rw-r-- 1 pentschev pentschev     392 Apr 23 03:01 __init__.py
-rwxrwxr-x 1 pentschev pentschev  935264 Apr 23 03:01 packed_remote_key.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev 1718696 Apr 23 03:01 transfer.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  105944 Apr 23 03:01 typedefs.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  934448 Apr 23 03:01 ucx_address.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  475120 Apr 23 03:01 ucx_context.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  357232 Apr 23 03:01 ucx_endpoint.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  421488 Apr 23 03:01 ucx_listener.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev 1005648 Apr 23 03:01 ucx_memory_handle.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  281640 Apr 23 03:01 ucx_object.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  371056 Apr 23 03:01 ucx_request.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  980816 Apr 23 03:01 ucx_rkey.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev 1831968 Apr 23 03:01 ucx_worker.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  326352 Apr 23 03:01 utils.cpython-38-x86_64-linux-gnu.so
$ du -sh ucx_api_v2/
9.3M    ucx_api_v2/
$ du -sh ucx_api.cpython-38-x86_64-linux-gnu.so
3.6M    ucx_api.cpython-38-x86_64-linux-gnu.so

The case above has ucx_api_v2 with each file being a Cython Extension, and doing so makes binaries almost 3x as large as the single .so file. Not sure if this is a huge problem, but thought of sharing this for further discussion.

@madsbk
Copy link
Member

madsbk commented Apr 23, 2021

@pentschev let's get this in now then when @jakirkham gets back, we can discuss possible .pyx/.pxd solutions.

@madsbk
Copy link
Member

madsbk commented Apr 23, 2021

Sorry pyd is a typo, I mean .pxd.

@pentschev pentschev marked this pull request as ready for review April 23, 2021 10:48
@pentschev pentschev requested a review from a team as a code owner April 23, 2021 10:48
@pentschev
Copy link
Member Author

Fair enough, let's start with your current proposal. But if it is possible to make ucp/_libs/typedefs.pyx a .pyd file, I think that would be good.

Sorry pyd is a typo, I mean .pxd.

I discussed this offline with @madsbk and I think this isn't what we want (for now). Creating a new .pxd file incurs creating a new Extension (which leads to another .so file). For now, this is resolved with include "file.pyx" in ucx_api.pyx, which is just substituting the include "file.pyx" statement with the contents of file.pyx, as if it was a single, large file.

For now we'll merge this as is, and later revisit whether we should create new Extensions for each file or leave it as is.

@pentschev
Copy link
Member Author

Thanks @madsbk , I filed #712 to continue the discussion there.

@pentschev pentschev merged commit 4d9985b into rapidsai:branch-0.20 Apr 23, 2021
@pentschev pentschev deleted the split-ucx_api branch May 25, 2021 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants