Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move requirements of local packages to the global requirements #17625

Closed
wants to merge 1 commit into from

Conversation

sampaccoud
Copy link
Contributor

What

The purpose of this PR is to move all Python dependencies listed in submodules to the global requirements.

Why

Initially, the idea for this change was an attempt to speed-up installation in a Docker dev environment.
When building the Docker image, the installation of dependencies is done early in the Dockerfile so that the layer is cached and doesn't need to be reinstalled each time we change a line in the code base. Thanks to this commit, we can build images for our dev and staging environments in a pair of seconds versus minutes before.

This is not vital nor very important but we thought we should submit this PR because we also feel that it might be better for consistency to keep all requirements in one place. For example, you will see in the commit that, out of all the requirements expressed in the submodules setup.py files, only 2 were not already covered in global requirements.

Moving all dependencies to the global requirements ensuires
consistency and simplifies installation. In particular, it speeds
up Docker image build for development as all requirements can be
installed in an early layer of the Dockerfile which is not invalidated
each time we change code anywhere in the project.
@openedx-webhooks
Copy link

Thanks for the pull request, @sampaccoud! I've created OSPR-2277 to keep track of it in JIRA. JIRA is a place for product owners to prioritize feature reviews by the engineering development teams.

Feel free to add as much of the following information to the ticket:

  • supporting documentation
  • edx-code email threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will still be done via the GitHub pull request interface. As a reminder, our process documentation is here.

We can't start reviewing your pull request until you've submitted a signed contributor agreement or indicated your institutional affiliation. If you like, you can add yourself to the AUTHORS file for this repo, though that isn't required. Please see the CONTRIBUTING file for more information.

@openedx-webhooks openedx-webhooks added needs triage open-source-contribution PR author is not from Axim or 2U labels Mar 6, 2018
@nedbat
Copy link
Contributor

nedbat commented Mar 6, 2018

Rather than consolidate the requirements, we would rather move each of these projects into its own repo. It doesn't make sense to have these 8 separately installable packages in one repo.

@nedbat
Copy link
Contributor

nedbat commented Mar 6, 2018

jenkins ok to test

@sampaccoud
Copy link
Contributor Author

@nedbat if you confirm the desire to do it and there is a way I can help you set-up these external repository, I would happily spend time on it. It would be a good first contribution to edx-platform!

@nedbat
Copy link
Contributor

nedbat commented Mar 6, 2018

Thanks for the offer. I think we would have to make the new repos though before anything could happen?

@nedbat
Copy link
Contributor

nedbat commented Mar 6, 2018

Also, xmodule would be difficult to remove because of the intertwined nature of the code and tests. The others might be simple.

@edx-status-bot
Copy link

Your PR has finished running tests.

@smarnach
Copy link
Contributor

smarnach commented Mar 6, 2018

@nedbat I agree that it sounds reasonable to split out some of the libraries, but there are also downsides and complexities to that appraoch.

The first problem is dependency management. We would need to keep the versions of dependencies of the split out libraries in sync with the requirments of edx-platform. Currently this is already required for the requirements listed in the multiple setup.py files, but at least it can be done atomically in a single commit. When splitting the libraries out, and upgrade for, say, Numpy, would require multiple coordinated PRs to multiple libraries, and version upgrades in the requirements of edx-platform. This problem isn't new – we already have it for some of the external libraries, but it may get worse.

Another problem, as you have already mentioned, is that some of the libraries can't be easily split out. I think xmodule would be cumbersome to split out, and I wonder whether it would actually be helpful to do this. Another library that would require some work to split out is xblock_discussion, as explained in this README file.

A third problem is that some of the libs are so small that it really doesn't seems worth it to split them out, e.g. there is just a few lines of code in dogstats and safe_lxml.

Overall I think it requires some thought and planning to decide which of the libs should be split out, and in the meantime this PR may simplify the requirements. Note that the requirements in the current setup.py files aren't complete anyway – some of the libs have dependencies that are installed by edx-platform without declaring them explicitly, which works fine, since nobody ever uses these libs independently of edx-platform.

@nedbat
Copy link
Contributor

nedbat commented Mar 23, 2018

@smarnach thanks for the nuanced thoughts about the library separation. You are right that some would be hard to split, and some might not be worth it. But this PR only loses information. I don't see the value in that. If a library is split, it's setup.py would go with it, and would need dependencies. If a library is kept, we should change it so that it has no setup.py. That would be the time to combine the requirements.

I would prefer to close this PR.

@smarnach
Copy link
Contributor

@nedbat I think it's a rather weird and unexpected pattern to have multiple setup.py files in a single repository, and install them in "develop" mode for production using -e <local_path> in the requirements. For the libraries that aren't split out, it would be nice to simply consider them part of the same package and possibly move them to openedx/core/lib (or whatever the right place for them may be). This PR tries to mitigate one of the downsides of this pattern.

I see these advantages of the current PR:

  • All dependencies of the code in this repo are declared in a central place. When updating dependency versions, they need to be updated in a single place instead of in multiple places. (E.g. currently the version of NumPy is mentioned in three files, and needs to be updated in these three files.)
  • It helps speeding up the Docker build by allowing more efficient use of Docker layers.

I don't think the information we lose with this PR is particularly valuable, since at least some of the depndency declarations in the setup.py files are incomplete, so when splitting out some of the libraries we will have to verify the exact requirements at that time anyway. Already having a partial list doesn't make that step a lot easier.

I can see that this PR is a rather partial step. The proper solution would be to split out the libraries we want to split out, and properly integrate the ones we want to keep.

@nedbat
Copy link
Contributor

nedbat commented Apr 5, 2018

@jmbowman As the main pip-meister here, do you have an opinion?

@jmbowman
Copy link
Contributor

jmbowman commented Apr 5, 2018

We have a Python dependency management overhaul about to merge in edx-platform. For full details on the rationale behind it and how we'd like to handle this moving forward, see the draft OEP (rendered version); we'd actively like to get feedback on that so we can make sure this works for the entire community.

Specifically regarding the separate installable packages within edx-platform: the change to use pip-tools for all the requirements files, as currently implemented, will keep the versions of their dependencies in sync (and everything is nicely version-pinned in a single requirements file for a given context, making installation even faster than the current multiple-requirements-files setup). I would like to soon remove the dependency version specifications from the setup.py files so it's easier to upgrade those moving forwards, but keep track of which packages are actually depended on. Then we can split out or better integrate those packages as time allows.

After merging the pip-tools support PR and removing version pins from the internal setup.py files, are there any other changes from this PR that you think are important to get merged in the near future?

@smarnach
Copy link
Contributor

smarnach commented Apr 5, 2018

@jmbowman Thanks for this information – while I had heard about the move to pip-tools, I wasn't aware of the details you mentioned, and didn't consider it in the context of this PR. If you are going to remove the requirements from the setup.py files soon anyway, we can close this PR, since the problem it is trying to solve will go away.

@nedbat
Copy link
Contributor

nedbat commented Apr 10, 2018

@sampaccoud Can you close the PR, or do you still have concerns about that path?

@sampaccoud
Copy link
Contributor Author

No more. The Python dependency management overhaul is great!

@sampaccoud sampaccoud closed this Apr 11, 2018
@rmoch rmoch deleted the fun-local-requirements branch August 1, 2018 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U rejected
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants