Redact package normalization #42
Normalization is not followed in practice. Package names as they appear on pypi.org are the only knowable source of truth.
A simple example is https://pypi.org/project/PyYAML/
Some rationale for this (copied from github/advisory-database#52 (comment)): This is to make these package names more consistent and easier to consume and index on. The same package in Python can be specified in an infinite number of ways. e.g. Package URLs also made the same decision: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#pypi
Sure! But I think there's a distinction here between an import name (which the OSV spec does not intend to encode) and a package name. I'm not sure normalization implies any relationship between the import name and the package name. It's intended to make I'll let @di chime in as well since he's the Python expert. |
Given that the un-normalized display name might change from one release to the next, I think it makes sense to continue using the normalized name here: it means that OSV can consistently refer to the same project without having to track the un-normalized name. It also means consumers don't have to normalize every project name themselves when using the database. Note that PyPI respects normalization everywhere, e.g. https://pypi.org/project/flask-caching will always resolve to the same project no matter what the un-normalized name is, and PyPI uses the normalized name internally when referring to the project. The un-normalized name is only used for display/vanity purposes.
Thanks @di for the input! If the un-normalized display name is unstable and can change any time, then that's a pretty strong argument against this. To add to this..
So we need a consistent way to refer to the package, whether it's Using the display name (i.e. |
Packages are inherently dependent on their package registries. If |
I mean, the same is true for the normalized name as well then, no? I'm not seeing the downside to using the normalized name here, only upsides, but maybe I'm missing something.
Agreed, I think that the normal form should also be acceptable.
I see it as an unnecessary restriction on the field. Edit: |
Thanks for opening this issue! We're going to close this for now, given the reasons that we enumerated in this thread. It does create a little extra work (applying a regex replace), but it comes with significant upsides around database consistency.
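For readers wondering what the "regex replace" amounts to, here is a minimal sketch, assuming the normalization rule from PEP 503 (runs of `-`, `_`, and `.` collapse to a single `-`, and the result is lowercased). The function name `normalize_pypi_name` is only illustrative and does not come from the OSV tooling.

```python
import re


def normalize_pypi_name(name: str) -> str:
    """Normalize a PyPI project name, assuming the PEP 503 rule."""
    # Collapse any run of '-', '_' or '.' into a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


# Different spellings of the same project map to one normalized name.
for display_name in ("PyYAML", "Flask-Caching", "flask_caching", "Flask.Caching"):
    print(display_name, "->", normalize_pypi_name(display_name))
```

A consumer that receives display names from elsewhere could apply this once before looking a package up in the database, so that both sides compare normalized names.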