
[DRAFT] feat: draft of lessons learned on publish security #538

Open
wants to merge 10 commits into base: main
Conversation

@lwasser (Member) commented Dec 14, 2024

This is the start of a draft post on PyPI publishing best practices. It should be super duper user-friendly and easy to read. I will need feedback!!

I will be working on some graphics and such next.


```yaml
    python-version: '3.9'

# Step 3: Cache dependencies
- name: Cache dependencies
  uses: actions/cache@v4  # assumed action; the original step was truncated
  with:
    path: ~/.cache/pip
    key: pip-${{ hashFiles('**/requirements.txt') }}
```
lwasser (Member Author):
This is not tested and could be wrong.

webknjaz:

This is one of the uses, but oftentimes building is performed by `pypa/build`, so it'd be just `pip install build` and not multiple direct deps as one might expect from a `requirements.txt` file. So the example seems a bit unrealistic, except for some atypical projects with surprising build automation requirements.
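For context, a typical publish workflow's build step can be as simple as this (a minimal sketch, not code from the post):

```yaml
# Sketch: building dists with the pypa/build frontend; nothing
# beyond `build` itself needs to be installed.
- name: Build distributions
  run: |
    python -m pip install build
    python -m build
```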

lwasser (Member Author):

Oh, right, good point. What is a good example of caching dependencies (other than what we say with Ultralytics)? I use caching when I'm publishing.

We `pip install hatch` and use `hatch build`.

What do you think about removing the example and just telling people that if they do cache dependencies for any reason, don't do it when the build is for publishing?

For instance, doing it for a website build could be OK if it's a preview, etc., vs. publishing a package?

@lwasser lwasser marked this pull request as ready for review December 17, 2024 18:51
@lwasser (Member Author) commented Dec 17, 2024

Hey @webknjaz @hugovk, do you have a bit of time to review this post for accuracy? I'm trying to provide a more beginner-friendly overview of what we learned that our community can quickly understand and implement.

We use the workflow @webknjaz helped us create for the pyosmeta package. And we want to link to all of the PyPI / PSF posts that are relevant too. Many thanks for your time!

@webknjaz left a comment:

Here are some quick notes. I might've missed something.

Let's also ask @woodruffw for feedback.


## Is your PyPI publication workflow secure?

The recent Python package breach [involving Ultralytics](https://blog.pypi.org/posts/2024-12-11-ultralytics-attack-analysis/) has highlighted the need for secure PyPI publishing workflows. Hackers exploited a GitHub workflow (`pull_request_target`) to inject malicious code, which was published to PyPI. Users who downloaded the package unknowingly allowed their machines to be hijacked for Bitcoin mining.

webknjaz:

> a GitHub workflow (`pull_request_target`)

`pull_request_target` is not a workflow, it's one of the events that may trigger a workflow run. A workflow may “subscribe” to different events like `push` and `pull_request`; this is just another event type, with elevated privileges in its security context.
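For illustration, a minimal sketch of how a workflow subscribes to events (the event list here is just an example):

```yaml
# A workflow "subscribes" to trigger events under `on:`.
on:
  push:
  pull_request:         # runs in the fork's low-privilege context
  pull_request_target:  # runs in the base repo's privileged context -- risky
```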

> unknowingly allowed their machines to be hijacked for Bitcoin mining.

Not only that, but they also did that in Google Colab, which got their accounts banned pretty swiftly...

lwasser (Member Author):

OOOOH, no way. I missed the Google Colab part!! That's a pretty bold move for a hacker? Maybe they were new hackers?

The Ultralytics breach is a wake-up call for all maintainers: secure your workflows now to protect your users and the Python ecosystem. Start with these key actions:

### 🔐 Secure your workflows 🔐
- 🚫 Avoid risky events like `pull_request_target` and adopt release-based workflows.

webknjaz:

It's not immediately clear what makes a workflow “release-based”. I haven't read past this point at the time of writing this comment, though.
Rephrase, perhaps?

lwasser (Member Author):

Good point - fixed!!

- 🚫 Avoid risky trigger events in GitHub Actions like `pull_request_target`
- Consider using publish workflows on GitHub that are triggered by a GitHub release rather than a pull request merge (see the sketch below).
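A release-triggered publish workflow would subscribe only to the release event, something like this minimal sketch:

```yaml
# Sketch: publish on a published GitHub release, not on PR merges
on:
  release:
    types: [published]
```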


### 🔐 Secure your workflows 🔐
- 🚫 Avoid risky events like `pull_request_target` and adopt release-based workflows.
- ♻️ Don’t cache dependencies in your publish workflows to prevent tampering.

webknjaz:

That's probably an oversimplification. Not caching the deps may as well lead to something like this if the transitive deps are unpinned and get updated in an uncontrolled manner. The problem was that the cache was poisoned in a job with a security context that had high privileges, and then another job picked it up, effectively reusing what that first one “pinned” in the installed deps.

But if there's no cache, a compromised transitive dependency can be downloaded from PyPI, resulting in the same problem.

Security exists in context and not in isolation. It should come from a place of understanding how different bits fit together.

That said, I wouldn't make a call of whether using caching or not will save us all. The problem was that it was possible to compromise the cache that ended up being used.

I think it's possible to use cache securely by having separate cache keys unique to the dist building job. So caches for building the dists and testing them would not leak into each other. That's the real fix for cache.
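Something along these lines, perhaps (a hypothetical sketch; the key prefix is made up):

```yaml
# Sketch: a cache key prefix unique to the dist-building job, so a
# cache populated by the test jobs can never be restored here.
- name: Cache pip (dist build only)
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: build-dists-pip-${{ hashFiles('pyproject.toml') }}
```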

Another side of this is the supply chain problem, which cache ended up being a part of in this instance. But as I mentioned earlier, the supply chain could be compromised differently, without cache being a factor.

The solution to this is pinning your dependencies. This means recording every single version number for every single transitive dependency there is. One of the low-level tools to use for this is pip-tools, in the context of `pip install` (until we have PEP 751, that is). Non-pip installers have different mechanisms. With pip, you can use `pip install -r requirements.in -c constraints.txt`, where `requirements.in` would contain a direct and unrestricted dependency list, and `constraints.txt` would set the entire tree to exact versions.

A complete solution is usually an integration of different aspects through multiple tools (or a "one-tool-to-rule-them-all" which some projects aim to become). I've been taking this to extremes with "DIY lock files" in https://github.com/webknjaz/ep2024-multilock-toy-template / https://github.com/ansible/awx-plugins/tree/2c2b22c/dependencies/lock-files.

Additionally, there's `--require-hashes`, which also records every single dist file hash, not just the version (this is not compatible with `-c`, though, due to a bug in pip).
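In pip-tools terms, that flow looks roughly like this (a sketch; the file names are illustrative):

```console
pip install pip-tools
# pin every transitive dependency to an exact version
pip-compile --output-file=constraints.txt requirements.in
# install the direct deps, constrained to the pinned tree
pip install -r requirements.in -c constraints.txt
```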

To summarize, the solution here is to be in control of your deps, make sure their updates are transparent and controllable on all the levels, and to know what influences them… Out of context, disallowing caching is a band-aid at best. But it may be harmful in that it would cultivate a false sense of security.

lwasser (Member Author):

OK. This is super helpful. I could remove the caching part altogether, given so many moving pieces, and caching seems more nuanced than just locking down the workflow itself with better triggers and tighter permissions. But I also have a question about the dependencies, because it is my understanding and experience that pinning all deps can also be highly problematic for users. For instance, if you have an environment with many different packages and try installing a package with pinned deps, that could lead to significant user issues.

Unless we're just talking about deps in the workflow itself, such as hatch, pypa/build, etc.

Web apps make sense to pin, but I can't quite wrap my head around pinning deps for a package. Could you please help me understand? And do you agree - maybe for more beginner users we just drop caching altogether as a topic?

### 🔐 Secure your workflows 🔐
- 🚫 Avoid risky events like `pull_request_target` and adopt release-based workflows.
- ♻️ Don’t cache dependencies in your publish workflows to prevent tampering.
- If you reference branches in a pull request, clean or sanitize branch names in your workflow.

lwasser (Member Author):

Great suggestions!! I may add these links lower down in the post at the end, so the takeaways are just text... but I'll definitely include them. This is what I have now:

- If you reference branches in a pull request, clean or sanitize branch names in your workflow.
- Consider using zizmor to check your GitHub workflows for security vulnerabilities!

I haven't used zizmor yet, but I saw a package that Hugo maintains uses it, so I may try it out with stravalib and pyosmeta!!


### Lock down GitHub repo access
- 🔒 Restrict repository access to essential maintainers only.
- ✅ Add automated checks to ensure releases are authorized and secure.

webknjaz:

I think this is mostly about setting an environment in the job and making sure that environment has required reviewers in the repo settings.
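In workflow terms, that's roughly this (a sketch; the environment name `pypi` is an assumption, and the required reviewers live in the repo settings, not in the YAML):

```yaml
jobs:
  publish:
    runs-on: ubuntu-latest
    # binds the job to a GitHub environment; if that environment has
    # required reviewers configured, a human must approve each run
    environment:
      name: pypi
      url: https://pypi.org/p/pyosmeta
    permissions:
      id-token: write  # for PyPI Trusted Publishing
```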

lwasser (Member Author):

Looking at the blog posts, it seemed like the hackers somehow got direct access to that repo. Did I read that wrong?


* [<i class="fa-brands fa-discourse"></i> Discourse](https://pyopensci.discourse.group/)
* [<i class="fa-brands fa-mastodon"></i> Mastodon](https://fosstodon.org/@pyopensci)
* [<i class="fa-solid fa-cloud"></i> Bluesky](https://bsky.app/profile/pyopensci.bsky.social)

webknjaz:

Pro tip: consider updating the Bluesky username to pyopensci.org. This can be configured and verified in like 3 minutes (through either DNS or a `.well-known` document on the website) and adds credibility to the account. You can create a repository in the org, call it `.well-known`, and add a file with the account DID as follows: webknjaz/.well-known@5a3c310. Just don't forget to enable Pages in the repo settings before clicking Verify.
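For reference, verification just means Bluesky can fetch the account DID from the domain (the DID below is a placeholder, not the real one):

```console
# what Bluesky fetches during handle verification
curl https://pyopensci.org/.well-known/atproto-did
did:plc:xxxxxxxxxxxxxxxxxxxxxxxx
```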

lwasser (Member Author):

WOAH!! I never knew about this for Bluesky!
Thank you for this tip - and for all of the thoughtful feedback. I'm going to work on this some more tomorrow!

```yaml
publish:
  name: >-
    Publish Python 🐍 distribution 📦 to PyPI
  if: github.repository_owner == 'pyopensci'
```

webknjaz:

I started using the exact repository ID some time ago. This is more accurate and would survive renames and transfers:

Suggested change:

```diff
-  if: github.repository_owner == 'pyopensci'
+  if: github.repository_id == 174412809
```

(looked up @ https://api.github.com/repos/pyOpenSci/pyopensci.github.io)

If you want to check for the org, you could use the org ID. Org names are mutable and if the org gets renamed for whatever reason (might be a user account that hosts a repo in some cases), this check would break.

lwasser (Member Author):

Oh wow - I've always used the username as the conditional value. The only challenging piece here for users is that they'd have to use the API to get that repo ID value. I don't see a clear way to get it from the GitHub GUI. We could make a note about an owner name being mutable, however, and provide the command to find the ID (it requires a token, I think?).
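For what it's worth, the lookup appears to work without a token for public repos (a sketch using curl or the gh CLI):

```console
# no token needed for public repos (subject to API rate limits)
curl -s https://api.github.com/repos/pyOpenSci/pyopensci.github.io | jq .id
# or with the gh CLI, which reuses your existing auth
gh api repos/pyOpenSci/pyopensci.github.io --jq .id
```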

_posts/2024-12-13-python-packaging-security.md (outdated, resolved)

You can see how to set up GitHub Actions securely in our own [PyPI publishing GitHub workflow](https://github.com/pyOpenSci/pyosMeta/blob/main/.github/workflows/publish-pypi.yml), which follows the Trusted Publisher approach.

**Note:** Trusted Publisher workflows are currently only available for GitHub. Support for GitLab may be coming in the future—stay tuned!

webknjaz:

In the doc you already linked, there's an example for GitLab as well: https://docs.pypi.org/trusted-publishers/adding-a-publisher/#gitlab-cicd. There are 4 platforms that support this mechanism already, and I saw some Bitbucket folks asking around whether they can integrate.

lwasser (Member Author):

Fixed the breakout and added that link!! I somehow read that GitLab support wasn't available yet, but obviously I wasn't looking in the right place! Thanks!

lwasser and others added 5 commits December 18, 2024 18:15
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <[email protected]>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <[email protected]>
Co-authored-by: 🇺🇦 Sviatoslav Sydorenko (Святослав Сидоренко) <[email protected]>
@lwasser (Member Author) commented Dec 19, 2024

NOTES: I'm doing a big rewrite (more of a reorg) of this content. I'll start with the high-level concepts of securing the environment / GitHub and GitHub-PyPI, and then the other things like sanitizing branch names and such can come next.
This gives users 2 key takeaways (plus adding zizmor as a third!) to start with now, rather than 10 things, which is a lot! So I'd avoid more reviews until I push these edits, given the structure will change (to hopefully be more clear and more accurate!) 🚀

I'm super appreciative of all of the feedback.
