Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API changes pending ADO pipeline agent integration. #113

Merged
merged 6 commits into from
Dec 13, 2024

Conversation

michaelcfanning
Copy link
Member

@michaelcfanning michaelcfanning commented Dec 11, 2024

  • BUG: Mark SecretMasker(SecretMasker) copy contructor as protected to make it callable by derived classes.
  • BUG: Mark SecretMasker.Clone as public virtual, to make it overridable by derived classes.
  • BUG: Update SEC101/127.UrlCredentials visibility to public to make it independently creatable.
  • BUG: Update SEC101/154.AzureCacheForRedisIdentifiableKey test example production to call base class (which generates test keys consisting of repeated characters in the randomized component).
  • BUG: Short-circuit SecretMasker.DetectSecret(string) operation if there are no configured regexes, encoded, or explicitly added secret literals.
  • FNS: Update SEC101/127 regex to detect ftp(s) credentials.

These changes are required (or the value of these changes simply revealed) by a proof-of-concept to drop in this library to replace the ADO pipeline agent built-in secret masker.

@michaelcfanning michaelcfanning changed the title More API changes pending ADO pipeline agent integration.: API changes pending ADO pipeline agent integration. Dec 11, 2024
@michaelcfanning michaelcfanning marked this pull request as ready for review December 11, 2024 23:34
@@ -20,10 +20,11 @@
"DetectionMetadata": "HighEntropy, MediumConfidence"
},
{
"Pattern": "https?:\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
"Pattern": "(ftps?|https?):\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ftps

Somehow our URL credentials pattern got out of sync with ADO tests. I believe this happened because we had to replace this pattern entirely because the original ADO form contains back-tracking, which we do not support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer term, is there a way we can consolidate all the different regexes spread out through tools and repos (many of which our team owns e2e)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good goal. There's quite a bit of variance in purpose for these regexes, which can be problematic. Some partners are willing to use a back-tracking engine, which allows for different syntax that isn't supported by ours. Expressing regexes in JSON or C# (or other languages) admits escaping into the mix, which compromises readability and introduces friction in testing (in things like online regex evaluators). There are differences in expressing prefixed and suffixed delimiters. Just some of the things to keep in mind. Although I have heard this appeal expressed many times to consolidate, every time I examine the bucket of regexes maintained by one group/scanner/scenario, I see complications in building a single store for the data. That doesn't mean we can't define a consolidation goal for some core set of functionality.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that I am a newbie when it comes to regexes, but I wonder if we could have some central repository of secret regexes that allows for different flavors of regexes for the same purpose to be contributed.

Then they could all be run through the same tests and we could comment on why this one is slightly different from that one, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every time I examine the bucket of regexes maintained by one group/scanner/scenario, I see complications in building a single store for the data.

I'm happy to work with y'all to — even with these constraints — bring something into fruition. The longer we punt on it, the harder it will be to solve. I've unfortunately lost a lot of DRI and debug time over the past few months due to the regexes being so distributed.

I think even something as simple as a repo that's:

/regexes
   /<toolname>
     /patterns.txt
     /patterns.json
     /patterns.cs
    /<whatever form the tool needs>

will be a good start for a better factoring later. I also have ideas on options to keep the set of patterns internal, and sync a subset of them out to OSS depending on the application/tool. And/or have automation to sync all the distributed regexes in repos to a centralized point.

@michaelcfanning, @nguerrera - let me know if this is something we're willing to discuss further in the immediate future for a brainstorm to have line of sight on a mid-term plan.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. So I believe in January we will be looking at two related things: 1) additional investments in this library and its factoring, 2) a new general plugin model for our larger e2e tool. That's a great context for us to discuss the topic you've raised and I'll be sure that we cover it. We can talk offline, perhaps early in 2025. :)

while (true)
{
string key =
IdentifiableSecrets.GenerateStandardBase64Key(IdentifiableMetadata.AzureCacheForRedisChecksumSeed,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GenerateStandardBase64Key

this code previously did generate good test cases but they were indistinguishable from actual allocated keys. The GenerateTruePositiveExamples in the base class replaces the randomized component with a repeated sequence of characters, making it very obvious this is test data.


foreach (var secret in RegexPatterns)
{
if (secret.Pattern.Length < MinimumSecretLength)
Copy link
Member

@rwoll rwoll Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the code drop matches based on the length of the regex?

If I'm understanding this correctly, given the following Pattern:

MinimumSecretLength = 10
Pattern = `gha_\d{10}`
Secret = `gha_0000000000`

This find would be dropped creating an exposure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is exactly right. This confusion (which also relates to the use of min secret length in general) is why I commented out these tests, now that I think about it. I wasn't clear we wanted this API surface area but we also wanted as much compatibility with ADO as possible.

The key change here is to expose the sync object. This will allow me to push this API into a wrapper in ADO. I'm tempted to remove the min length handling as well, maybe in the next change.

Thanks for the note, to be clear, I'm going to remove this code (and the tests corresponding to it).

rwoll
rwoll previously approved these changes Dec 12, 2024
@@ -20,10 +20,11 @@
"DetectionMetadata": "HighEntropy, MediumConfidence"
},
{
"Pattern": "https?:\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
"Pattern": "(ftps?|https?):\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer term, is there a way we can consolidate all the different regexes spread out through tools and repos (many of which our team owns e2e)?

docs/ReleaseHistory.md Show resolved Hide resolved
@@ -30,13 +30,8 @@ public override Tuple<string, string> GetMatchIdAndName(string match)

public override IEnumerable<string> GenerateTruePositiveExamples()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there this nesting and filtering? These both create indirection making it very hard to follow/review the tests. Consider de-DRYing the tests and inlining the examples or this filtering logic. Or, consider changing the name of this function to make it clearer to the reader it's doing some filtering within.

@@ -20,10 +20,11 @@
"DetectionMetadata": "HighEntropy, MediumConfidence"
},
{
"Pattern": "https?:\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
"Pattern": "(ftps?|https?):\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that I am a newbie when it comes to regexes, but I wonder if we could have some central repository of secret regexes that allows for different flavors of regexes for the same purpose to be contributed.

Then they could all be run through the same tests and we could comment on why this one is slightly different from that one, etc.

@michaelcfanning michaelcfanning merged commit 034040c into main Dec 13, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants