API changes pending ADO pipeline agent integration. #113
Conversation
@@ -20,10 +20,11 @@
      "DetectionMetadata": "HighEntropy, MediumConfidence"
    },
    {
-     "Pattern": "https?:\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
+     "Pattern": "(ftps?|https?):\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
Longer term, is there a way we can consolidate all the different regexes spread out through tools and repos (many of which our team owns e2e)?
It's a good goal. There's quite a bit of variance in purpose across these regexes, which can be problematic. Some partners are willing to use a back-tracking engine, which allows syntax that ours doesn't support. Expressing regexes in JSON or C# (or other languages) brings escaping into the mix, which compromises readability and introduces friction in testing (for example, in online regex evaluators). There are also differences in expressing prefixed and suffixed delimiters. Those are just some of the things to keep in mind. Although I have heard this appeal to consolidate many times, every time I examine the bucket of regexes maintained by one group/scanner/scenario, I see complications in building a single store for the data. That doesn't mean we can't define a consolidation goal for some core set of functionality.
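For example (illustrative only), the same pattern accrues a different spelling per host format, and the JSON rules file can't be pasted straight into an online evaluator:

```csharp
// The same regex spelled two ways in C# source; the JSON rules file adds its
// own escaping layer on top (the pattern's \/ is stored as \\/ in JSON).
string escaped  = "(ftps?|https?):\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@";
string verbatim = @"(ftps?|https?):\/\/(?:[^:@]+):(?<refine>[^:@?]+)@";
System.Diagnostics.Debug.Assert(escaped == verbatim); // identical patterns, different spellings
```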
Keep in mind that I am a newbie when it comes to regexes, but I wonder if we could have a central repository of secret regexes that allows different flavors of a regex for the same purpose to be contributed.
Then they could all be run through the same tests, and we could comment on why this one is slightly different from that one, etc.
> every time I examine the bucket of regexes maintained by one group/scanner/scenario, I see complications in building a single store for the data.
I'm happy to work with y'all, even with these constraints, to bring something to fruition. The longer we punt on it, the harder it will be to solve. I've unfortunately lost a lot of DRI and debug time over the past few months because the regexes are so distributed.
I think even something as simple as a repo that's:
/regexes
  /<toolname>
    /patterns.txt
    /patterns.json
    /patterns.cs
    /<whatever form the tool needs>
will be a good start toward a better factoring later. I also have ideas for keeping the set of patterns internal and syncing a subset of them out to OSS depending on the application/tool, and/or automating a sync of all the distributed regexes in repos to a centralized point.
@michaelcfanning, @nguerrera - let me know if this is something we're willing to discuss in the immediate future, so we can brainstorm and get line of sight on a mid-term plan.
Sure thing. I believe in January we will be looking at two related things: 1) additional investments in this library and its factoring, and 2) a new general plugin model for our larger e2e tool. That's great context for discussing the topic you've raised, and I'll be sure we cover it. We can talk offline, perhaps early in 2025. :)
while (true)
{
    string key =
        IdentifiableSecrets.GenerateStandardBase64Key(IdentifiableMetadata.AzureCacheForRedisChecksumSeed,
This code previously generated good test cases, but they were indistinguishable from actual allocated keys. The `GenerateTruePositiveExamples` implementation in the base class replaces the randomized component with a repeated sequence of characters, making it very obvious this is test data.
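A hypothetical sketch of that idea (the helper name and signature are invented for illustration; the real logic lives in the base class, and checksum recomputation is omitted):

```csharp
// Keep the key's fixed signature portion, but fill the randomized component
// with one repeated character so the key is self-evidently test data.
static string MakeObviousTestExample(string generatedKey, int randomizedComponentLength)
{
    return new string('a', randomizedComponentLength) +
           generatedKey.Substring(randomizedComponentLength);
}
```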
foreach (var secret in RegexPatterns)
{
    if (secret.Pattern.Length < MinimumSecretLength)
Why does the code drop matches based on the length of the regex? If I'm understanding this correctly, given the following:

MinimumSecretLength = 11
Pattern = `gha_\d{10}` (a 10-character pattern string)
Secret = `gha_0000000000` (a 14-character secret)

this find would be dropped, creating an exposure.
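A minimal repro of that concern (illustrative values; not code from the PR):

```csharp
// The guard measures the regex pattern *string*, not the text it matches.
const int MinimumSecretLength = 11;
string pattern = @"gha_\d{10}";    // pattern string: 10 characters
string secret  = "gha_0000000000"; // matched secret: 14 characters

// 10 < 11, so the rule is skipped and the 14-character secret goes unmasked.
bool dropped = pattern.Length < MinimumSecretLength;
Console.WriteLine($"dropped: {dropped}, secret length: {secret.Length}"); // dropped: True, secret length: 14
```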
Yes, this is exactly right. This confusion (which also relates to the use of min secret length in general) is why I commented out these tests, now that I think about it. I wasn't sure we wanted this API surface area, but we also wanted as much compatibility with ADO as possible.
The key change here is to expose the sync object, which will allow me to push this API into a wrapper in ADO. I'm tempted to remove the min length handling as well, maybe in the next change.
Thanks for the note. To be clear, I'm going to remove this code (and the tests corresponding to it).
@@ -30,13 +30,8 @@ public override Tuple<string, string> GetMatchIdAndName(string match)

    public override IEnumerable<string> GenerateTruePositiveExamples()
Why is there this nesting and filtering? Both create indirection that makes the tests very hard to follow and review. Consider de-DRYing the tests by inlining the examples or this filtering logic. Or consider renaming this function to make it clearer to the reader that it does some filtering internally.
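For example, inlining might look something like this (hypothetical examples, not the rule's actual test data):

```csharp
// Examples written out literally, so a reviewer can see exactly what the rule
// is expected to match without tracing base-class generation and filtering.
public override IEnumerable<string> GenerateTruePositiveExamples()
{
    yield return "ftp://user:aaaaaaaa@example.com";
    yield return "https://user:aaaaaaaa@example.com";
}
```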
- Mark the `SecretMasker(SecretMasker)` copy constructor as protected, to make it callable by derived classes.
- Mark `SecretMasker.Clone` as public virtual, to make it overridable by derived classes.
- Change `SEC101/127.UrlCredentials` visibility to public, to make it independently creatable.
- Update `SEC101/154.AzureCacheForRedisIdentifiableKey` test example production to call the base class (which generates test keys consisting of repeated characters in the randomized component).
- Short-circuit the `SecretMasker.DetectSecret(string)` operation if there are no configured regexes and no encoded or explicitly added secret literals (sketched below).
- Update the `SEC101/127` regex to detect ftp(s) credentials.

These changes are required (or their value simply revealed) by a proof-of-concept that drops in this library to replace the ADO pipeline agent's built-in secret masker.
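A sketch of what the short-circuit might look like (member and helper names are invented; the actual implementation may differ):

```csharp
public Detection[] DetectSecrets(string input)
{
    // With no regexes and no encoded or explicitly added secret literals
    // configured, nothing can possibly match: skip locking and scanning.
    if (m_regexPatterns.Count == 0 &&
        m_encodedSecrets.Count == 0 &&
        m_explicitSecrets.Count == 0)
    {
        return Array.Empty<Detection>();
    }

    lock (SyncObject) // the sync object this PR exposes for the ADO wrapper
    {
        return ScanForSecrets(input); // hypothetical helper performing the scan
    }
}
```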