API changes pending ADO pipeline agent integration. #113

michaelcfanning · 2024-12-11T22:59:58Z

BUG: Mark SecretMasker(SecretMasker) copy contructor as protected to make it callable by derived classes.
BUG: Mark SecretMasker.Clone as public virtual, to make it overridable by derived classes.
BUG: Update SEC101/127.UrlCredentials visibility to public to make it independently creatable.
BUG: Update SEC101/154.AzureCacheForRedisIdentifiableKey test example production to call base class (which generates test keys consisting of repeated characters in the randomized component).
BUG: Short-circuit SecretMasker.DetectSecret(string) operation if there are no configured regexes, encoded, or explicitly added secret literals.
FNS: Update SEC101/127 regex to detect ftp(s) credentials.

These changes are required (or the value of these changes simply revealed) by a proof-of-concept to drop in this library to replace the ADO pipeline agent built-in secret masker.

michaelcfanning · 2024-12-11T23:35:51Z

GeneratedRegexPatterns/MediumConfidenceSecurityModels.json

@@ -20,10 +20,11 @@
    "DetectionMetadata": "HighEntropy, MediumConfidence"
  },
  {
-    "Pattern": "https?:\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
+    "Pattern": "(ftps?|https?):\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",


ftps

Somehow our URL credentials pattern got out of sync with ADO tests. I believe this happened because we had to replace this pattern entirely because the original ADO form contains back-tracking, which we do not support.

Longer term, is there a way we can consolidate all the different regexes spread out through tools and repos (many of which our team owns e2e)?

It's a good goal. There's quite a bit of variance in purpose for these regexes, which can be problematic. Some partners are willing to use a back-tracking engine, which allows for different syntax that isn't supported by ours. Expressing regexes in JSON or C# (or other languages) admits escaping into the mix, which compromises readability and introduces friction in testing (in things like online regex evaluators). There are differences in expressing prefixed and suffixed delimiters. Just some of the things to keep in mind. Although I have heard this appeal expressed many times to consolidate, every time I examine the bucket of regexes maintained by one group/scanner/scenario, I see complications in building a single store for the data. That doesn't mean we can't define a consolidation goal for some core set of functionality.

Keep in mind that I am a newbie when it comes to regexes, but I wonder if we could have some central repository of secret regexes that allows for different flavors of regexes for the same purpose to be contributed.

Then they could all be run through the same tests and we could comment on why this one is slightly different from that one, etc.

every time I examine the bucket of regexes maintained by one group/scanner/scenario, I see complications in building a single store for the data.

I'm happy to work with y'all to — even with these constraints — bring something into fruition. The longer we punt on it, the harder it will be to solve. I've unfortunately lost a lot of DRI and debug time over the past few months due to the regexes being so distributed.

I think even something as simple as a repo that's:

/regexes /<toolname> /patterns.txt /patterns.json /patterns.cs /<whatever form the tool needs>

will be a good start for a better factoring later. I also have ideas on options to keep the set of patterns internal, and sync a subset of them out to OSS depending on the application/tool. And/or have automation to sync all the distributed regexes in repos to a centralized point.

@michaelcfanning, @nguerrera - let me know if this is something we're willing to discuss further in the immediate future for a brainstorm to have line of sight on a mid-term plan.

Sure thing. So I believe in January we will be looking at two related things: 1) additional investments in this library and its factoring, 2) a new general plugin model for our larger e2e tool. That's a great context for us to discuss the topic you've raised and I'll be sure that we cover it. We can talk offline, perhaps early in 2025. :)

michaelcfanning · 2024-12-11T23:38:57Z

...ilities.Core/PreciselyClassifiedSecurityKeys/SEC101_154_AzureCacheForRedisIdentifiableKey.cs

-            while (true)
-            {
-                string key =
-                    IdentifiableSecrets.GenerateStandardBase64Key(IdentifiableMetadata.AzureCacheForRedisChecksumSeed,


GenerateStandardBase64Key

this code previously did generate good test cases but they were indistinguishable from actual allocated keys. The GenerateTruePositiveExamples in the base class replaces the randomized component with a repeated sequence of characters, making it very obvious this is test data.

src/Tests.Microsoft.Security.Utilities.Core/SecretMaskerTests.cs

rwoll · 2024-12-11T23:42:42Z

src/Microsoft.Security.Utilities.Core/SecretMasker.cs

+
+            foreach (var secret in RegexPatterns)
+            {
+                if (secret.Pattern.Length < MinimumSecretLength)


Why does the code drop matches based on the length of the regex?

If I'm understanding this correctly, given the following Pattern:

MinimumSecretLength = 10 Pattern = `gha_\d{10}` Secret = `gha_0000000000`

This find would be dropped creating an exposure.

Yes, this is exactly right. This confusion (which also relates to the use of min secret length in general) is why I commented out these tests, now that I think about it. I wasn't clear we wanted this API surface area but we also wanted as much compatibility with ADO as possible.

The key change here is to expose the sync object. This will allow me to push this API into a wrapper in ADO. I'm tempted to remove the min length handling as well, maybe in the next change.

Thanks for the note, to be clear, I'm going to remove this code (and the tests corresponding to it).

rwoll · 2024-12-12T21:55:50Z

GeneratedRegexPatterns/MediumConfidenceSecurityModels.json

@@ -20,10 +20,11 @@
    "DetectionMetadata": "HighEntropy, MediumConfidence"
  },
  {
-    "Pattern": "https?:\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
+    "Pattern": "(ftps?|https?):\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",


Longer term, is there a way we can consolidate all the different regexes spread out through tools and repos (many of which our team owns e2e)?

docs/ReleaseHistory.md

rwoll · 2024-12-12T21:59:04Z

...ilities.Core/PreciselyClassifiedSecurityKeys/SEC101_154_AzureCacheForRedisIdentifiableKey.cs

@@ -30,13 +30,8 @@ public override Tuple<string, string> GetMatchIdAndName(string match)

        public override IEnumerable<string> GenerateTruePositiveExamples()


Why is there this nesting and filtering? These both create indirection making it very hard to follow/review the tests. Consider de-DRYing the tests and inlining the examples or this filtering logic. Or, consider changing the name of this function to make it clearer to the reader it's doing some filtering within.

nguerrera · 2024-12-13T17:05:54Z

GeneratedRegexPatterns/MediumConfidenceSecurityModels.json

@@ -20,10 +20,11 @@
    "DetectionMetadata": "HighEntropy, MediumConfidence"
  },
  {
-    "Pattern": "https?:\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",
+    "Pattern": "(ftps?|https?):\\/\\/(?:[^:@]+):(?<refine>[^:@?]+)@",


Keep in mind that I am a newbie when it comes to regexes, but I wonder if we could have some central repository of secret regexes that allows for different flavors of regexes for the same purpose to be contributed.

Then they could all be run through the same tests and we could comment on why this one is slightly different from that one, etc.

michaelcfanning added 4 commits December 11, 2024 14:46

More API changes pending ADO pipeline agent integration.:

8a14491

Add new tests

66e3ece

Update generated rules files.

856beac

Update release notes.

978d70b

michaelcfanning changed the title ~~More API changes pending ADO pipeline agent integration.:~~ API changes pending ADO pipeline agent integration. Dec 11, 2024

michaelcfanning marked this pull request as ready for review December 11, 2024 23:34

michaelcfanning requested review from cfaucon, suvamM, rwoll and nguerrera as code owners December 11, 2024 23:34

michaelcfanning commented Dec 11, 2024

View reviewed changes

src/Tests.Microsoft.Security.Utilities.Core/SecretMaskerTests.cs Show resolved Hide resolved

rwoll reviewed Dec 11, 2024

View reviewed changes

Remove short secrets filtering.:

b765cfa

rwoll previously approved these changes Dec 12, 2024

View reviewed changes

Merge branch 'main' into expose-secret-literals-and-sync-object

ae6b375

michaelcfanning dismissed rwoll’s stale review via ae6b375 December 13, 2024 16:58

nguerrera approved these changes Dec 13, 2024

View reviewed changes

michaelcfanning merged commit 034040c into main Dec 13, 2024
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API changes pending ADO pipeline agent integration. #113

API changes pending ADO pipeline agent integration. #113

michaelcfanning commented Dec 11, 2024 •

edited

Loading

michaelcfanning Dec 11, 2024

rwoll Dec 12, 2024

michaelcfanning Dec 13, 2024

nguerrera Dec 13, 2024

rwoll Dec 13, 2024

michaelcfanning Dec 13, 2024

michaelcfanning Dec 11, 2024

rwoll Dec 11, 2024 •

edited

Loading

michaelcfanning Dec 12, 2024

rwoll Dec 12, 2024

rwoll Dec 12, 2024

nguerrera Dec 13, 2024

		@@ -30,13 +30,8 @@ public override Tuple<string, string> GetMatchIdAndName(string match)

		public override IEnumerable<string> GenerateTruePositiveExamples()

API changes pending ADO pipeline agent integration. #113

API changes pending ADO pipeline agent integration. #113

Conversation

michaelcfanning commented Dec 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rwoll Dec 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

michaelcfanning commented Dec 11, 2024 •

edited

Loading

rwoll Dec 11, 2024 •

edited

Loading