dynamic: add sequence scope #2532

williballenthin · 2024-12-09T13:36:29Z

This PR implements the dynamic "sequence scope" introduced here: mandiant/capa-rules#951

In summary, we want a way to match across calls (in dynamic mode) without resorting to the entire thread (which may be very long, like thousands of events). So, we add a new scope "sequence" that represents the sliding 20-tuples of calls across each thread. Rules can match against any set of logic within each of these 20-tuples.

For example, consider the initial behavior of thread 3064 in our test CAPE file 0000a657:

This is a long thread with many calls, so until now it was tough to write a rule for any behavior that spans multiple calls without introducing false positives. Consider matching on the dynamic resolution and invocation of AddVectoredExceptionHandler. Now we can write a rule like:

So, within a region of 20 calls, match all this logic.

Here's what the output looks like:

The implementation is pretty easy: maintain a deque of the trailing 20 call events, merging and matching those features.
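As a rough illustration of the deque idea (not the code in this PR), here is a minimal sketch. It assumes each call event has already been reduced to a set of plain feature strings; `iter_sequence_features` and `matches_all` are hypothetical names, and real rule evaluation is richer than a subset check.

```python
from collections import deque
from typing import Iterable, Iterator

SEQUENCE_LENGTH = 20  # window size discussed in this PR


def iter_sequence_features(calls: Iterable[set[str]], length: int = SEQUENCE_LENGTH) -> Iterator[set[str]]:
    """Yield the merged features of the trailing `length` calls, once per call."""
    # hypothetical helper: a bounded deque evicts the oldest call automatically
    window: deque[set[str]] = deque(maxlen=length)
    for call_features in calls:
        window.append(call_features)
        yield set().union(*window)


def matches_all(features: set[str], wanted: set[str]) -> bool:
    # stand-in for rule evaluation: an `and` over plain features
    return wanted <= features


# illustrative thread: the feature strings are made up for this example
thread = [
    {"api:LoadLibraryA"},
    {"api:GetProcAddress", "string:AddVectoredExceptionHandler"},
    {"api:Sleep"},
]
wanted = {"api:GetProcAddress", "string:AddVectoredExceptionHandler"}
hits = [i for i, feats in enumerate(iter_sequence_features(thread)) if matches_all(feats, wanted)]
```

Note that merging the whole window on every call costs time proportional to the window size, which is the overhead discussed later in this thread.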

I picked 20 fairly randomly. I think we can tweak this number as necessary. Smaller and it's harder to match logic. Larger and the performance might decrease a bit, and then there's more FP possibility. But I don't think this is too risky.

I think this will affect runtime a bit, since we're matching features twice for each call event (once for the precise call event, once for the sliding window).

There are probably some edge cases to work out around overlapping windows. Consider a rule that matches a single call event within a sequence: that call event is contained by 20 sequences (some covering the events before, some covering the events after). So, we may have to do a little more work (TODO) to not emit those matches twice. I'm not precisely sure of the behavior at this moment. I'll write a test for it.
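To make the overlap concrete, this small sketch lists the trailing windows that contain a given call. It assumes windows are identified by the index of their last call and are shorter at the start of a thread; `windows_containing` is a hypothetical helper, not part of capa.

```python
def windows_containing(call_index: int, call_count: int, length: int = 20) -> list[tuple[int, int]]:
    """Return the (first, last) call indices of every trailing window that contains `call_index`."""
    spans = []
    # a trailing window ends at `end` and starts up to `length - 1` calls earlier
    for end in range(call_index, min(call_index + length, call_count)):
        start = max(0, end - length + 1)
        spans.append((start, end))
    return spans


# a call in the middle of a 100-call thread is covered by 20 windows
overlapping = windows_containing(30, 100)
```

A rule that needs only that one call would therefore match in every one of those windows unless duplicates are filtered.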

Checklist

changelog update needed
documentation needed

addresses discussion in mandiant/capa-rules#951

CHANGELOG updated or no update needed, thanks! 😄

williballenthin · 2024-12-09T13:43:50Z

we also may want to update the vverbose render to only show each call event once, leaving the match details to a separate section, maybe like:

sequence: process1, pid, tid, calls{1, 2}
  and:
    api: CreateFile @ call{1}
    api: CloseFile @ call{2}
  referenced call events:
    call{1}: CreateFile
    call{2}: CloseFile

williballenthin · 2024-12-09T13:47:44Z

@jorik-utwente FYI

capa/capabilities/dynamic.py

williballenthin · 2024-12-09T13:52:26Z

I realize I dropped this PR without much warning 😇 I went from "I wonder how this would work" to "huh, it seems to work OK" pretty quickly.

mr-tz

awesome, this looks very promising already!!

major things to discuss include the naming and potentially handling of loops

capa/capabilities/dynamic.py

mr-tz · 2024-12-09T15:27:17Z

CHANGELOG.md

@@ -4,6 +4,8 @@

 ### New Features

+- add dynamic sequence scope for matching nearby calls within a thread #2532 @williballenthin


naming alternatives to sequence (matching occurs in any order): span, ngram, group/cluster

+1 cluster

"window", "slice", "range"

math: multiset (or bag, or mset) - https://en.wikipedia.org/wiki/Multiset
  multiple instances of same object
  order doesn't matter

optionally prefix with "call", e.g., callbag, callcluster?

capa/capabilities/dynamic.py

williballenthin · 2024-12-09T15:51:09Z

potentially handling of loops

Good point. I think we'd want to see how this works in practice against a large number of samples and the rules we can translate to use this construct. In particular, loops (like you say) such as you'd see in ransomware.

mike-hunhoff

Great work, I'm excited about where this is going for an initial implementation. I echo a few of @mr-tz's comments/concerns. Additionally, the value 5 comes close to being too small for some of our existing rules, e.g. https://github.com/mandiant/capa-rules/blob/e033410c8910f8b46718a5eefd9f0c7768be1b99/communication/c2/shell/create-reverse-shell.yml#L19-L23 so we'll need to do some additional work to find the sweet spot.

capa/capabilities/dynamic.py

mr-tz

I spent a few moments focusing on the core extension here and added some places for additional documentation.

capa/capabilities/dynamic.py

tests/test_dynamic_sequence_scope.py

also, for repeating behavior, match only the first instance.

williballenthin · 2024-12-12T15:34:52Z

computing the features for the sequence, which involves merging features from many calls, seems to take quite a bit of time:

i'll have to think on whether there's a creative way to optimize this

profile information

before: sequence length: 20

before: sequence length: 0

(convenient this works!)

optimized, sequence length 1 and 20:

conclusion:

So, there's a bit of overhead to use this new algorithm, but it's independent of SEQUENCE_LENGTH, which is desirable.
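One way to get an update cost that does not depend on the window length is to keep per-feature counts and adjust them as calls enter and leave the window. The sketch below is an assumption about how such an optimization could look, not a copy of the code in this PR; `SlidingFeatureWindow` is a hypothetical class and features are plain strings.

```python
from collections import Counter, deque


class SlidingFeatureWindow:
    """Track the union of features across the trailing `length` calls incrementally."""

    def __init__(self, length: int):
        self.length = length
        self.calls: deque[frozenset[str]] = deque()
        # feature -> number of calls in the window that provide it
        self.counts: Counter[str] = Counter()

    def push(self, call_features: frozenset[str]) -> None:
        self.calls.append(call_features)
        self.counts.update(call_features)
        if len(self.calls) > self.length:
            # evict the oldest call and forget features no remaining call provides
            for feature in self.calls.popleft():
                self.counts[feature] -= 1
                if self.counts[feature] == 0:
                    del self.counts[feature]

    def has(self, feature: str) -> bool:
        return feature in self.counts

    @property
    def features(self) -> set[str]:
        return set(self.counts)
```

Each `push` touches only the features of the incoming call and of the evicted call, so the work per call event stays the same whether the window holds 1 call or 20.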

capa/capabilities/dynamic.py

mr-tz · 2024-12-16T09:38:22Z

TODO?!

test sequence scope with submatch (call scope)
test sequence scope with submatch (sequence scope)
test sequence scope with submatch (thread or other scope - error?)

williballenthin · 2024-12-17T13:47:18Z

I've run into some bugs where sequence scoped rules matching sequence scoped rules that are hammered can't be tracked well. Currently thinking this through and figuring out a fix.

Details:

Sequence scope matches logic found within a sliding window of calls, currently of length 20. To avoid showing too many results when a program "hammers" a behavior, such as calling sleep in a tight loop, the sequence engine only "publishes" a match if it wasn't seen in the prior sequence. As long as the length of the sequence is larger than the behavior within a tight loop, capa shows the match at the first loop iteration and does not "publish" the subsequent run of duplicate behavior.

By "publish" we mean that capa reports on a match, recording it within the result document and rendering it to the user. The sequence engine still recognizes all the matches within the tight loop, and other rules can match against those matches, they just aren't propagated into the final results.
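The publish rule described above can be sketched as follows. It assumes the engine produces, for each window, the set of rule names that matched; `published_matches` is a hypothetical helper and the rule names are illustrative.

```python
def published_matches(matches_per_window: list[set[str]]) -> list[tuple[int, str]]:
    """Return (window index, rule name) for matches that were not seen in the prior window."""
    published = []
    previous: set[str] = set()
    for index, current in enumerate(matches_per_window):
        # sorted only to make the output order deterministic
        for rule in sorted(current - previous):
            published.append((index, rule))
        # every match stays visible to dependent rules; only publication is filtered
        previous = current
    return published


# a tight loop keeps "sleep" matching in consecutive windows, so it is published once per run
windows = [set(), {"sleep"}, {"sleep"}, {"sleep", "other"}, set(), {"sleep"}]
events = published_matches(windows)
```

Once the behavior drops out of the window and later reappears, it is published again, which matches the "first loop iteration" behavior described above.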

We think this behavior is generally agreeable and intuitive; however, there's a subtle bug. When we build the result document to show vverbose output, which prints the precise data used to make a match, a rule that depends on another rule that has been hammered can't be easily resolved. If we haven't published the hammered rule, then we can't easily show how the dependent rule matched, because the match details aren't available.

For example, in 0000a... for PID 1852 and thread 2596, there's a tight loop of NtAllocateVirtualMemory that triggers allocate or change RWX memory on the first match (id: 301), but is recognized to "hammer" and therefore subsequent matches are suppressed. Way later, at id 1257, are calls to WriteProcessMemory and CreateRemoteThread, which along with a match for allocate or change RWX memory, trigger inject thread. But when we go to render the results for inject thread the match details for allocate or change RWX memory are not available within 20 (or even 900) calls.

My current theory for a fix is: when there's a newly recognized rule (that is, not hammered), walk its logic, and if it depends on any other rules, ensure they're published (even if they were hammered). This will take a bit more code, and will require some inline documentation like I've added above, but should be enough to ensure we can always prove why capa matched some logic.
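A minimal sketch of that theory, under simplifying assumptions: rule dependencies are given as a mapping from rule name to the names it matches against, and `published` holds the rule names already published for the current window. `rules_to_publish` is a hypothetical helper, not an existing capa function.

```python
def rules_to_publish(new_rule: str, dependencies: dict[str, set[str]], published: set[str]) -> list[str]:
    """Return `new_rule` plus every rule it transitively depends on that is not yet published."""
    to_publish: list[str] = []
    seen: set[str] = set()
    stack = [new_rule]
    while stack:
        rule = stack.pop()
        if rule in seen:
            continue
        seen.add(rule)
        if rule not in published:
            to_publish.append(rule)
        # sorted only to make the traversal order deterministic
        stack.extend(sorted(dependencies.get(rule, set())))
    return to_publish


# rule names taken from the example above; the mapping itself is illustrative
deps = {"inject thread": {"allocate or change RWX memory"}}
needed = rules_to_publish("inject thread", deps, published=set())
```

Here the suppressed dependency is returned alongside the new match, so the renderer would have the details it needs.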

mr-tz · 2024-12-17T17:52:09Z

I've run into some bugs where sequence scoped rules matching sequence scoped rules that are hammered can't be tracked well. Currently thinking this through and figuring out a fix.

I think that's expected. If it's an issue that will take longer to fix we should handle it separately. The benefits of the sequence scope (way less FPs vs. thread scope) outweigh the shortcomings here.

contains the call ids for all the calls within the sequence, so we know where to look for related matches.

mr-tz · 2024-12-19T10:09:01Z

capa/features/freeze/__init__.py

+    value: Union[
+        # for absolute, relative, file
+        int,
+        # for DNToken, Process, Thread, Call
+        tuple[int, ...],
+        # for sequence
+        tuple[int, int, int, int, tuple[int, ...]],
+        # for NO_ADDRESS,
+        None,
+    ] = None  # None default value to support deserialization of NO_ADDRESS


thanks, the documentation helps a lot here
we could also add documentation of what the values are: ppid, pid, tid, id_, calls or use a dataclass?
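For illustration, one option is a `NamedTuple`, which names the five positions while still being a plain tuple. This is only a sketch: `SequenceAddressFields` is a hypothetical name, the field order follows the list in this comment, and the `ppid` and `calls` values below are made up.

```python
from typing import NamedTuple


class SequenceAddressFields(NamedTuple):
    """Hypothetical named view over the five-element sequence address tuple."""

    ppid: int
    pid: int
    tid: int
    id_: int
    calls: tuple[int, ...]


# illustrative values only
raw = (1, 1852, 2596, 301, (300, 301))
addr = SequenceAddressFields(*raw)
```

Because a `NamedTuple` instance compares equal to the equivalent plain tuple, existing code that unpacks or compares the raw tuple would keep working.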

mr-tz · 2024-12-19T10:16:57Z

tests/test_dynamic_sequence_scope.py

@@ -181,6 +182,55 @@ def test_dynamic_sequence_scope_length():
    assert r.name not in capabilities.matches


+# show that the DynamicSequenceAddress has the correct structure.
+# temporarily uses a sequence of length 2, for simplicity.


williballenthin · 2024-12-19T21:36:08Z

late night idea: to handle the link between dependent rules in the presence of hammering, I think we can just pick the most recent earlier match (in the same thread). We don't need to track this precisely, such as with specialized sequence addresses, so we can delete some of the recent code.
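A quick sketch of the lookup, assuming the call ids at which a rule matched in one thread are kept in a sorted list; `most_recent_earlier_match` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Optional


def most_recent_earlier_match(match_call_ids: list[int], call_id: int) -> Optional[int]:
    """Return the largest recorded call id that is <= `call_id`, or None if there is none.

    `match_call_ids` must be sorted ascending and belong to a single thread.
    """
    position = bisect_right(match_call_ids, call_id)
    return match_call_ids[position - 1] if position else None


# the hammered rule was published at call 301; the dependent rule matches at call 1257
linked = most_recent_earlier_match([301], 1257)
```

The lookup is a binary search, so it stays cheap even for threads with many recorded matches.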

mr-tz · 2024-12-20T11:11:01Z

to handle the link between dependent rules in the presence of hammering...

Nice, this is a good solution for API hammering across/between dependent rule matches.

I think we still have to come up with a solution for API hammering within a rule?
Example trace:

...
WantedApi1
...
many other calls (n > MAX_SEQUENCE_LENGTH)
...
WantedApi2

features:
  - and:
    - api: WantedApi1
    - api: WantedApi2

dynamic: add sequence scope

2d342cd

addresses discussion in mandiant/capa-rules#951

williballenthin added the enhancement, breaking-change, and dynamic labels Dec 9, 2024

williballenthin added 2 commits December 9, 2024 13:37

Merge branch 'master' into feat/dynamic-sequence-scope

c43e668

changelog

da4be57

williballenthin requested review from mr-tz, mike-hunhoff and yelhamer December 9, 2024 13:39

pep8

e29a370

williballenthin marked this pull request as draft December 9, 2024 14:34

mr-tz reviewed Dec 9, 2024

mike-hunhoff reviewed Dec 9, 2024

sequence: add test showing multiple sequences overlapping a single event

6d05d3c

williballenthin force-pushed the feat/dynamic-sequence-scope branch from d6106ea to 6d05d3c Compare December 10, 2024 12:55

mr-tz reviewed Dec 11, 2024

williballenthin force-pushed the feat/dynamic-sequence-scope branch 2 times, most recently from c0af878 to a8abb16 Compare December 12, 2024 14:19

capabilities: use dataclasses to represent complicated return types

37f6ccb

williballenthin force-pushed the feat/dynamic-sequence-scope branch from a8abb16 to 37f6ccb Compare December 12, 2024 14:38

sequence: only match first overlapping sequence

b10d591

also, for repeating behavior, match only the first instance.

williballenthin force-pushed the feat/dynamic-sequence-scope branch from ea9daed to b10d591 Compare December 12, 2024 15:14

mr-tz reviewed Dec 13, 2024

mr-tz mentioned this pull request Dec 16, 2024

tmp: update to newscope (placeholder) mandiant/capa-rules#972

Closed

sequence scope: optimize matching

a31cfd9

williballenthin force-pushed the feat/dynamic-sequence-scope branch from 269a2e0 to a31cfd9 Compare December 16, 2024 14:03

williballenthin added 2 commits December 16, 2024 14:03

Merge branch 'master' into feat/dynamic-sequence-scope

58f42f5

sequence: documentation

69f4728

williballenthin force-pushed the feat/dynamic-sequence-scope branch from 4683882 to 69f4728 Compare December 16, 2024 15:51

sequence: add more tests

8fe6cb2

williballenthin mentioned this pull request Dec 17, 2024

use sequence scope mandiant/capa-rules#973

Draft

sequence: add sequence address

6dde963

contains the call ids for all the calls within the sequence, so we know where to look for related matches.

williballenthin force-pushed the feat/dynamic-sequence-scope branch from ded0e27 to 6dde963 Compare December 18, 2024 12:54

mr-tz reviewed Dec 19, 2024
