My Methodology to AWS Detection Engineering (Part 2: Risk Assignment)
Introduction
Welcome back to the second installment of the blog series discussing my methodology for threat detection engineering in AWS. I am humbled by the response to Part One, so thank you to everyone who reached out. I'm very happy that some of you found it helpful. In case you missed it, we covered RBA object selection for AWS and an example of how that could be of use. I recommend reading Part One before continuing here. But since you're back, I assume you are here for Part Two, so let's get into it. We will look at an example application of risk score assignment using the concepts discussed in Part One and the lessons I've learned using the available RBA features in Splunk ES.
This will be a shorter post as I want to focus on the key components that make up the risk assignment rule. Let's start by jumping right into the logic needed in this methodology.
Core Components:
Initial Filter 1
index=notable tag_field="aws"
In this approach, we will use the index that contains your triggered detections. This is up to you; you could just as easily add the logic described in the rest of this post to every detection rule you have. Just know that the collect command will run and write results whenever you execute the correlation search ad hoc, which can cause headaches. I prefer to have one rule to manage my AWS risk assignments, as it gives me the freedom to manage everything in one spot… again, up to you.
Initial Filter 2
tag_field="aws"
- This is a filtering field and could be whatever field you use to tag and identify detections as AWS-specific alerts…assuming you have done this in some capacity
- If you don't have this or a similar tag, but you happen to output the "sourcetype" in your detections, you could use this as your filter (sourcetype="aws:*")
- If there is no sourcetype or tag field, then you could get really lazy with it and use the alert name field, search_name="*AWS*"
- If you've made it this far, chances are you probably want to just add the logic below into your individual AWS correlation searches... though I'd argue it's simpler to just add a tag to those searches and use that field as noted in Step 1 😉
Severity & Fidelity
| eval severity=coalesce(severity,"informational") | eval fidelity=coalesce(fidelity,1)
Risk Objects
| eval risk_object=mvappend(aws_identity_arn, aws_principal_id, target_resource_name, src_ip, instance_id, ...) | mvexpand risk_object | eval risk_object_type=if(match(risk_object,"^i-[0-9a-f]{7,17}$|\b(?:\d{1,3}\.){3}\d{1,3}\b"),"system","user")
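To see how the expansion and typing behave, here is a small self-contained sketch that can be run ad hoc. The ARN, IP address, and instance ID are made-up sample values, and only three of the fields from the eval above are used; everything else mirrors the logic as written.

```spl
| makeresults
``` Hypothetical sample values standing in for fields from a notable event ```
| eval aws_identity_arn="arn:aws:iam::123456789012:user/alice", src_ip="203.0.113.10", instance_id="i-0abc1234def567890"
``` Combine the candidate fields into one multivalue field (null fields are skipped) ```
| eval risk_object=mvappend(aws_identity_arn, src_ip, instance_id)
``` One row per risk object ```
| mvexpand risk_object
``` EC2 instance IDs and anything containing an IPv4-like pattern become "system"; the rest become "user" ```
| eval risk_object_type=if(match(risk_object,"^i-[0-9a-f]{7,17}$|\b(?:\d{1,3}\.){3}\d{1,3}\b"),"system","user")
| table risk_object risk_object_type
```

Each value should land on its own row, with the ARN typed as user and the IP address and instance ID typed as system. Note that the IP pattern is not anchored, so any value that merely contains an IPv4-like string is typed as system, and anything matching neither pattern (a bucket name, for example) falls through to user.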
Base Risk Score
| lookup severity_score.csv severity OUTPUT score AS base_score | eval base_risk_score=base_score | eval risk_score=base_risk_score * fidelity
This creates your base score from the severity_score lookup, which contains simple key/value pairs that align with the scores outlined here or your own custom scoring scheme. The base_score is then multiplied by the fidelity to produce the risk_score.
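As a worked example of the arithmetic, the sketch below swaps the lookup for an inline case() so it can run without the CSV. The score values (20 through 100) and the fidelity of 0.5 are hypothetical placeholders, not the values from the scoring scheme referenced above; substitute whatever your severity_score lookup contains.

```spl
| makeresults
``` Hypothetical inputs: a high-severity detection with a fidelity multiplier of 0.5 ```
| eval severity="high", fidelity=0.5
``` Stand-in for severity_score.csv; these scores are illustrative assumptions ```
| eval base_score=case(severity=="informational",20, severity=="low",40, severity=="medium",60, severity=="high",80, severity=="critical",100)
| eval base_risk_score=base_score
``` Same calculation as above: base score scaled by fidelity ```
| eval risk_score=base_risk_score * fidelity
| table severity fidelity base_score risk_score
```

With these placeholder numbers, the high severity maps to a base score of 80, and multiplying by the fidelity of 0.5 gives a risk_score of 40. A detection that never set fidelity would default to 1 and keep the full base score.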
The Magic: Risk Score Assignment
| eval event_time=orig_time | collect index=risk
I was totally kidding when I said "the magic," as this is pretty straightforward. The event_time piece ensures you are pulling in the original _time from the alerted event. If you were doing this within the original detection rules, then it would just be a matter of setting event_time to _time. However, in this approach, we are querying the notable index for the triggered detections, so orig_time is what we need.
The collect statement is what we use to send this information to the risk index. Trust me when I say this method (the collect command) is the most liberating in comparison to the alternatives, which I will briefly touch on later.
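If you want to preview what would be written before committing anything to the risk index, the collect command has a testmode option. The sketch below is one way to try it; the search name, risk object, score, and the five-minute offset on orig_time are all made-up values for illustration.

```spl
| makeresults
``` Hypothetical fields standing in for a scored notable event ```
| eval search_name="AWS - Example Detection", risk_object="203.0.113.10", risk_object_type="system", risk_score=40
``` Pretend the original alert fired five minutes ago ```
| eval orig_time=_time - 300
| eval event_time=orig_time
``` testmode=true shows the events as they would be indexed without writing them ```
| collect index=risk testmode=true
```

Once the output looks right, dropping testmode (or setting it to false) makes the same search write to the index for real.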
Advice
To keep the same event from being written to the risk index more than once, hash a field that uniquely identifies each event and exclude any hash that already exists in the risk index. Options for the field to hash:
- _raw
- A concatenated field that includes the alert name (search_name), your risk_objects, event_time
- Use the notable description (message)
  - Note: This assumes you have already configured a description in every one of your detection rules and that it includes enough information (like a timestamp) to make each event unique and prevent "over-throttling". Be descriptive and clear in this message; a timestamp variable ($_time$) can be useful here.
- Concatenation of whatever fields you think would capture the core contents of an event that have not already been mentioned or a mix-and-match of them
| eval raw_event_hash=sha256(_raw) | search NOT [search index=risk tag_field="aws" | table raw_event_hash]
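As a sketch of the concatenated-field option from the list above, the search below builds a key from the alert name, risk object, and event time, then hashes that instead of _raw. The dedup_key field name, the "|" separator, and the sample values are my own assumptions for illustration.

```spl
| makeresults
``` Hypothetical fields standing in for one expanded risk event ```
| eval search_name="AWS - Example Detection", risk_object="203.0.113.10", event_time=_time
``` Join the identifying fields with a separator so adjacent values cannot run together ```
| eval dedup_key=search_name."|".risk_object."|".event_time
| eval raw_event_hash=sha256(dedup_key)
``` Drop anything whose hash has already been collected into the risk index ```
| search NOT [search index=risk tag_field="aws" | table raw_event_hash]
| table search_name risk_object event_time raw_event_hash
```

Because the key includes the risk object, each row produced by mvexpand gets its own hash, whereas hashing _raw gives every risk object from the same notable an identical hash. Either way, the hash and tag_field need to be carried through to the collected events so that later runs can find them.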
Alternative risk assignment methods
There are two methods in particular you could use (sendalert & Risk Adaptive Action) but let me explain why I prefer not to use them:
- The sendalert risk command has some frustrating out-of-the-box limitations, such as not accepting multi-valued fields for the risk_objects, along with some other gotchas which have driven me to insanity. Keen readers will notice there is an mvexpand above, and the collect command method has a similar limitation. Both can be overcome with mvexpand on the risk_object, but sendalert has additional shortcomings that would require another paragraph to explain. I'll spare you the details... don't use this unless you hate yourself (kidding, relax).
- Risk Adaptive Action is nice until you really want to manipulate the risk_score based on custom logic, at which point it requires "Risk Factor" configuration.
Conclusion
See you soon!