Aren't AWS Cloud Investigations the same as On-Prem? - Part 1 (AWS EC2)
Introduction
If this is your first time visiting my page, welcome! If you're returning, welcome back—I'm excited to start a new mini-series. It may not be a great debate, but there are a lot of people who believe cloud investigations (we’ll be talking AWS in this series) are no different from investigating on-premises workstations or servers. To those people, I say: you are both correct and completely wrong 🤣. Jokes aside, I think this is a widely misunderstood concept that leads to so much pain and wasted time—so it’s worth addressing. Let’s talk about it.
In this mini-series, we’ll talk about some of the most commonly used AWS services, starting with EC2. I’ll share my perspective as a Security Incident Responder/Cloud Investigator and draw parallels, but also point out the significant differences between these cloud services, their on-premises equivalents, and your investigation.
DISCLAIMER: We will certainly talk about EC2, but it would be impossible (and irresponsible) not to cover the web of related AWS services that you would likely encounter in an EC2 instance compromise. Also, I am focusing on the investigation methodology of an AWS incident, not the forensic practices involved in these cases. There are plenty of other blogs and code content covering the forensic side of EC2 investigations, so I’ll save some characters here (see References below).
What is EC2?
If you are new to my blog, I often skip over explaining the basics (one day I will dedicate more time to this), but for now I will keep things high level. Per AWS documentation, "Amazon Elastic Compute Cloud (Amazon EC2) provides on-demand, scalable computing capacity in the Amazon Web Services (AWS) Cloud...An EC2 instance is a virtual server in the AWS Cloud."
That last part is the most important point on the entire page—don't skip over it or make it more complicated than it is (don’t worry, learning AWS Cloud will hold your beer). Since virtual machines also exist on-premises, I think it’s worth exploring the Venn diagram of similarities and differences here.
High Level Similarities
Networking
Assigned IPv4 Address (Private) – No matter what, an EC2 instance will have one of these, and similar to a server or workstation on-prem, you’ll likely want it to communicate with other endpoints on the same network.
Other networking concepts—like routing tables, firewalls, and NACLs—exist as well. They may be applied a bit differently, but the core principles are all there.
Operating System & Software Vulnerabilities
Observability
Logging - Both have some level of logging, though on-premises resources vary widely in how difficult that logging is to set up and configure.
Third-Party Tooling - tooling like EDRs and other observability agents can exist on both and can certainly elevate your investigation effectiveness if used and configured properly.
High & Lower Level Differences
Reachability
Elastic IP Address/Public IP Address – You can just make an EC2 instance public… This concept doesn’t exist in the same way in an on-premises enterprise environment. There, you have to intentionally place a device in a specific spot on the network, or install software such as a DDNS server and expose your endpoint through manual configuration or other OS-level changes.
None of that is necessary in AWS, as long as the EC2 instance sits in a VPC with an internet gateway attached, its subnet's route table has a route to that gateway, and a security group allows the inbound traffic. Note: In an on-premises environment, your NAT’d resources most likely sit behind a firewall and are not directly exposed to the internet. In AWS, a simple configuration change on the EC2 instance itself (assigning a public or Elastic IP) allows public access, and it only takes a few clicks in the AWS console. In an enterprise AWS environment, those prerequisites will likely already exist, making it trivial to deploy a public EC2 instance.
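To make those prerequisites concrete, here is a minimal sketch of a reachability check. It assumes you have already pulled the instance, its subnet's route tables, and its security groups as dictionaries shaped like the EC2 DescribeInstances, DescribeRouteTables, and DescribeSecurityGroups responses. The function name and the sample values are hypothetical, and the check deliberately ignores NACLs and IPv6.

```python
def is_internet_reachable(instance, route_tables, security_groups):
    """Rough check: public IP + default route to an IGW + world-open ingress.

    Simplified on purpose: NACLs, IPv6, and prefix lists are not evaluated.
    """
    has_public_ip = bool(instance.get("PublicIpAddress"))

    # A default route (0.0.0.0/0) whose target is an internet gateway.
    has_igw_route = any(
        route.get("DestinationCidrBlock") == "0.0.0.0/0"
        and str(route.get("GatewayId", "")).startswith("igw-")
        for table in route_tables
        for route in table.get("Routes", [])
    )

    # At least one inbound rule open to any IPv4 source.
    open_ingress = any(
        ip_range.get("CidrIp") == "0.0.0.0/0"
        for group in security_groups
        for perm in group.get("IpPermissions", [])
        for ip_range in perm.get("IpRanges", [])
    )
    return has_public_ip and has_igw_route and open_ingress


# Made-up sample data for illustration only.
instance = {"InstanceId": "i-0123456789abcdef0", "PublicIpAddress": "203.0.113.10"}
route_tables = [{"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"},
]}]
security_groups = [{"IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}]

print(is_internet_reachable(instance, route_tables, security_groups))
```

During an investigation, a check like this is mostly useful for scoping: it tells you quickly whether "exposed to the internet" is even a plausible entry point for the instance in question.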
Operating System & Software Vulnerabilities
In AWS, you can build EC2 instances from OS and software images available on the public AWS Marketplace, which means you inherit those images and their baggage. You can also use your own images, of course, but either way there is a wide variety of options, and with that variety comes a variety of risk in the form of vulnerabilities. The difference here isn't that on-prem virtual machines are free of vulnerabilities; it's the variety and scale of that risk when you're moving at cloud speed. Combine that with the reachability aspect and you have a very dangerous duo.
Local User & IAM Role Profile
EC2 instances come with a default local user (on Amazon Linux images it is called "ec2-user"), but the most dangerous "user" (I'm only using this term for my folks used to on-prem) is the associated IAM instance profile, which translates to an IAM Role. Once a threat actor makes it onto the EC2 instance, the role's credentials can be obtained fairly easily if they do not already have them (depending on the attack method). SSRF is certainly one method to get in, and using IMDSv2 (and God I hope you have it enforced, not just enabled) certainly makes that more difficult, but it does not make it impossible by any means.
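The "enforced, not just enabled" distinction shows up in the instance's metadata options. As a rough sketch, assuming you have an instance dictionary shaped like a DescribeInstances result, the MetadataOptions block tells you which posture applies. The function name and the returned labels are my own, not AWS terminology.

```python
def imds_posture(instance):
    """Classify IMDS configuration from a DescribeInstances-style dict."""
    opts = instance.get("MetadataOptions", {})
    if opts.get("HttpEndpoint") == "disabled":
        return "imds-disabled"
    # HttpTokens == "required" means session tokens are mandatory (IMDSv2 enforced).
    if opts.get("HttpTokens") == "required":
        return "imdsv2-enforced"
    # "optional" means IMDSv1-style requests without a token are still answered.
    return "imdsv1-allowed"


# Made-up sample data for illustration only.
enforced = {"MetadataOptions": {"HttpEndpoint": "enabled", "HttpTokens": "required"}}
optional = {"MetadataOptions": {"HttpEndpoint": "enabled", "HttpTokens": "optional"}}

print(imds_posture(enforced), imds_posture(optional))
```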
Putting aside SSRF, I want to speak from the point in the investigation where you have proof that the threat actor has direct access to the EC2 instance over the internet. On-prem, even if malware gave the threat actor that same level of access, they would likely still have to dump LSASS or find credentials some other way (exploiting OS services, software, etc.) before they had anything usable. In AWS, they simply talk to the IMDS, get the role's credentials, and they're good to go. Thus, post-compromise activity moves far quicker in cloud environments like AWS (yes, GCP & Azure too) than on-premises.
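To show how little work "talking to the IMDS" involves, here is a sketch of the documented IMDSv2 flow using only the Python standard library: a PUT for a session token, then GETs for the role name and its temporary credentials. The helper names are hypothetical, and fetch_role_credentials only works when run on an EC2 instance that has an instance profile attached, so treat it as something to reproduce in a lab rather than a finished tool.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254"  # link-local metadata endpoint


def token_request(ttl_seconds=60):
    """Build the IMDSv2 session token request (HTTP PUT)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Build a metadata GET request that carries the session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_credentials(timeout=2):
    """Return the instance role's temporary credentials as a dict."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    base = "iam/security-credentials/"
    # The first call lists the role name; the second returns its credentials.
    with urllib.request.urlopen(metadata_request(base, token), timeout=timeout) as resp:
        role_name = resp.read().decode().splitlines()[0]
    with urllib.request.urlopen(metadata_request(base + role_name, token), timeout=timeout) as resp:
        return json.loads(resp.read())
```

The response includes AccessKeyId, SecretAccessKey, Token, and Expiration fields. No memory dumping, cracking, or exploitation is involved, which is the whole point of the comparison with LSASS.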
IAM Role, Post-Compromise & "The Crossover"
Cloud Service Management & Data Plane Logs
Investigation Methodology
- Investigating the W's (who/what/when/where*)
- The primary similarity is the act of answering these types of questions.
- Host-based artifacts
- Yes, there are still artifacts that can be collected
- Depending on the exploit/vector, usefulness may vary.
- Searching for privilege escalation*
- You may find host-based artifacts that can help tell a traditional story of privilege escalation for some incidents but by and large, this will be a primary difference in investigation tradecraft (see Differences below).
- Network Traffic Analysis
- NetFlow is still useful and is not all that different.
- Searching for data exfiltration*
- Another similarity but different tradecraft in that you would need to look at more than just NetFlow data (as one example) to determine exfil.
- Investigation of the W's (who/what/when/where*)
- How
- In an AWS incident involving an EC2 instance, you could start with EDR logs (if you have one installed), but going straight to CloudTrail logs to understand control plane activities often does a really good job of answering the W's.
- Interpretation
- I will use the example of a CreateUser control plane event. If you're using your "on-premises mindset", you may think you need to look at local host logs to investigate further, but that is not the correct data source: CreateUser is an IAM API call, and it is recorded in CloudTrail. To be clear, you would want to stay in the CloudTrail logs until unanswered questions force you to pivot to another data source.
- The "Who" in your investigation is, in the abstract, the threat actor, but in investigative practice you are more concerned with the compromised identities. In an EC2 compromise investigation (and really most AWS incidents I've observed), the "Who" that you are investigating at any given point could be:
- The EC2 instance & its associated IAM Role
- They are essentially joined at the hip thanks to the instance profile feature.
- EC2 service or other AWS service(s)
- Separate IAM Role/User
- Anything else under or in the clouds (😉) depending on how your investigation is going
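As a sketch of what "staying in CloudTrail" can look like for the first identity in that list: when a role is assumed through an instance profile, the session name is the instance ID, so the CloudTrail userIdentity.principalId ends with ":" followed by the instance ID. Assuming you have already loaded CloudTrail records as dictionaries, a filter like the following pulls out everything the instance's role session did. The function names and sample records are made up for illustration.

```python
def events_by_instance_role(records, instance_id):
    """Return CloudTrail records made with the instance's role session, oldest first."""
    hits = []
    for rec in records:
        identity = rec.get("userIdentity", {})
        if identity.get("type") != "AssumedRole":
            continue
        # Instance profile sessions are named after the instance ID.
        if identity.get("principalId", "").endswith(":" + instance_id):
            hits.append(rec)
    return sorted(hits, key=lambda r: r.get("eventTime", ""))


def summarize(rec):
    """When, which service, what action, and from where."""
    return (rec.get("eventTime"), rec.get("eventSource"),
            rec.get("eventName"), rec.get("sourceIPAddress"))


# Made-up sample records for illustration only.
ROLE_SESSION = {
    "type": "AssumedRole",
    "principalId": "AROAEXAMPLEID:i-0123456789abcdef0",
    "arn": "arn:aws:sts::123456789012:assumed-role/web-role/i-0123456789abcdef0",
}
SAMPLE_RECORDS = [
    {"eventTime": "2024-05-01T10:02:00Z", "eventSource": "iam.amazonaws.com",
     "eventName": "CreateUser", "sourceIPAddress": "198.51.100.7",
     "userIdentity": ROLE_SESSION},
    {"eventTime": "2024-05-01T10:01:00Z", "eventSource": "sts.amazonaws.com",
     "eventName": "GetCallerIdentity", "sourceIPAddress": "198.51.100.7",
     "userIdentity": ROLE_SESSION},
    {"eventTime": "2024-05-01T09:00:00Z", "eventSource": "signin.amazonaws.com",
     "eventName": "ConsoleLogin", "sourceIPAddress": "192.0.2.44",
     "userIdentity": {"type": "IAMUser", "principalId": "AIDAEXAMPLEID"}},
]

for rec in events_by_instance_role(SAMPLE_RECORDS, "i-0123456789abcdef0"):
    print(summarize(rec))
```

A sourceIPAddress that is not the instance's own address on calls made with the instance role session is worth a closer look, since it suggests the credentials were taken off the box and used elsewhere.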
- Host-Based Artifacts
- You should obtain these through AWS native APIs so you can move at cloud speed, as opposed to waiting for EnCase, FTK, or whatever imaging software you use to grab a disk copy. Doing it the on-premises way is still possible for EC2 instances, but it’s incredibly slow and requires additional setup.
- How you grab triage forensic artifacts such as OS event logs will depend on the EDR or agent you want to use. And yes, you could leverage a built-in cloud service such as AWS SSM to pull these artifacts using scripts, so again it is the "how" that is different.
- My final thought on host-based artifacts, however, is that if you really feel you need them, you should have a good reason, and hopefully it is more than "because legal requires them." That is an acceptable answer for regulatory reasons...but for you (the investigator), I hope you have better reasons. These artifacts may help answer a couple of investigative questions, but they’ll leave you high and dry for everything after that. Don’t say I didn’t warn you!
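One way to read "AWS native APIs" for disk evidence is snapshotting the instance's EBS volumes instead of imaging them. The sketch below assumes `ec2` is a boto3 EC2 client (boto3 is a third-party library and is not imported here), and it uses the DescribeVolumes and CreateSnapshot operations. The function name, the "case-id" tag, and the description format are hypothetical, and pagination and error handling are left out.

```python
def snapshot_instance_volumes(ec2, instance_id, case_id):
    """Snapshot every EBS volume attached to an instance and tag it with a case ID.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )["Volumes"]

    snapshot_ids = []
    for vol in volumes:
        snap = ec2.create_snapshot(
            VolumeId=vol["VolumeId"],
            Description=f"IR case {case_id}: {instance_id} {vol['VolumeId']}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [{"Key": "case-id", "Value": case_id}],
            }],
        )
        snapshot_ids.append(snap["SnapshotId"])
    return snapshot_ids
```

Snapshots are taken asynchronously by the service, so the call returns in seconds and the instance can be contained while the copy completes in the background.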
- Privilege Escalation
- This is also a stark difference, as I covered with "The Crossover" above, and it is something that can trip you up if you bring your on-prem assumptions to the cloud. You can spend time looking for privilege escalation via an exploit or software vulnerability, but that may not be how privileges were actually escalated, even when the compromised EC2 instance's IAM Role did not already grant the attacker administrative-level permissions (i.e. it was not overly permissive). I referenced this above, but the reality is that the attacker only needs to "live off the cloud" and use readily available APIs to escalate their privileges. A single AttachRolePolicy call with the AWS-managed AdministratorAccess policy (provided the compromised role has permission to make that call) and poof, they have everything they need.
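Hunting for that kind of escalation is a CloudTrail query rather than a host artifact search. As a minimal sketch, assuming CloudTrail records loaded as dictionaries, the following flags policy attachment calls that reference the AWS-managed AdministratorAccess policy. The function name and sample record are made up, and a real hunt would cover more paths than this (inline policies, new access keys, and so on).

```python
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"
ATTACH_EVENTS = {"AttachRolePolicy", "AttachUserPolicy", "AttachGroupPolicy"}


def find_admin_policy_attachments(records):
    """Flag IAM attach calls that reference the AdministratorAccess managed policy."""
    findings = []
    for rec in records:
        if rec.get("eventSource") != "iam.amazonaws.com":
            continue
        if rec.get("eventName") not in ATTACH_EVENTS:
            continue
        params = rec.get("requestParameters") or {}
        if params.get("policyArn") != ADMIN_POLICY_ARN:
            continue
        findings.append({
            "time": rec.get("eventTime"),
            "actor": rec.get("userIdentity", {}).get("arn"),
            "event": rec.get("eventName"),
            "target": params.get("roleName") or params.get("userName")
                      or params.get("groupName"),
            # CloudTrail only includes errorCode when the call failed.
            "succeeded": "errorCode" not in rec,
        })
    return findings


# Made-up sample records for illustration only.
PRIVESC_SAMPLE = [
    {"eventTime": "2024-05-01T10:05:00Z", "eventSource": "iam.amazonaws.com",
     "eventName": "AttachRolePolicy",
     "userIdentity": {"arn": "arn:aws:sts::123456789012:assumed-role/web-role/i-0123456789abcdef0"},
     "requestParameters": {"roleName": "web-role", "policyArn": ADMIN_POLICY_ARN}},
    {"eventTime": "2024-05-01T10:06:00Z", "eventSource": "iam.amazonaws.com",
     "eventName": "AttachRolePolicy",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/deployer"},
     "requestParameters": {"roleName": "app-role",
                           "policyArn": "arn:aws:iam::aws:policy/ReadOnlyAccess"}},
]

print(find_admin_policy_attachments(PRIVESC_SAMPLE))
```

Failed attempts are kept in the output (with succeeded set to False) because a denied AttachRolePolicy call from an instance role is often just as telling as a successful one.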
- Network Log Analysis
- Your primary data source here would be VPC flow logs (the one time IR or DE actually uses these...just kidding, I use them). Assuming you didn't bring your firewall with you, you can filter this data source on your instance ID/ENI and look for typical NetFlow-type things: IP-to-IP traffic, data exfil by byte volume, etc. What I want you to understand, however, is that you should NOT assume that no data exfiltration occurred just because you don't see large traffic volumes. This would be a fatal mistake, and your lawyers and leadership will be very angry at this oversight.
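For the NetFlow-style pass, here is a sketch that parses flow log lines in the default (version 2) record format and totals accepted outbound bytes per destination for one ENI. It assumes the default field order (version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status); custom formats would need a different field list. The function names and sample lines are made up.

```python
from collections import Counter

FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()


def parse_flow_line(line):
    """Parse one default-format flow log line; return None for unusable records."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    rec = dict(zip(FIELDS, parts))
    if rec["log_status"] != "OK":  # NODATA / SKIPDATA records carry no traffic
        return None
    rec["packets"] = int(rec["packets"])
    rec["bytes"] = int(rec["bytes"])
    return rec


def bytes_by_destination(lines, eni_id, local_ip):
    """Total accepted outbound bytes per destination IP, largest first."""
    totals = Counter()
    for line in lines:
        rec = parse_flow_line(line)
        if (rec and rec["interface_id"] == eni_id
                and rec["srcaddr"] == local_ip and rec["action"] == "ACCEPT"):
            totals[rec["dstaddr"]] += rec["bytes"]
    return totals.most_common()


# Made-up sample lines for illustration only.
FLOW_SAMPLE = [
    "2 123456789012 eni-0abc 10.0.1.5 198.51.100.7 44321 443 6 120 9500000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 10.0.1.5 198.51.100.7 44322 443 6 10 500000 1700000060 1700000120 ACCEPT OK",
    "2 123456789012 eni-0abc 198.51.100.7 10.0.1.5 443 44321 6 80 40000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 10.0.1.5 203.0.113.9 50000 22 6 5 300 1700000060 1700000120 REJECT OK",
    "2 123456789012 eni-0abc - - - - - - - 1700000120 1700000180 - NODATA",
]

print(bytes_by_destination(FLOW_SAMPLE, "eni-0abc", "10.0.1.5"))
```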
- Data Exfiltration
- The main difference here is that the threat actor doesn’t need to write anything to the EC2 disk volumes, nor do they need to go through the firewall your sysadmins brought from on-premises into AWS (I know you did. It’s okay… well, it may not be, but your secret’s safe with me). If the compromised instance role has access to S3 (or the attacker created a role that does), they can use AWS native APIs to copy your bucket data from its source directly to their own S3 bucket at great speed. They could also create a snapshot of the instance's EBS volume and share it with an attacker-controlled account by modifying the snapshot's attributes. Again, the attacker would be operating at the speed these cloud environments provide, which is much quicker than you think. It’s all on the AWS backbone, and none of this traffic will go through your firewall or even show up in VPC flow logs. I know, I'm sorry, this last bullet might have struck your on-premises heart, but I promise you will live.
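Since the snapshot-sharing path never touches the network you monitor, the place to look is again CloudTrail. The sketch below flags ModifySnapshotAttribute events that add volume permissions for accounts outside a trusted list. The exact nesting of requestParameters is an assumption on my part (a createVolumePermission block with an "add" section containing userId or group entries), so the code walks whatever is under "add" recursively instead of depending on one fixed shape. The function names, account IDs, and sample record are made up.

```python
def collect_values(node, key):
    """Recursively collect string values stored under `key` in nested dicts/lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                found.append(v)
            else:
                found.extend(collect_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


def external_snapshot_shares(records, trusted_accounts):
    """Flag snapshot shares to untrusted accounts or to everyone (group 'all')."""
    findings = []
    for rec in records:
        if rec.get("eventName") != "ModifySnapshotAttribute":
            continue
        params = rec.get("requestParameters") or {}
        # Assumed shape: permissions being granted live under createVolumePermission.add
        added = (params.get("createVolumePermission") or {}).get("add") or {}
        accounts = [a for a in collect_values(added, "userId")
                    if a not in trusted_accounts]
        public = "all" in collect_values(added, "group")
        if accounts or public:
            findings.append({
                "time": rec.get("eventTime"),
                "snapshot": params.get("snapshotId"),
                "shared_with": accounts,
                "public": public,
            })
    return findings


# Made-up sample record for illustration only.
SHARE_SAMPLE = [
    {"eventTime": "2024-05-01T10:20:00Z", "eventName": "ModifySnapshotAttribute",
     "requestParameters": {
         "snapshotId": "snap-0123456789abcdef0",
         "createVolumePermission": {"add": {"items": [{"userId": "999999999999"}]}},
     }},
]

print(external_snapshot_shares(SHARE_SAMPLE, trusted_accounts={"123456789012"}))
```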