Aren't AWS Cloud Investigations the same as On-Prem?

Introduction

If this is your first time visiting my page, welcome! If you're returning, welcome back—I'm excited to start a new mini-series. It may not be a great debate, but there are a lot of people who believe cloud investigations (we’ll be talking AWS in this series) are no different from investigating on-premises workstations or servers. To those people, I say: you are both correct and completely wrong 🤣. Jokes aside, I think this is a widely misunderstood concept that leads to so much pain and wasted time—so it’s worth addressing. Let’s talk about it.

In this mini-series, we’ll talk about some of the most commonly used AWS services, starting with EC2. I’ll share my perspective as a Security Incident Responder/Cloud Investigator. I'll draw parallels, but I'll also point out the significant differences between these cloud services and their on-premises equivalents, and what those differences mean for your investigation.

DISCLAIMER: We will certainly talk about EC2, but it would be impossible (and irresponsible) not to cover the web of related AWS services you are likely to encounter in an EC2 instance compromise. Also, I am focusing on the investigation methodology of an AWS incident, not the forensic practices involved in these cases. There are plenty of other blogs and code repositories covering the forensic side of EC2 investigations, so I’ll save some characters here (see References below).

What is EC2?

If you are new to my blog, I often skip over explaining the basics (one day I will dedicate more time to this), but for now I will keep things high level. Per AWS documentation, "Amazon Elastic Compute Cloud (Amazon EC2) provides on-demand, scalable computing capacity in the Amazon Web Services (AWS) Cloud...An EC2 instance is a virtual server in the AWS Cloud."

That last part is the most important point on the entire page—don't skip over it or make it more complicated than it is (don’t worry, learning AWS Cloud will hold your beer). Since virtual machines also exist on-premises, I think it’s worth exploring the Venn diagram of similarities and differences here.

High Level Similarities

Networking

Assigned IPv4 Address (Private) – Outside of IPv6-only subnets, an EC2 instance will always have one of these. As with a server or workstation on-prem, you’ll likely want it to communicate with other endpoints on the same network.

Other networking concepts—like routing tables, firewalls, and NACLs—exist as well. They may be applied a bit differently, but the core principles are all there.

Operating System & Software Vulnerabilities

Operating Systems and installed software - these work much like their on-premises counterparts. This means the same vulnerabilities (and exploits) exist in both environments. Terrible news, I know.

Observability

Logging - both have some level of logging, though on-premises resources vary widely in how difficult logging is to set up and configure.

Third-Party Tooling - tooling like EDRs and other observability agents can exist on both and can certainly elevate your investigation effectiveness if used and configured properly.

High & Lower Level Differences

Reachability

Elastic IP Address/Public IP Address – You can just make an EC2 instance public… This concept doesn’t exist in the same way in an on-premises enterprise environment. There, you have to intentionally place a device in a certain network segment (a DMZ, for example), or perhaps install dynamic DNS (DDNS) software, and expose your endpoint through manual configuration or other OS-level changes.

None of that is necessary in AWS, as long as the instance sits in a subnet whose route table sends internet-bound traffic to an internet gateway attached to the VPC, and its security group allows the inbound traffic. Note: On-premises, your NAT'd resources most likely sit behind a firewall and are not directly reachable from the internet. In AWS, once those prerequisites are in place, associating a public or Elastic IP address with the instance is enough to make it publicly reachable, and that only takes a few clicks in the AWS console. In an enterprise AWS environment, those prerequisites will likely already exist, making deploying a public EC2 instance trivial.
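To scope reachability during an investigation, it helps to know which instances actually carry a public IPv4 address. Here is a minimal sketch, assuming you've saved the JSON output of `aws ec2 describe-instances` to a file (the `public_instances` helper name is just illustrative). Keep in mind that a public IP alone doesn't prove reachability: the route table, security groups, and NACLs still have to allow the traffic.

```python
from typing import List, Tuple


def public_instances(describe_instances_output: dict) -> List[Tuple[str, str]]:
    """Return (InstanceId, PublicIpAddress) pairs for instances with a public IPv4 address.

    Expects the parsed JSON printed by `aws ec2 describe-instances`. An associated
    Elastic IP address also appears in the PublicIpAddress field.
    """
    exposed = []
    for reservation in describe_instances_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            public_ip = instance.get("PublicIpAddress")
            if public_ip:  # instances without a public IP simply lack this key
                exposed.append((instance["InstanceId"], public_ip))
    return exposed
```

For example, `public_instances(json.load(open("instances.json")))` gives you a quick list of exposed instances to compare against what the owning team thinks is public.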

Operating System & Software Vulnerabilities

In AWS, you can launch EC2 instances from OS and software images (AMIs) available on the public AWS Marketplace, which means you inherit those images and their baggage. You can also use your own images, of course, but either way there is a wide variety of options, and with that variety comes a variety of risk in the form of vulnerabilities. The difference isn't that on-prem virtual machines are free of vulnerabilities; it's the variety and scale of that risk when you're moving at cloud speed. Combine that with the reachability aspect, and you have a very dangerous duo.

Local User & IAM Instance Profile

EC2 instances launched from Amazon Linux AMIs come with a local user called "ec2-user" (other AMIs have their own defaults, such as "ubuntu" on Ubuntu). However, the most dangerous "user" (I'm only using this term for my folks used to on-prem) is the associated IAM instance profile, which is effectively a wrapper around an IAM role. Once a threat actor makes it onto the EC2 instance, those role credentials can be obtained fairly easily if they don't already have them (it depends on the attack method). Server-side request forgery (SSRF) is one well-known way to reach them without a shell on the host at all. Using IMDSv2 (and God I hope you have it enforced, not just enabled) certainly makes that more difficult, but it does not make it impossible by any means.

Putting SSRF aside, I want to speak from the point in the investigation where you have proof that the threat actor has direct access to the EC2 instance over the internet. On-prem, even if malware gave the threat actor that same kind of access, they would likely still have to dump LSASS or find credentials some other way (exploiting OS services, software, etc.) to get something usable. In AWS, they simply query the IMDS for the role's credentials and they're good to go. Thus, post-compromise activity moves far more quickly in cloud environments like AWS (yes, GCP & Azure too) than on-premises.
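Investigators can look for the flip side of this in CloudTrail: the instance role's session credentials being used from somewhere other than the instance. Amazon GuardDuty has instance credential exfiltration findings built on a similar idea. The sketch below is a rough, hypothetical version of that heuristic. It assumes you've already parsed CloudTrail records into Python dicts and that you supply, per instance ID, the IPs you expect its API calls to come from (its public IP, NAT gateway IPs, and private IPs if it reaches AWS through VPC endpoints).

```python
import ipaddress
from typing import Dict, Iterable, Iterator, Optional, Set


def instance_session_id(record: dict) -> Optional[str]:
    """Return the instance ID if the caller looks like an EC2 instance-profile role session.

    Instance-profile credentials use the instance ID as the role session name, so the
    ARN ends in ".../assumed-role/<role>/i-...". Anyone calling AssumeRole could pick a
    session name starting with "i-", so treat this as a heuristic.
    """
    identity = record.get("userIdentity") or {}
    if identity.get("type") != "AssumedRole":
        return None
    session_name = (identity.get("arn") or "").rsplit("/", 1)[-1]
    return session_name if session_name.startswith("i-") else None


def off_instance_credential_use(
    records: Iterable[dict], expected_ips: Dict[str, Set[str]]
) -> Iterator[dict]:
    """Yield records where an instance role session calls AWS from an IP not expected for it."""
    for record in records:
        instance_id = instance_session_id(record)
        if instance_id is None:
            continue
        source = record.get("sourceIPAddress", "")
        try:
            ipaddress.ip_address(source)
        except ValueError:
            # Service principals (e.g. "ec2.amazonaws.com") appear here instead of an IP.
            continue
        if source not in expected_ips.get(instance_id, set()):
            yield record
```

A threat actor who proxies everything through a C2 running on the instance itself will still appear to come from the instance's own IP, so an empty result is not an all-clear.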

IAM Role, Post-Compromise & "The Crossover"

Let's dive into this key difference a bit more, because this is where the real crossover happens. Up to this point, if you’re dealing with exploited software or services on EC2, endpoint telemetry (EDR or OS-level logging) looks pretty similar to what you’d expect on-premises. But once credentials are compromised, things can go downhill much faster than on-premises, and the threat actor doesn’t need to hunt for specific "users" (IAM roles) like they would in a traditional environment. An EC2 instance can have only one IAM role associated with it (through its instance profile), and that role can be enumerated using the IMDS. In theory, this single role could have everything an attacker needs from an access perspective, right out of the gate. Additionally, the attacker has the entire suite of AWS service APIs at their disposal to test that access (only a slight exaggeration: it may not always be true, but sadly, it’s definitely possible).

There’s no need to waste time on network discovery, searching for a system with higher privileges, or exploiting vulnerabilities to escalate privileges. If you don’t have the privileges you want, you can grant them to yourself with a single API call—assuming it isn’t blocked. This is the moment in the investigation when you realize your on-premises incident response experience will only take you but so far. This is "The Crossover" point, where the threat actor no longer needs anything provided by the OS or software—and there’s no guarantee they’ll leave behind additional OS-specific artifacts to help you understand AWS control plane or data plane activities. From here on out, almost everything can be accomplished using the AWS API. Sure, depending on the attacker’s access and command-and-control (C2) setup, you might see some command lines in OS evidence sources that log process activity. But realistically, they can proxy everything through their C2, running commands remotely—and leave none of those events behind on the host. Welcome to "The Crossover."

Cloud Service Management & Data Plane Logs

Thankfully, this territory is logged very well in AWS CloudTrail, and depending on your ingestion method, latency is likely under 15 minutes (in my experience, it’s usually 2–3 minutes). For data plane activity, this assumes you have (and can afford) data event logging enabled. I can hear some heads shaking out there, and I’ll pour one out for you, because this is definitely a luxury that should not be taken for granted. The CloudTrail management and data event concepts are explained in the AWS CloudTrail documentation, but in short: if you want to know what changes were made to AWS services in your account(s), you’re probably looking for CloudTrail management events. If you want to know about operations performed on or within a resource and its data, you are probably looking for CloudTrail data events. This distinction alone is enough to make your head hurt, but I digress.
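If you're working from raw CloudTrail log files rather than a SIEM, the management/data split is visible in each record's `eventCategory` field. A minimal sketch, assuming log files pulled from the trail's S3 bucket, which CloudTrail delivers as gzipped JSON with a top-level "Records" list (the helper names are hypothetical):

```python
import gzip
import json
from collections import defaultdict


def load_cloudtrail_file(path):
    """Read one CloudTrail log file (gzipped JSON with a top-level "Records" list)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)["Records"]


def split_by_category(records):
    """Group records by eventCategory ("Management", "Data", ...); missing values go to "Unknown"."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get("eventCategory", "Unknown")].append(record)
    return dict(groups)
```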

As it relates to the EC2 service, you will want to focus on eventSource="ec2.amazonaws.com" within CloudTrail events. As I noted before, depending on the attack vector, there may not be much here to help you investigate the Initial Access phase (in fact, this is often the case in AWS investigations). As you progress through the tactics to Execution, you may get some host artifacts, such as scripts or files dropped and used by the threat actor (TA). AWS-specific malware notwithstanding, I would focus your attention on CloudTrail events whose "userIdentity.arn" is the assumed-role identity associated with the compromised EC2 instance, e.g., userIdentity.arn="arn:aws:sts::123456789012:assumed-role/got_pwned_role/i-0123456789example".
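That filter translates almost directly into code. A sketch, assuming CloudTrail records already loaded as Python dicts (the function name is hypothetical), that pulls one role session's calls to one service into a timeline:

```python
def service_calls_by_session(records, role_session_arn, event_source="ec2.amazonaws.com"):
    """Return (eventTime, eventName, sourceIPAddress, errorCode) tuples, oldest first,
    for calls one role session made to one AWS service."""
    hits = [
        (
            record.get("eventTime", ""),
            record.get("eventName"),
            record.get("sourceIPAddress"),
            record.get("errorCode"),  # None when the call succeeded
        )
        for record in records
        if record.get("eventSource") == event_source
        and (record.get("userIdentity") or {}).get("arn") == role_session_arn
    ]
    # CloudTrail eventTime is ISO 8601 UTC ("2024-05-01T10:05:00Z"), so string order is time order.
    return sorted(hits, key=lambda hit: hit[0])
```

Passing `event_source="iam.amazonaws.com"` or `"s3.amazonaws.com"` gives the same timeline for other services the TA touched, and a non-None errorCode (such as AccessDenied) is often where you spot the TA testing its access.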

To be clear, the act of investigating logged events for the compromised identity is the same as on-prem, but you no longer need to be a subject matter expert (SME) in Windows/Linux operating systems. You now need to become an SME in the operating system called AWS and its core services, some of which (like IAM) could be considered their own OS due to their complexity. Obviously no one will be an SME in all of these areas, but getting very familiar with EC2 and the services most commonly related to it, such as VPC, EBS, and IAM, will be critical in your investigations involving compromised EC2 instances. Chances are you will be able to focus on a few key API calls within the EC2 service space, or maybe ECS if you are dealing with a crypto mining operation. However, if the TA hits some permission snags or crypto mining is not the goal, you will likely be looking at those three services along with S3, RDS, and/or more. It depends on your threat actor's motivation.

The beauty of CloudTrail logs is that it's pretty easy to find what you are looking for, given the common format that AWS service (two-pizza) teams adhere to and the fact that everything lands in one consolidated data source. With operating systems and traditional on-prem resources, by contrast, you need to know all the different file locations for the log sources you need, and then either ingest them into a SIEM or pull them off disk, manually or with automation, but usually after the fact. In AWS, you could theoretically rely on a single primary cloud data source, CloudTrail, for your EC2 instance investigations.

For more resources on logging, check out this great blog put together by Anna McAbee and the AWS security team detailing an in-depth logging strategy in AWS (covers more than EC2 and CloudTrail logging naturally).

Investigation Methodology

We started with high-level contrasts between EC2 and its on-premises equivalents, but let’s take a moment to address specific investigator tradecraft (this is not an all-inclusive list).

Similarities:

Investigating the W's (who/what/when/where*)

The primary similarity is the act of answering these types of questions.

Host-based artifacts

Yes, there are still artifacts that can be collected.
Depending on the exploit/vector, their usefulness may vary.

Searching for privilege escalation*

You may find host-based artifacts that help tell a traditional privilege escalation story for some incidents, but by and large, this will be a primary difference in investigation tradecraft (see Differences below).

Network Traffic Analysis

NetFlow is still useful and is not all that different.

Searching for data exfiltration*

Another similarity but different tradecraft in that you would need to look at more than just NetFlow data (as one example) to determine exfil.

*This is both similar and a stark contrast in the how & interpretation

Differences:

Investigation of the W's (who/what/when/where*)

In an AWS incident involving an EC2 instance, you could start with EDR logs (if you have one installed), but going straight to CloudTrail logs to understand control plane activities often does a really good job of answering the W's.

Interpretation

Take a CreateUser control plane event as an example. If you're using your "on-premise mindset," you may think you need to look at local logs to investigate further, but CreateUser creates an IAM user in the AWS account, not a local OS account, so host logs are the wrong data source. Stay in the CloudTrail logs until unanswered questions force you to pivot to another data source.
Abstractly, the "Who" in your investigation is the threat actor, but in investigative practice you are more concerned with the compromised identities. In an EC2 compromise investigation (and really most AWS incidents I've observed), the "Who" you are investigating at any given point could be:

The EC2 instance & its associated IAM Role

They are essentially joined at the hip through the instance profile feature.

EC2 service or other AWS service(s)
Separate IAM Role/User
Anything else under or in the clouds (😉) depending on how your investigation is going
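A rough first pass at sorting CloudTrail records into these "Who" buckets can key off `userIdentity.type`. This is a hypothetical triage helper, not an exhaustive mapping: CloudTrail has more identity types (Root and FederatedUser, for example) that simply fall into "other" here, and the bucket labels are made up.

```python
def classify_actor(record):
    """Map a CloudTrail record's userIdentity to a coarse "who" bucket for triage."""
    identity = record.get("userIdentity") or {}
    identity_type = identity.get("type")
    arn = identity.get("arn") or ""
    if identity_type == "AssumedRole":
        session_name = arn.rsplit("/", 1)[-1]
        # Instance-profile sessions are named after the instance ID (heuristic: any
        # AssumeRole caller could choose a session name starting with "i-").
        if session_name.startswith("i-"):
            return ("ec2-instance-role", session_name)
        return ("assumed-role", arn)
    if identity_type == "AWSService":
        return ("aws-service", identity.get("invokedBy", "unknown"))
    if identity_type == "IAMUser":
        return ("iam-user", arn)
    return ("other", identity_type or "unknown")
```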

Host-Based Artifacts

You should obtain these using AWS native APIs (EBS snapshots, for example) so you can move at cloud speed, rather than waiting for EnCase, FTK, or whatever imaging software you use to grab a disk copy. Doing it the on-premises way is still possible for EC2 instances, but it’s incredibly slow and requires additional setup.
Grabbing triage forensic artifacts such as OS event logs will depend on your EDR or whichever agent you want to use. And yes, you could leverage a built-in cloud service such as AWS Systems Manager (SSM) to pull these artifacts using scripts, so again, the "how" is what differs.
My final thought on host-based artifacts, however, is that if you really feel you need them, you should have a good reason, and hopefully it is more than "because legal requires them." That is an acceptable answer for regulatory reasons...but for you (the investigator), I hope you have better ones. These artifacts may help answer a couple of investigative questions, but they’ll leave you high and dry for everything after that. Don’t say I didn’t warn you!
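For the disk side, the AWS-native route is an EBS snapshot. A minimal sketch with boto3, assuming `ec2_client` is a client created with `boto3.client("ec2")` in the affected account and region; the `ir-case` tag key and the function name are hypothetical. It only starts the snapshots; copying them to a dedicated forensic account and isolating the instance are separate steps covered in the references below.

```python
def snapshot_instance_volumes(ec2_client, instance_id, case_id):
    """Start an EBS snapshot of every volume attached to an instance, tagged with a case ID.

    ec2_client is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the new snapshot IDs (snapshots complete asynchronously).
    """
    response = ec2_client.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )
    snapshot_ids = []
    for volume in response["Volumes"]:
        snapshot = ec2_client.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"IR {case_id}: {instance_id} {volume['VolumeId']}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [{"Key": "ir-case", "Value": case_id}],  # hypothetical tag key
            }],
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids
```

Passing the client in keeps account and region selection explicit and makes the function easy to stub in tests.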


Privilege Escalation

This is also a stark difference, as I covered with "The Crossover" above, and it can trip you up if you bring your on-prem assumptions to the cloud. You can spend time looking for privilege escalation via an exploit or software vulnerability on the host, but it may not be material. If the IAM role associated with the compromised EC2 instance already grants administrative-level permissions (i.e., it is overly permissive), there is nothing left to escalate. If it doesn't, the attacker only needs to "live off the cloud" and use readily available APIs to escalate their privileges. One AttachRolePolicy call attaching the AWS-managed AdministratorAccess policy (provided the compromised role is allowed to make that call) and poof, they have everything they need.
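You can hunt for this in CloudTrail by watching IAM policy-changing calls, successful or not. A sketch, assuming parsed CloudTrail records; the watchlist of event names is my assumption of a reasonable starting set, not an exhaustive list of IAM escalation paths, and the function name is hypothetical.

```python
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"

# Hypothetical starting watchlist; extend it for your environment.
ESCALATION_EVENTS = {
    "AttachRolePolicy", "AttachUserPolicy", "AttachGroupPolicy",
    "PutRolePolicy", "PutUserPolicy", "CreatePolicyVersion",
}


def escalation_candidates(records):
    """Summarize IAM policy-changing calls, flagging AdministratorAccess and failed attempts."""
    findings = []
    for record in records:
        if record.get("eventSource") != "iam.amazonaws.com":
            continue
        if record.get("eventName") not in ESCALATION_EVENTS:
            continue
        params = record.get("requestParameters") or {}  # can be null on denied calls
        findings.append({
            "time": record.get("eventTime"),
            "event": record.get("eventName"),
            "actor": (record.get("userIdentity") or {}).get("arn"),
            "admin_policy": params.get("policyArn") == ADMIN_POLICY_ARN,
            "failed": "errorCode" in record,  # e.g. AccessDenied: a TA testing its access
        })
    return findings
```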

Network Log Analysis

Your primary data source here would be VPC flow logs (the one time IR or DE actually uses these...just kidding, I use them). Assuming you didn't bring your firewall with you, you could use this data source to focus on your instance ID or elastic network interface (ENI) and look for typical NetFlow-type things: IP-to-IP traffic, data exfiltration by byte volume, etc. What I want you to understand, however, is that you should NOT assume no data exfiltration occurred just because you don't see large traffic volumes. This would be a fatal mistake, and your lawyers and leadership will be very angry about the oversight.
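For the flow log side, here is a minimal sketch that totals bytes sent from one ENI to each destination. It assumes flow logs in the default version 2 format (space-separated fields: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status). The function name is hypothetical, and a custom flow log format would need a different field list.

```python
from collections import Counter

# Default (version 2) VPC flow log fields, in order.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport",
    "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log_status",
)


def bytes_out_by_destination(lines, eni_id, instance_ip):
    """Sum accepted bytes sent from instance_ip on eni_id, keyed by destination address."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # not a default-format record
        rec = dict(zip(FIELDS, parts))
        # Skips the header line, other ENIs, and NODATA/SKIPDATA records.
        if rec["interface_id"] != eni_id or rec["log_status"] != "OK":
            continue
        if rec["srcaddr"] == instance_ip and rec["action"] == "ACCEPT":
            totals[rec["dstaddr"]] += int(rec["bytes"])
    return totals
```

Per the warning above, a small or empty result here is not evidence that nothing left the account.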

Data Exfiltration

The main difference here is that the threat actor doesn’t need to write anything to the EC2 disk volumes, nor do they need to go through the firewall your sysadmins brought from on-premises into AWS (I know you did, it’s okay… well, it may not be, but your secret’s safe with me). If the compromised instance role has access to S3 (or the attacker creates an identity that does), they could use AWS native functionality to copy your bucket data directly from its source to their own S3 bucket at great speed. They could also snapshot your EBS volumes and share those snapshots with an attacker-controlled account by modifying the snapshot's attributes. Again, the attacker would be operating at the speed these cloud environments provide, which is much quicker than you think. It all happens on the AWS backbone, and none of this traffic will go through your firewall or even show up in your VPC flow logs. I know, I'm sorry, this last point might have struck your on-premise heart, but I promise you will live.
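The snapshot-sharing path does leave a CloudTrail trail: a ModifySnapshotAttribute call that adds createVolumePermission for another account (or for the group "all", which makes the snapshot public). A sketch, assuming the requestParameters layout CloudTrail typically records for that call; confirm it against a real record from your environment before relying on it. The function name is hypothetical.

```python
def snapshot_shares(records, own_account_ids):
    """Return (eventTime, snapshotId, grantee) for snapshot shares outside your own accounts."""
    findings = []
    for record in records:
        if record.get("eventSource") != "ec2.amazonaws.com":
            continue
        if record.get("eventName") != "ModifySnapshotAttribute":
            continue
        params = record.get("requestParameters") or {}
        permission = params.get("createVolumePermission") or {}
        added = (permission.get("add") or {}).get("items") or []
        for item in added:
            # A "group" of "all" means the snapshot was made public.
            grantee = item.get("userId") or item.get("group")
            if grantee and grantee not in own_account_ids:
                findings.append((record.get("eventTime"), params.get("snapshotId"), grantee))
    return findings
```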

Conclusion

If you made it this far, "you got a friend in me" 🤠. AWS Cloud investigations certainly overlap with on-premises investigations, but to leave it at that would be both disingenuous and extremely misleading. As you can see above, even in the parallels, the truth is in the details—and those details are both derived differently and can mean something else entirely in the cloud. In short, calling them similar isn’t wrong—just make sure you add a "but" before finishing that thought.

Stay tuned for the next deep dive into the AWS S3 service!

Forensic Reference Links:

AWS Security Blog - How to automate forensic disk collection in AWS

SANS Whitepaper - Digital Forensic Analysis of Amazon Linux EC2 Instances

AWS re:Inforce Slides - Instance memory acquisition techniques for effective incident response

AWS Docs - Automated Forensics Orchestrator for Amazon EC2 and EKS


Le Bron Does Security?