CloudTrail evolution

At the end of 2013 (yes, 5 years ago already), AWS announced a new service, CloudTrail. It claimed to provide increased visibility into user activity for demonstrating compliance; it never claimed to be an audit log, but it is as close as can be architected without being in the critical path of API request execution.

From the start, CloudTrail supported immediate cross-account delivery of logs. These logs were thus untouched by the generating account — there was no user-deployed replication of data files between AWS Accounts, and thus no question of the originating Account editing the CloudTrail logs before a separate security team got access.

You can see why this was a massive success. Oh, and the first trail per Region was complimentary, except for the traffic charges incurred to copy the logs from the originating Region to the destination, and for the actual storage of the logs after delivery (though the customer may choose a retention policy — see S3 Lifecycle Policies).

I wanted to demonstrate the impact that the design of CloudTrail has had, and how it has evolved over the last half a decade…

The Early Years

Initially, CloudTrail would itself execute from an AWS Service team’s own AWS account — one per Region. When provisioning an S3 Bucket for receiving these logs, a customer would have to find the authoritative list of Account IDs to whitelist for s3:PutObject.

Each time a new AWS Region was being launched (eg, Stockholm), a new Account ID for CloudTrail’s service account would have to be discovered, and added to the destination S3 Bucket Policy.

Furthermore, CloudTrail was initially a per-Region service, so customers would have to scurry over every AWS account and define a CloudTrail trail in the newly launched Region; until they did this, the new Region was effectively a blind spot for the governance and compliance teams that read the logs.

So you can see that with three accounts and three Regions, we had to define CloudTrail 9 times! Now look at today’s 20 Regions, and a customer with 20 AWS Accounts, and we’re having to define CloudTrail 400 times. Luckily, there’s an API, and CloudFormation support…
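Each of those definitions boils down to something like the following pair of CLI calls, repeated in every account and Region (a sketch only; the trail and bucket names are placeholders):

# Define a trail in the current account/Region, delivering to a central bucket
aws cloudtrail create-trail --name my-trail --s3-bucket-name my-central-cloudtrail-bucket

# Trails record nothing until logging is started
aws cloudtrail start-logging --name my-trail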

The Middle Years: Per-Account simplification

These first two problems (Service Account IDs for the S3 Bucket Policy, and the new-Region blind spot) were solved in time. I and many others contributed product feature requests and Support Request feedback to help shape this: the CloudTrail account identity problem was solved with IAM Service Principals, and as such, the service name “cloudtrail.amazonaws.com” now matches the CloudTrail service in every current and future (non-Chinese, non-US-GovCloud) commercial Region.

This bucket policy now looks like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AWSCloudTrailAclCheck20150319",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws:s3:::myBucketName"
        },
        {   "Sid": "AWSCloudTrailWrite20150319",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": [ "arn:aws:s3:::myBucketName/AWSLogs/myAccountID1/*", "arn:aws:s3:::myBucketName/AWSLogs/myAccountID2/*", "arn:aws:s3:::myBucketName/AWSLogs/myAccountID3/*" ],
            "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}}
        }
     ]
}

Note: we still whitelist per account, so that other AWS customers can’t send their CloudTrail logs to our bucket — which would be weird, and would potentially just incur storage charges for us.

And for the new-Region problem: Global CloudTrail Trails were introduced, which have a Home Region (the Region the Global Trail is defined in) but collect activity from every other Region.

Further improvements came in the way of cryptographically signed Digest files. These form a chain, each file containing information about the files that came before it, providing an unbreakable chain of history such that the modification of a CloudTrail data file, or the modification or removal of a previous digest file, can be detected.
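A sketch of both features via the CLI (trail name, bucket name, ARN, and timestamps below are placeholders):

# A Global (multi-Region) trail with log file validation (digest files) enabled
aws cloudtrail create-trail \
    --name global-trail \
    --s3-bucket-name my-central-cloudtrail-bucket \
    --is-multi-region-trail \
    --enable-log-file-validation

# Later: confirm the digest chain and the log files it covers are intact
aws cloudtrail validate-logs \
    --trail-arn arn:aws:cloudtrail:ap-southeast-2:111111111111:trail/global-trail \
    --start-time 2018-12-01T00:00:00Z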

Now it’s a little more manageable; a customer with 20 AWS Accounts need only define CloudTrail 20 times. With APIs and CloudFormation, there is a chance of consistency.

More and more services became capable of generating CloudTrail logs over the years, and the JSON logging format had a few modifications, which was beautifully handled by the versioned log entry format (currently up to log entry version 1.06).

And now with an Organisation approach

AWS Organisations is starting to build out more of a corporate approach to the decade-long multi-account pattern. A somewhat clunky Landing Zone solution tried to make this a little more turn-key, but AWS Organisations is now starting to deliver on simplification.

With a verified master account (which historically was your Consolidated Billing account), you can now push a master CloudTrail definition to all subsidiary accounts. This is done once, and all member accounts receive this trail. Furthermore, those subsidiary accounts cannot remove the enforced Organisation Trail while they are part of the organisation.

Thus consistency is ensured, and a security team no longer has to scan these workload accounts to ensure that they are still logging CloudTrail, and logging it to the correct enterprise destination(s).
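From the master account, the one-off definition looks roughly like this via the CLI (a sketch; trail and bucket names are placeholders, and trusted access for CloudTrail must be enabled in Organisations first):

# Allow CloudTrail to act across the organisation
aws organizations enable-aws-service-access --service-principal cloudtrail.amazonaws.com

# One trail, applied to every current and future member account
aws cloudtrail create-trail \
    --name enterprise-org-trail \
    --s3-bucket-name my-central-cloudtrail-bucket \
    --is-multi-region-trail \
    --is-organization-trail \
    --enable-log-file-validation
aws cloudtrail start-logging --name enterprise-org-trail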

Combining this with cross-account logging, we end up with something looking a little like this:

Warning: Log Prefix has changed

However, the Prefix that CloudTrail logs to in this configuration has changed ever so slightly. Where previously we whitelisted individual accounts in the Resource part of our S3 bucket policy, we now have to whitelist a Prefix that includes our Organisation ID — but we don’t need to worry about others (non-organisation members) sending logs to us:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AWSCloudTrailAclCheck20150319",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws:s3:::myBucketName"
        },
        {
            "Sid": "AWSCloudTrailWrite20150319",
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::myBucketName/AWSLogs/o-1234567/*", 
            "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}}
        }
     ] 
}

What remains to be tested by me (at least) is creating a new account and seeing it get the Organisation Trail pushed to it upon creation; adding an already-created account and seeing the same; and lastly, removing an account from the organisation and seeing it no longer able to log to the central trail.
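If you want to try the same experiments, the building blocks are roughly these (account IDs and email addresses below are placeholders):

# Create a brand new member account inside the organisation
aws organizations create-account --email new-team@example.com --account-name "new-team"

# Invite an existing, already-created account to join
aws organizations invite-account-to-organization --target Id=222222222222,Type=ACCOUNT

# From within a member account, confirm the Organisation Trail is visible
aws cloudtrail describe-trails --include-shadow-trails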

In Summary

Hats off to the IAM, CloudTrail, and Organisations teams for making this all come together (as well as to the other service teams who have managed to get CloudTrail support into their products).

Customers will want to move to this, but there may be adjustment required for any analytics solutions reading the CloudTrail files, due to the new location after implementing this change. Customers may choose to leave their existing CloudTrail in place for a few days after deploying an Organisation Trail in order to update those analytics services.

CloudTrail continues to be at the heart of Governance and Compliance assurance when running on AWS.

There’s much more to talk about here for the configuration of the destination S3 Bucket. If you would like to do a deep dive with me, check out our in-person Advanced Security & Operations on AWS course. If you’d like this delivered in your city, please get in touch with us at Nephology.

Web Transitions and Compatibility

I have spoken previously of the web protocol transitions that are currently happening: the encryption layer, the HTTP layer (OSI Layer 7), and even the TCP layer. But I wanted to dive deeper on this, and speak about the benefits of starting these transitions, and the risks of not finishing them.

The IT industry is terrible at discarding the abandoned and obsolete technologies it once heralded. Change Management and ITIL processes, and the traditional project management approaches that constricted the velocity of change, have given rise to a culture of not changing anything – avoiding the work of actually moving forward.

For each technology below, consider the risk and advantage of enabling the new version, and of disabling the previous one:

HTTP/2
– Enabling new version: Risk: none. Advantage: faster, less bandwidth.
– Disabling previous version: Risk: may exclude older browsers and integrations. Advantage: none.

TLS 1.3
– Enabling new version: Risk: middleboxes (transparent proxies) with poor TLS implementations (most have patches available). Advantage: faster (fewer round trips), more secure (fewer older ciphers supported, some new ciphers).
– Disabling previous version: Risk: may exclude older browsers and system integrations. Advantage: reduced security risk.

Security Headers
– Enabling: Risk: may uncover poor implementations in your own products! Advantage: the client helps with security.

Network (web client) logging
– Enabling: Risk: lots of network traffic; turns requests (read operations) into events (write operations). Advantage: discover issues affecting clients you didn’t cater for.

What I typically see is operations teams that leave every legacy protocol and cipher enabled, with no security headers inserted, and no modern ciphers or protocols.

Today I’ll take you down one of these rabbit holes: TLS Protocols, Ciphers, and Message Authentication (MAC).

Protocol Transitions and backwards compatibility

The TLS conversation between a web browser and the server starts with the selection of the newest protocol that both support. At this point in time (Dec 2018), there are 7 versions: SSLv1, which was never used in the wild; SSLv2, SSLv3, and TLSv1, all of which are now deprecated by PCI DSS 3.2 and should not be used; TLSv1.1, which was only the “latest” version for around 18 months a decade ago – a period when only one new browser appeared (Safari), and that browser has since had many newer versions that support newer protocols; TLSv1.2; and the very new TLSv1.3.

The first step of the transition for web service operators is to ensure TLSv1.2 and, if available, TLSv1.3 are enabled.

Don’t panic if you can’t enable TLSv1.3 right now, but keep patching and updating your OS, Web Server, Load Balancers, etc, and eventually it will become available.
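You can probe what your endpoint currently negotiates from the outside with openssl (the hostname below is a placeholder, and the -tls1_3 option needs an OpenSSL 1.1.1+ client):

# Should fail once the old protocol is disabled on the server
openssl s_client -connect www.example.com:443 -tls1 < /dev/null

# Should succeed with a modern configuration
openssl s_client -connect www.example.com:443 -tls1_2 < /dev/null
openssl s_client -connect www.example.com:443 -tls1_3 < /dev/null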

Your next stop is to examine your logs and see if you can determine the Protocol and Cipher being used. Standard “combined” Log file formats don’t record this, but you can add it in. For example, Apache defines one of its log formats as:

LogFormat "%h %l %u %t \"%r\" %>s %O" common

We can adjust this with the additional detail:

LogFormat "%h %l %u %t \"%r\" %>s %O %{SSL_PROTOCOL}x %{SSL_CIPHER}x" commontls

And now any of our sites can be switched from common to commontls and we can see the Protocol and Cipher used. At this point, sit back for a week, and then review what Protocols were actually seen over that period, using a combination of cut and sort, or some perl:

cat access.log.1 | perl -ne '/\s(\S+)\s(\S+)$/ && $h{$1}++; } { foreach $val ( keys %h) { print "$val = $h{$val}\n" }'

You’ll end up with something like:

TLSv1.3 = 468
- = 16
TLSv1.2 = 28188

So we see only modern connections here (we can ignore the 16 non-matching lines).

Of course, you may also see older protocols mentioned, so your question should be what to do now. If these all originate from a few regular IP addresses, then you possibly have an old client, or an old script/integration process – for example, running from Python 2.x, old Perl, Java 6, etc.

If that’s the case, then you have a conundrum: those old integration processes will prevent you from disabling the old protocols and securing the service! In order to maintain a secure connection, the client/integration will need an update. Move to Java 8 (or 11), Python 3.6 (or 3.7), etc. If that’s an external service provider or 3rd party, then it’s out of your hands; if you’re a paying customer, then it’s time to request that your provider updates their environment accordingly. A key phrase I love to bandy about is:

We've upped our standards; up yours.

Of course, you can always just disable those older protocols (perhaps after some notice if it’s an important integration). Nothing gets work moving quite like a deadline: “We’re turning off TLSv1 on 1 April – no joke; TLS 1.2 is our new minimum”.
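Before setting that deadline, it helps to know what the offending client actually links against. For a Python-based integration, a one-liner like this is enough (a sketch for Python; other runtimes have their own equivalents):

# Print the OpenSSL version this interpreter was built against
python -c "import ssl; print(ssl.OPENSSL_VERSION)"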

If you’re setting up a new service today, I would strongly suggest only enabling TLS 1.2 and 1.3 from the start; and over the coming years, make a conscious plan to schedule the deprecation of TLS 1.2.

If you only have one (or two) Protocols enabled, then as part of your operational responsibility, you only have to worry about one (or two) protocols being compromised.

Ciphers

Some providers enable almost every cipher under the sun. I have no idea how they stay aware of the vulnerabilities in all those ciphers. I prefer to minimise this down to the smallest, strongest set I can offer. Today, that’s AES in GCM mode (either AES 128 or AES 256). AES in CBC mode is deprecated (but it’s the strongest that MS IE supports on many Windows versions); Microsoft announced in 2018 that CBC is no longer considered secure. So your choice is to support MS IE (on older platforms), or be secure. Do your developers a favour, and drop MS IE compatibility.

Newer ciphers such as CHACHA20, available under TLS 1.3, are fine as well. But all the older ones, such as RC4, DES, and 3DES, should be gone. As above, check your logs after you have sufficient logging enabled to determine whether these are actively being used.
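As a local sanity check (a sketch only; the exact list returned depends on your OpenSSL build), you can ask OpenSSL which concrete suites a minimal GCM/CHACHA20, ECDHE-only policy string would actually permit:

# Expand a minimal cipher policy into the concrete suites it matches
openssl ciphers -v 'ECDHE+AESGCM:ECDHE+CHACHA20:!aNULL:!MD5'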

Key Exchanges

When keys are exchanged, this should always be done using ephemeral (temporary) keys. Your Cipher Suite soup should have DHE for Diffie-Hellman Ephemeral, or ECDHE for Elliptic Curve Diffie-Hellman Ephemeral key exchange. Anything with plain DH is using the same keys repeatedly, and should be disabled. Again, look at your now-enhanced logs and determine whether you can disable older key exchange algorithms.
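To see at a glance which key exchange each suite would use, the Kx= column of openssl ciphers is enough (again a local inspection of what your OpenSSL build supports, not your server’s live configuration):

# Print suite name and key exchange method (Kx=...) for every known suite
openssl ciphers -v | awk '{print $1, $3}' | sort -u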

Message Authentication and Check-summing

The last part of a cipher suite is typically a message digest or checksum function, such as SHA, MD5, etc. The only ones that should be available today are SHA256, SHA384, and SHA512. The larger the number, the stronger the checksum, but the higher the computational cost.

Over time, newer checksums will come, but most major browsers don’t support higher than SHA512 at this time.

In Conclusion

Migrating the TLS Cipher Suite and Protocol are probably two of the most security-critical pieces that need professional tuning to avoid being accidentally configured in a way that can be compromised. The standard approach of enabling everything, or accepting vendor defaults, is dangerous.

If you’re not confident with this, then read more, or join Nephology during one of our Web Security training courses for face-to-face help.