Tips From the Trenches: Cloud Custodian–Automating AWS Security, Cost and Compliance

“We’re moving to the cloud.” If you haven’t heard this already, it’s likely you will soon. Moving to the public cloud poses many upfront challenges for businesses today, chief among them security, cost and compliance. Where do businesses even start? How many tools do they need to purchase to fulfill these needs?

After deciding to jump-start our own cloud journey, we spun up our first account in AWS, and it was immediately apparent that traditional security controls weren’t necessarily going to adapt. Trying to lift and shift firewalls, threat and vulnerability management solutions and the like ran into a multitude of issues, including but not limited to networking, AWS IAM roles and permissions, and tool integrations. It was clear that tools built for on-premise deployments were no longer cost or technologically effective in AWS and a new solution was needed.

“ It was clear that tools built for on-premise deployments were no longer cost or technologically effective in AWS and a new solution was needed. ”

To address these findings, we decided to move to a multi-account strategy and automate our resource controls to support increasing consumption and account growth. Our answer was Capital One’s open source Cloud Custodian tool, because it helps us manage our AWS environments by ensuring the following business needs are met:

  • Compliance with security policies
  • Enforcement of AWS tagging requirements
  • Identification of unused resources for removal or review
  • Enforcement of off-hours to maximize cost savings
  • Enforcement of encryption requirements
  • Detection of overly permissive AWS Security Groups
  • And many more…

After identifying a tool that could automate our required controls across multiple accounts, it was time to implement it. The rest of this blog will focus on how Cloud Custodian works, how Code42 uses the tool, what kinds of policies (with examples) Code42 implemented and resources to help you get started implementing Cloud Custodian in your own environment.

How Code42 uses Cloud Custodian

Cloud Custodian is an open source tool created by Capital One. You can use it to automatically manage and monitor public cloud resources as defined by user-written policies. Cloud Custodian works in AWS, Google Cloud Platform and Azure. We, of course, use it in AWS.

As a flexible “rules engine,” Cloud Custodian allowed us to combine rules and remediation efforts into one policy. Cloud Custodian utilizes policies to target cloud resources with specified actions on a scheduled cadence. These policies are written in a simple YAML configuration file that specifies a resource type, resource filters and actions to be taken on the matching targets. Once a policy is written, Cloud Custodian interprets the policy file and deploys it as a Lambda function in an AWS account. Each policy gets its own Lambda function that enforces the user-defined rules on a user-defined cadence. At the time of this writing, Cloud Custodian supports 109 resources, 524 unique actions and 376 unique filters.

As opposed to writing and combining multiple custom scripts that make AWS API calls, retrieve responses and then execute further actions on the results, Cloud Custodian simply interprets an easy-to-write policy, takes into consideration the resources, filters and actions it describes and translates them into the appropriate AWS API calls. This simplification makes this type of work easy and achievable even for non-developers.
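
To make that concrete, here is a minimal sketch of what a policy file can look like. It flags security groups that allow SSH from anywhere, which maps to the “over permissive Security Groups” need above. The policy name, tag key, schedule and Lambda role are illustrative assumptions, not the exact policies Code42 runs.

```yaml
policies:
  - name: sg-flag-open-ssh                 # hypothetical policy name
    resource: security-group               # the AWS resource type to target
    mode:
      type: periodic                       # deploy as a Lambda that runs on a schedule
      schedule: "rate(1 day)"
      role: arn:aws:iam::{account_id}:role/custodian-lambda-role   # assumed role name
    filters:
      - type: ingress                      # match inbound rules open to the world on port 22
        Ports: [22]
        Cidr:
          value: "0.0.0.0/0"
    actions:
      - type: tag                          # flag the group for review
        key: custodian-review
        value: open-ssh
```

Running “custodian validate policy.yml” checks the file locally, and “custodian run” (or a CI/CD job) deploys it as the Lambda function described above.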

“ As a flexible rules engine, Cloud Custodian allowed us to combine rules and remediation efforts into one policy. Cloud Custodian utilizes policies to target cloud resources with specified actions on a scheduled cadence. ”

Now that we understand the basic concepts of Cloud Custodian, let’s cover the general implementation. Cloud Custodian policies are written and validated locally. They are then deployed either by running Cloud Custodian locally and authenticating to AWS or, in our case, via CI/CD pipelines. At Code42, we deploy a baseline set of policies to every AWS account as part of the bootstrapping process and then add or remove policies as needed for specific environments. In addition to account-specific policies, there are scenarios where a team may need an exemption, so we typically allow an “opt-out” tag for some policies. Policy violations are reported to a Slack channel via a webhook created for each AWS account. In addition, we distribute the resources.json logs directly to a SIEM for more robust handling and alerting.
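
The opt-out and Slack reporting patterns look roughly like the fragment below. This is a sketch only: the tag key, webhook URL and queue URL are placeholders, and the notify action assumes the c7n-mailer companion tool is deployed to relay messages from the SQS queue to Slack.

```yaml
policies:
  - name: ebs-notify-unencrypted            # hypothetical baseline policy
    resource: ebs
    filters:
      - Encrypted: false                    # example condition: unencrypted EBS volumes
      - "tag:c42-custodian-opt-out": absent # honor the approved opt-out tag (assumed key)
    actions:
      - type: notify
        slack_template: slack_default       # template shipped with c7n-mailer
        violation_desc: "EBS volume is not encrypted"
        to:
          - https://hooks.slack.com/services/EXAMPLE/WEBHOOK   # placeholder webhook
        transport:
          type: sqs                         # c7n-mailer reads findings from this queue
          queue: https://sqs.us-east-1.amazonaws.com/111111111111/custodian-mailer
```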

Broadly speaking, Code42 has categorized policies into two types – (i) notify only and (ii) action and notify. Notify-only policies are more hygiene-related and include policies like tag compliance checks, multi-factor authentication checks and more. Action and notify policies take actions once certain conditions are met, unless the resource is tagged for exemption. They include policies like s3-global-grants, ec2-off-hours-enforcement and more. The output from the Custodian policies is also ingested into a SIEM solution to provide more robust visualization and alerting. This allows individual account owners to review policy violations and assign remediation actions to their teams. For Code42, these dashboards give both the security team and account owners a view of the overall health of our security controls and account hygiene. Examples of Code42 policies may be found on GitHub.

What policies did we implement?

There are three primary policy types Code42 deployed: cost savings, hygiene and security. Since policies can take actions on resources, we learned that it is imperative that the team implementing the policies collaborate closely with any teams affected by them, so that all stakeholders know how to find and react to alerts and can provide feedback and adjustments when necessary. Good collaboration with your stakeholders will ultimately drive the level of success you achieve with this tool. Let’s hit on a few specific policies.

Cost Savings Policy – ec2-off-hours-enforcement

EC2 instances are one of AWS’s most commonly used services. EC2 allows a user to deploy cloud compute resources on demand as necessary; however, there are many cases where the compute gets left “on” even when it’s not being used, which racks up costs. With Cloud Custodian, we’ve allowed teams to define “off-hours” for their compute resources. For example, if I have a machine that only needs to be online two hours a day, I can automate the start and stop of that instance on a schedule, saving 22 hours of compute time per day. As AWS usage increases and expands, these cost savings add up quickly.
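
The post doesn’t include the exact policy, but a sketch using Cloud Custodian’s built-in offhour and onhour filters might look like the following. The time zone, default hours, schedule tag and Lambda role are assumptions; teams would override the defaults per instance via the tag.

```yaml
policies:
  - name: ec2-off-hours-enforcement
    resource: ec2
    mode:
      type: periodic
      schedule: "rate(1 hour)"              # evaluate every hour
      role: arn:aws:iam::{account_id}:role/custodian-lambda-role   # assumed role name
    filters:
      - type: offhour                       # built-in off-hours filter
        default_tz: America/Chicago         # assumed default time zone
        offhour: 19                         # stop instances at 7 p.m. unless overridden
        tag: custodian_downtime             # teams set custom schedules via this tag
    actions:
      - stop

  - name: ec2-on-hours-enforcement
    resource: ec2
    mode:
      type: periodic
      schedule: "rate(1 hour)"
      role: arn:aws:iam::{account_id}:role/custodian-lambda-role
    filters:
      - type: onhour
        default_tz: America/Chicago
        onhour: 7                           # start instances back up at 7 a.m.
        tag: custodian_downtime
    actions:
      - start
```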

Hygiene Policy – ec2-tag-enforcement

AWS resource tagging is highly recommended in any environment. Tagging allows you to define multiple keys with values on resources that can be used for sorting, tracking, accountability, etc. At Code42, we require a pre-defined set of tags on every resource that supports tagging in every account. Manually enforcing this would be nearly impossible. As such, we utilized a Custodian policy to enforce our tagging requirements across the board. This policy performs the series of actions described below.

  1. The policy applies filters to look for all EC2 resources missing the required tags.
  2. When a violation is found, the policy adds a new tag to the resource “marking” it as a violation.
  3. The policy notifies account owners of the violation and that the violating instance will be stopped and terminated after a set time if it is not fixed.

If Cloud Custodian finds that the required tags have been added within 24 hours, it removes the violation tag. If the proper tags are still not added, the policy continues to notify account owners that their instance will be terminated. If the violation is not fixed within the specified time period, the instance is terminated and a final notification is sent.
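
A sketch of the first steps, using Cloud Custodian’s mark-for-op action and marked-for-op filter, is shown below. The required tag keys, marker tag, grace period and mailer queue are illustrative assumptions rather than the exact Code42 policy.

```yaml
policies:
  - name: ec2-tag-enforcement
    resource: ec2
    filters:
      - or:                                  # any missing required tag is a violation
          - "tag:owner": absent              # hypothetical required tag keys
          - "tag:cost-center": absent
    actions:
      - type: mark-for-op                    # step 2: "mark" the resource as a violation
        tag: c7n_tag_compliance
        op: stop
        days: 1                              # grace period before enforcement
      - type: notify                         # step 3: alert the account owners
        violation_desc: "EC2 instance is missing required tags"
        to: [resource-owner]                 # c7n-mailer resolves the owner address
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/111111111111/custodian-mailer

  - name: ec2-tag-enforcement-stop
    resource: ec2
    filters:
      - type: marked-for-op                  # act only once the grace period has expired
        tag: c7n_tag_compliance
        op: stop
    actions:
      - stop
```

A companion policy with an unmark (remove-tag) action clears the marker once the required tags appear, and a further policy escalates the operation from stop to terminate after the final grace period.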

This policy ultimately ensures we have tags that distinguish things like a resource “owner.” An owner tag allows us to identify which team owns a resource and where the deployment code for that resource might exist. With this information, we can drastically reduce investigation/remediation times for misconfigurations or for troubleshooting live issues.

Security Policy – S3-delete-unencrypted-on-creation

At Code42, we require that all S3 buckets have either KMS or AES-256 encryption enabled. It is important to remember that we have an “opt-out” capability built into these policies so they can be bypassed when necessary and after approval. The bypass is done via a tag that is easy for us to search for and review to ensure bucket scope and drift are managed appropriately.

This policy is relatively straightforward. If the policy sees a “CreateBucket” CloudTrail event, it checks the bucket for encryption. If no encryption is enabled and an appropriate bypass tag is not found, the policy deletes the bucket immediately and notifies the account owners. It’s likely by this point you’ve heard of a data leak caused by a misconfigured S3 bucket. It can be nearly impossible to manually manage a large-scale S3 deployment or buckets created by shadow IT. This policy helps account owners learn good security hygiene, and at the same time it ensures our security controls are met automatically without having to search through accounts and buckets by hand. Ultimately, this helps verify that S3 misconfigurations don’t lead to unexpected data leaks.
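
A sketch of such a policy, triggered by the CloudTrail event in near real time, might look like the following. The bucket-encryption filter settings, bypass tag key, Lambda role and mailer queue shown here are assumptions for illustration.

```yaml
policies:
  - name: s3-delete-unencrypted-on-creation
    resource: s3
    mode:
      type: cloudtrail                      # run the Lambda when the event occurs
      role: arn:aws:iam::{account_id}:role/custodian-lambda-role   # assumed role name
      events:
        - source: s3.amazonaws.com
          event: CreateBucket
          ids: "requestParameters.bucketName"
    filters:
      - type: bucket-encryption             # no default encryption configured
        state: false
      - "tag:encryption-exempt": absent     # assumed opt-out tag for approved exceptions
    actions:
      - delete                              # remove the offending bucket immediately
      - type: notify
        violation_desc: "Unencrypted S3 bucket was created and removed"
        to: [resource-owner]
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/111111111111/custodian-mailer
```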

Just starting out?

Hopefully this blog helped highlight the power of Capital One’s Cloud Custodian and its automation capabilities. Cloud Custodian policies can be easily learned and written by non-developers, and they provide needed security capabilities. Check out the links in the “Resources” section below for Capital One’s documentation, as well as examples of some of Code42’s baseline policies that get deployed into every AWS account during our bootstrap process. Note: these policies should be tuned to your business and environment needs, and not all will be applicable to you.

Resources:

Authors:

Aakif Shaikh, CISSP, CISA, CEH, CHFI is a senior security analyst at Code42. His responsibilities include cloud security, security consulting, penetration testing and insider threat management. Aakif brings 12+ years of experience across a wide variety of technical domains within information security, including information assurance, compliance and risk management. Connect with Aakif Shaikh on LinkedIn.

Byron Enos is a senior security engineer at Code42, focused on cloud security and DevSecOps. Byron has spent the last four years helping develop secure solutions for multiple public and private clouds. Connect with Byron Enos on LinkedIn.

Jim Razmus II is director of cloud architecture at Code42. He tames complexity, seeks simplicity and designs elegantly. Connect with Jim Razmus II on LinkedIn.

Tips From the Trenches: Automating Change Management for DevOps

One of the core beliefs of our security team at Code42 is SIMPLICITY. All too often, we make security too complex, whether because there are no easy answers or because the answers are very nuanced. But complexity also makes it really easy for users to find workarounds or ignore good practices altogether. So, we champion simplicity whenever possible and make it a basic premise of all the security programs we build.

“ At Code42, we champion simplicity whenever possible and make it a basic premise of all the security programs we build. ”

Change management is a great example of this. Most people hear change management and groan. At Code42, we’ve made great efforts to build a program that is nimble, flexible and effective. The tenets we’ve defined to drive our program are to:

  • PREVENT issues (collusion, duplicate changes)
  • CONFIRM changes are authorized changes
  • DETECT issues (customer support, incident investigation)
  • COMPLY with regulatory requirements

Notice compliance is there, but last on the list. While we do not discount the importance of compliance in conversations around change management or any other security program, we avoid at all costs using the justification of “because compliance” for anything we do.

Based on these tenets, we focus our efforts on high-impact changes that have the potential to affect our customers (both external and internal). We set risk-based maintenance windows that balance potential customer impact with the need to move efficiently.

We gather with representatives from both the departments making changes (think IT, operations, R&D, security) and those impacted by changes (support, sales, IX, UX) at our weekly Change Advisory Board meeting – one of the best attended and most efficient meetings of the week – to review, discuss and make sure teams are appropriately informed of what changes are happening and how they might be impacted.

This approach has been working really well. Well enough, in fact, for our Research Development & Operations (RDO) team to embrace DevOps in earnest.

New products and services were being deployed through automated pipelines instead of through our traditional release schedule. Instead of bundling lots of small changes into a product release, developers were now looking to create, test and deploy features individually–and autonomously. This was awesome! But also, our change management program–even in its simplicity–was not going to cut it.

“ We needed to not make change control a blocker in an otherwise automated process. We looked at our current pipeline tooling to manage approvers and created integrations with our ticketing system to automatically create tickets to give us visibility to the work being done. ”

So with the four tenets we used to build our main program, we set off to evolve change management for our automated deployments. Thankfully, because all the impacted teams had seen the value of our change management program to date, they were on board and instrumental in evolving the program.

But an additional tenet had to be considered for the pipeline changes. We needed to not make change control a blocker in an otherwise automated process. So we looked at our current pipeline tooling to manage approvers and created integrations with our ticketing system to automatically create tickets, giving us visibility into the work being done. We defined levels of risk tied to the deployments and set approvers and release windows based on risk. This serves both as a control to minimize potential impact to customers and as a challenge to developers to push code that is as resilient and low impact as possible so they can deploy at will.

We still have work to do. Today we are tracking when changes are deployed manually. In our near future state our pipeline tooling will serve as a gate and hold higher risk deployments to be released in maintenance windows. Additionally, we want to focus on risk, so we are building in commit hooks with required approvers based on risk rating. And, again, because we worked closely with the impacted teams to build a program that fit their goals (and because our existing program had proven its value to the entire organization), the new process is working well.

Most importantly, evolving our change process for our automated workflows allows us to continue to best serve our customers by iterating faster and getting features and fixes to the market faster.

Connect with Michelle Killian on LinkedIn.

Tips From the Trenches: Thinking About Security Design

Part of the success criteria for any security program is ensuring that the process, control or technology utilized has some additional benefit aside from just making things “more secure.” The controls we impose to make ourselves safer often come at the expense of convenience. But what if we took a different approach when thinking about them? A mentor of mine often starts a security design discussion by asking us to consider the following:

Why do cars have brakes?

Naturally, my first thought is that brakes allow the driver to slow or stop when going too fast. After all, a car with no brakes is dangerous, if not completely useless. However, when we consider that the braking system in the car enables the driver to go as fast as they want, the purpose of this control takes on a new meaning.

Changing perceptions about the controls we impose on security design within Information Security doesn’t come easy. Even some of the most seasoned infosec professionals will insist a particular control be in place without considering how the control impacts workflow, or worse, the bottom line.

“ As security professionals, we need to design controls that empower our business in the safest way possible, without getting in the way of where we’re trying to go. ”

Aligning controls and risks

Some of the most impactful security controls are the ones we don’t even realize are there. When designed correctly, they mitigate risk while providing a benefit to the user. The proliferation of biometric security is a great example of this. My mobile phone and computer offer the ability for me to access the device by simply touching or staring at it. Because I am much more focused on how convenient and easy it is to unlock my phone to look at cat pictures, I forget that these controls were designed as a security measure.

As a security professional, I do, however, need some assurance that the controls can’t be easily circumvented. For example, a quick search for exploits of fingerprint or face-recognition systems will show that they can be easily fooled with a 3D printer, some Play-Doh and a little time. However, when enhanced with an additional factor like a password or PIN, the authentication mechanism evolves to something much more difficult to compromise while being considerably easier for me to remember than a 16-character password that I have to change every ninety days.

In Information Security, this is why it’s important for us to consider how we design solutions for our environment. If all I’m protecting is access to cat pictures, is my face or fingerprint unlock enough? I’d say so. But for my Database Administrator (DBA) or Identity and Access Management (IAM) administrator to protect my company’s crown jewels? Definitely not.

Creating controls with a purpose

And this is what I think brings us to the crux of security design: as an end-user, if I don’t know why the control is there, I won’t use it or I might even try and go around it. Moreover, if I have no idea that it’s there, it better work without getting in my way.

Let’s return to the car example. My daughter just finished the process of getting her driver’s license. In doing so, just like her old man, she was subject to videos depicting the horrors of car accidents and negligent driving. Way back in my day, the message was clear: driving death was thwarted by seatbelts and the ten-and-two. For her, it’s not texting and driving and the eight-and-four. I have absolutely no idea how a seatbelt can help me avoid an accident, but I’m crystal clear why I need one, should it happen. If I ask her about texting-and-driving, she’ll be equally clear that it’s possible to kill someone while doing it.

Getting back to the topic of security design, if I don’t understand why I need the control, it’s better that I have no awareness it’s around. Just like an airbag, I need to trust it’s there to protect me. On the flip side, I definitely need to know the importance of buckling up or putting my phone in the glovebox so I can keep my eyes on the road.

Transparent security

And this is what excites me about what we’re building at Code42 with our Code42 Next-Gen Data Loss Protection solution. Transparent security.

In the traditional Data Loss Prevention (DLP) space, transparent security is not an easy task. More often than not, people just trying to do their jobs end up getting blocked by a one-size-fits-all policy. Our application, on the other hand, enables security administrators and the business to come together in a way that gives the business what it wants: protection for its best ideas without Security getting in the way.

Computers, just like cars, can be dangerous and yet, each of us can’t imagine a life without them. Their utility demands they be safe and productive. As security professionals, we need to design controls that empower our business in the safest way possible, without getting in the way of where we’re trying to go.

Tips From the Trenches: Using Identity and Access Management to Increase Efficiencies and Reduce Risk

As a security company, it’s imperative that we uphold high standards in every aspect of our security program. One of the most important and foundational of these areas is our Identity and Access Management (IAM) program. As part of Code42’s approach to this program, we have identified guiding principles that have a strong focus on automation. Below is an outline of our journey.

IAM guiding principles

Every IAM program should have guiding principles agreed upon by HR, IT and security. Here are a few of ours:

1. HR would become the source of truth (SoT) for all identity lifecycle events, ranging from provisioning to de-provisioning.

The initial focus was to automate the provisioning and de-provisioning processes, then address the more complex transfer scenario in a later phase. HR would trigger account provisioning when an employee or contractor was brought onboard, and shut off access as workers left the company. Further, the HR system would become authoritative for the majority of identity-related attributes for our employees and contractors. This allowed us to automatically flow updates made to an individual’s HR record (e.g. changes in job title or manager) to downstream connected systems that previously required a Help Desk ticket and manual updates.

2. Our objectives would not be met without data accuracy and integrity.

In-scope identity stores such as Active Directory (AD) and the physical access badge system had to be cleansed of legacy (stale) and duplicate user accounts before they were allowed to be onboarded into the new identity management process. Any user account that could not be matched or reconciled to a record in the SoT system was remediated. Although a rather laborious exercise, this was unquestionably worth it in order to maintain data accuracy.

3. Integrate with existing identity infrastructure wherever possible.

We used AD as our centralized Enterprise Directory, which continues to function as the bridge between on-prem systems and our cloud identity broker, Okta. Integrating with AD was of crucial importance, as it allows us to centrally manage access to both on-premise and cloud-based applications. When a worker leaves the company, all we need to do is ensure the user account is disabled in AD, which in turn disables the person’s access in Okta.

Once we had agreement on our guiding principles, it was time to start the design and implementation phase. We built our solution using Microsoft’s Identity Manager (MIM) because our IAM team had used Microsoft’s provisioning and synchronization engine in the past and found it easy to configure, with many built-in connectors, and extendable via .NET.

IAM implementation phases

Identity in every organization is managed through a lifecycle. Below are two of the identity phases we have worked through and the solutions we built for our organization:

1. Automating provisioning and deprovisioning is key, but can also cause challenges.

One challenge we had was the lag between a new employee starting and the employee’s record being populated in the systems that act as the source of truth, which left no lead time to provision a user account and grant access for the incoming worker. We solved this obstacle by creating an intermediate “SoT identity” database that mirrors the data we receive from our HR system. From there, we were able to write a simple script that ties into our service desk and creates the necessary database entry.

The next challenge was to automate the termination scenario. Similar to most companies, our HR systems maintain the user record long past an employee’s departure date for compliance and other reasons. Despite this, we needed a way to decommission the user immediately at time of departure. For this, we developed a simple Web Portal that allows our Helpdesk and HR partners to trigger termination. Once a user is flagged for termination in the Portal, the user’s access is automatically disabled by the identity management system. De-provisioning misses are a thing of the past!

2. Re-design and improve the access review process.

This phase aims to replace our current manual, spreadsheet-based, quarterly access certification process with a streamlined process using the built-in review engine in the identity management tool.

Implementing IAM at Code42 has been an awesome experience, and with the impending launch of the request portal, this year will be even more exciting! No matter how far along you are in your IAM implementation journey, I hope the concepts shared here help you along the way.

Tips From the Trenches: Red Teams and Blue Teams

In my most recent post, I wrote about the important role proactive threat hunting plays in a mature security program. Equally important to a well-designed program and closely related to hunting for threats is having a robust red team testing plan. Having a creative and dynamic red team in place helps to “sharpen the knife” and ensure that your security tools are correctly configured to do what they are supposed to do — which is to detect malicious activity before it has advanced too far in your environment.

“ It is much more challenging to build and maintain defensible systems than infiltrate them. This is one of the reasons why red team exercises are so important. ”

Red teams and blue teams

A red team’s mandate can range from assessing the security of an application or an IT infrastructure to testing a physical environment. For this post, I am referring specifically to general infrastructure testing, where the goal is to gain access to sensitive data by (almost) any means necessary, evaluate how far an attacker can go and determine whether your security tools can detect or protect against the malicious actions. The red team attackers approach the environment as if they were outside attackers.

While your red team assumes the role of the attacker, your blue team acts as the defender. It’s the blue team that deploys and manages the enterprise’s defenses. While the red team performs its “attack” exercises, there is much your blue team can learn about the effectiveness of your company’s defenses — where the shortfalls are and where the most important changes need to be made.

Defining success

Before conducting a red team test, it helps to decide on a few definitions:

1. Define your targets: Without specifying what the critical assets are in your environment — and therefore what actual data an actual attacker would try to steal — your testing efforts will not be as valuable as they could be. Time and resources are always limited, so make sure your red team attempts to gain access to the most valuable data in your organization. This will provide you the greatest insights and biggest benefits when it comes to increasing defensive capabilities.

2. Define the scope: Along with identifying the data targets, it is essential to define the scope of the test. Are production systems fair game or will testing only be done against non-production systems? Is the social engineering of employees allowed? Are real-world malware, rootkits or remote access trojans permitted? Clearly specifying the scope is always important so that there aren’t misunderstandings later on.

How tightly you scope the exercise involves tradeoffs. Looser restrictions make for a more realistic test. No attacker will play by the rules; they will try to breach your data using any means necessary. However, opening up production systems to the red team exercise could interrupt key business processes. Every organization has a different risk tolerance for these tests. I believe that the more realistic the red team test is, the more valuable the findings will be for your company.

Once you define your scope, make sure the appropriate stakeholders are notified, but not everybody! Telegraphing the test ahead of time won’t lead to realistic results.

3. Define the rules of engagement: With the scope of the test and data targets well defined, both the red team and the blue team should have a clear understanding of the rules for the exercise. For example, if production systems are in scope, should the defenders treat alarms differently if they positively identify an activity as part of the test? What are the criteria for containment, isolation and remediation for red team actions? As with scope, the more realistic you can make the rules, the more accurate the test will be, but at the potential cost of increased business interruption.

Making final preparations

Don’t end the test too quickly. A real attacker who targets your organization may spend weeks or even months performing reconnaissance, testing your systems and gathering information about your environment before they strike. A one-day red team engagement won’t be able to replicate such a determined attacker. Giving the red team the time and resources to mount a realistic attack will make for more meaningful results.

It’s also important to precisely define what success means. Often a red team attacker will gain access to targeted resources. This should not be seen as a failure on the part of the blue team. Instead, success should be defined as the red team identifying gaps and areas where the organization can improve security defenses and response processes — ultimately removing unneeded access to systems that attackers could abuse. A test that ends too early because the attacker was “caught” doesn’t provide much in the way of meaningful insight into your security posture. An excellent red team test is a comprehensive one.

It’s important to note that defenders have the harder job, as the countless daily news stories about breaches illustrate. It is much more challenging to build and maintain defensible systems than infiltrate them. This is one of the reasons why red team exercises are so important.

Completing the test

Once the test is complete, the red team should share with the blue team the strategies they used to compromise systems, gain access and evade detection. Of course, the red team should be documenting all of this during the test. Armed with this information, the blue team can determine how to harden the environment and create a bigger challenge for the red team during the next exercise.

We have a fantastic red team here at Code42. The team has conducted multiple tests of our infrastructure, and we have always found the results to be incredibly valuable. Any organization, no matter the size, can gain much more than they risk by performing red team testing.

As always, happy threat hunting!

Tips From the Trenches: Threat-Hunting Weapons

When it comes to cybersecurity, too many enterprises remain on a reactive footing. This ends up being a drag on their efforts because, rather than getting ahead of the threats that target their systems, they spend too much of their time reacting to security alerts and incidents within their environments.

While being able to react to attacks quickly is important for any security team, it’s also important to get out in front of potential risks to identify threats lurking within your systems before they become active.

In this post, we’ll explain how threat hunting within your environment can help break that reactive cycle and improve the effectiveness of any security program.

“ You don’t need a large security organization or any special security tools to start to proactively threat hunt; any security team can start threat hunting, and often using the tools they already have. ”

Threat hunting defined

Before going forward, let’s first take a step back and define what we mean by threat hunting. Essentially, threat hunting is the proactive search for evidence of undetected malicious activity or compromise. These threats can include anything from remote-access tools beaconing to an attacker’s command and control server to malicious actions of an employee or other trusted insider.

Threat hunting is essential for effective security for many reasons. First, defensive security technologies such as intrusion detection/prevention systems and anti-malware software will never successfully identify and block all malware or attacks. Some things are just going to get through. Second, by finding malware and threats that made it past your defenses, you’ll be able to more effectively secure your systems and make your environment much harder for attackers to exploit. Finally, getting adept at finding threats in your environment will improve your organization’s overall ability to respond to threats and, as a result, over time dramatically improve your security posture.

Your arsenal

Because threat hunting entails looking for things that have yet to trigger alerts — if they ever would trigger alerts, to begin with — it is important to look deeper for evidence of compromise. Fortunately, you don’t need a large security organization or any special security tools to start to proactively threat hunt; any security team can start threat hunting, and often using the tools they already have.

For instance, many of the data sources used in threat hunting can be found in firewall, proxy and endpoint logs. While these sources probably aren’t alerting on anything malicious, they still hold a considerable amount of security data that can point to indicators that an environment has been breached without anything ever appearing on the radar.

Other readily available tools are helpful for threat analysis, such as Bro (https://www.bro.org/), RITA (https://github.com/activecm/rita) and OSQuery (https://osquery.io/). These tools help provide additional visibility into network and endpoint data that could offer insights into potential compromise. With them, teams can monitor internal network activity, such as virus outbreaks and lateral movement of data. Monitoring east-west network traffic in addition to what is moving through the firewall provides critical insight into the overall health of your network.

The investigation capabilities of Code42 Next-Gen Data Loss Protection (DLP) can be extremely helpful for threat hunting — for determining how widely a file is distributed in the environment and for providing information about a file’s lifecycle, all of which gives context around whether a file is business-related or suspicious. For example, with Code42 Next-Gen DLP, you can search by MD5 or SHA-256 hash to find all instances of a sensitive file in your organization, or determine whether known malware has been detected in your organization.

New tools and new ways of thinking may seem overwhelming at first. However, threat hunting doesn’t have to be all-consuming. You can start by committing a modest amount of time to the hunt and incrementally build your threat hunting capability over weeks and months to find malicious files and unusual activity. As a direct benefit to your security program, you will also be able to eliminate noise in your environment, better tune your security tools, find and harden areas of vulnerability, and enhance your security posture at your own pace.

Now, get hunting.

Tips From the Trenches: Enhancing Phishing Response Investigations

In an earlier blog post, I explained how the Code42 security team is using security orchestration, automation and response (SOAR) tools to make our team more efficient. Today, I’d like to dive a little deeper and give you an example of how we’re combining a SOAR tool with the Code42 Forensic File Search API — part of the Code42 Next-Gen Data Loss Protection (DLP) product —  to streamline phishing response investigations.

A typical phishing response playbook — with a boost

Below is a screenshot of a relatively simple phishing response playbook that we created using Phantom (a SOAR tool) and the Code42 Forensic File Search API:

We based this playbook on a phishing template built into the Phantom solution. It includes many of the actions that would normally be applied as a response to a suspicious email — actions that investigate and geolocate IP addresses, and conduct reputation searches for IPs and domains. We added a couple of helper actions (“deproofpoint url” and “domain reputation”) to normalize URLs and assist with case management.

You may have noticed one unusual action. We added “hunt file” via the Code42 Forensic File Search API. If a suspicious email has an attachment, this action will search our entire environment by file hash for other copies of that attachment.

“ Combining the speed of Code42 Next-Gen DLP with the automation of SOAR tools can cut response times significantly. ”

What Code42 Next-Gen DLP can tell us

Applying Code42 Next-Gen DLP to our playbook shortens investigation time. The “hunt file” action allows us to quickly see whether there are multiple copies of a malicious file in our environment. If there are, that is quick evidence that there may be a widespread email campaign against our users. On the other hand, the search may show that the file has a long internal history across file locations and endpoints. This history would suggest that the file exists as part of normal operating procedure and that we may be dealing with a false alarm. Either way, the Code42 Next-Gen DLP API and its investigation capability give us additional file context so our security team can make smarter, more informed and confident decisions about what to do next.

Applying Code42 Next-Gen DLP to other threat investigations

This type of “hunt file” action does not need to be limited to investigating suspected phishing emails. In fact, it could be applied to any security event that involves a file — such as an anti-virus alert, an EDR alert or even IDS/IPS alerts that trigger on file events. Using Code42 Next-Gen DLP, security staff can determine in seconds where else that file exists in the environment and if any further action is necessary.

Combining the speed of Code42 Next-Gen DLP with the automation of SOAR tools can cut response times significantly. That’s something any security team can appreciate.

As always, happy threat hunting!

Tips From the Trenches: Architecting IAM for AWS with Okta

In the last year, Code42 made the decision to more fully embrace a cloud-based strategy for our operations. We found that working with a cloud services provider opens up a world of possibilities that we could leverage to grow and enhance our product offerings. This is the story of how we implemented identity and access management (IAM) in our Amazon Web Services (AWS) environment.

IAM guiding principles

Once the decision was made to move forward with AWS, our developers were hungry to start testing the newly available services. Before they could start, we needed two things: an AWS account and a mechanism for them to log in. Standing up an account was easy. Implementing an IAM solution that met all of our security needs was more challenging. We were given the directive to architect and implement a solution that met the requirements of our developers, operations team and security team.

We started by agreeing on three guiding principles as we thought through our options:

1.) Production cloud access/administration credentials need to be separate from day-to-day user credentials. This was a requirement from our security team that aligns with existing production access patterns. Leveraging a separate user account (including two-factor authentication) for production access decreases the likelihood of the account being phished or hijacked. This secondary user account wouldn’t be used to access websites, email or for any other day-to-day activity. This wouldn’t make credential harvesting impossible, but it would reduce the likelihood of an attacker easily gaining production access by targeting a user. The attacker would need to adopt advanced compromise and recon methods, which would provide our security analysts extra time to detect the attack.

2.) There will be no local AWS users besides the enforced root user, who will have two-factor authentication; all users will come through Okta. Local AWS users have some significant limitations and become unwieldy as a company grows beyond a few small accounts. We were expecting to have dozens, if not hundreds, of unique accounts.  This could lead to our developers having a unique user in each of the AWS environments. These user accounts would each have their own password and two-factor authentication. In addition to a poor end-user experience, identity lifecycle management would become a daunting and manual task. Imagine logging into more than 100 AWS environments to check if a departing team member has an account. Even if we automated the process, it would still be a major headache.

Our other option was to provide developers with one local AWS user with permissions to assume role in the different accounts. This would be difficult to manage in its own way as we tried to map which users could connect to which accounts and with which permissions. Instead of being a lifecycle challenge for the IAM team, it would become a permissioning and access challenge.

Fortunately for us, Code42 has fully embraced Okta as our cloud identity broker. Employees are comfortable using the Okta portal and all users are required to enroll in two-factor authentication. We leverage Active Directory (AD) as a source of truth for Okta, which helps simplify user and permission management. By connecting Okta to each of our AWS accounts, users can leverage the same AD credentials across all AWS accounts — and we don’t need to make any changes to the existing IAM lifecycle process. Permissioning is still a challenge, but it can be managed centrally with our existing IAM systems. I will describe in greater detail exactly how we achieved this later in the post.

3.) Developers will have the flexibility to create their own service roles, but will be required to apply a “deny” policy, which limits access to key resources (CloudTrail, IAM, security, etc.).  As we were creating these principles, it became clear that the IAM team would not have the bandwidth to be the gatekeepers of all roles and policies (how access is granted in AWS). Developers would need to be empowered to create their own service roles, while we maintained control over the user access roles via Okta. Letting go of this oversight was very difficult. If not properly managed, it could have opened us up to the risk of malicious, over-permissioned or accidental modification of key security services.

Our initial solution to this problem was to create a “deny” policy that would prevent services and users from interacting with some key security services. For example, there should never be a need within an application or microservice to create a new IAM user or a new SAML provider. We notified all users that this deny policy must be attached to all roles created and we used an external system to report any roles that didn’t have this deny policy attached.
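
The post doesn’t include the policy document itself, but a guardrail in this spirit, expressed here as a CloudFormation-style managed policy in YAML, might look like the sketch below. The policy name and the specific denied actions are illustrative assumptions; a real deny list would be driven by your own key security services.

```yaml
Resources:
  SecurityGuardrailDenyPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: security-guardrail-deny        # hypothetical name
      Description: Deny modification of key security services
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: DenySecurityServiceTampering
            Effect: Deny
            Action:
              - cloudtrail:StopLogging                  # keep audit logging intact
              - cloudtrail:DeleteTrail
              - iam:CreateUser                          # no new local IAM users
              - iam:CreateSAMLProvider                  # no new federation providers
              - config:DeleteConfigurationRecorder
              - guardduty:DeleteDetector
            Resource: "*"
```

Attaching a policy like this to every developer-created role (and, later, expressing the same intent as a permission boundary) keeps the guardrail in place even as teams create roles on their own.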

Recently, AWS released a new IAM feature called permission boundaries. The intent of permission boundaries is similar to that of our deny policy.  By using permission boundaries we can control the maximum permissions users can grant to the IAM roles that they create.  We are planning to roll this out in lieu of the deny policy in the very near future.

Example of a role found without the deny policy attached

Implementing Okta and AWS

When thinking through connecting Okta and AWS, we were presented with two very different architectural designs: hub and spoke, and direct connect. The hub and spoke design leverages an AWS landing account that is connected to Okta. Once logged in to this account, users can switch roles into other AWS accounts they are authorized to use. The direct connect design we implemented creates a new Okta application icon for each AWS account. Users access their accounts by visiting their Okta homepage and selecting the account they want to use.

Power users tend to prefer the hub and spoke model, as this allows them to quickly jump from account to account without re-logging in or grabbing a new API token. The more casual users prefer to have all accounts presented on one page. They aren’t swapping among accounts, and it isn’t fair to ask them to memorize account numbers (or even exact short names) so they can execute an assume role command. In addition to user experience, we considered how easy it was to automate management once a new account has been created. The two approaches each have merit, so we decided to implement both.

When a new account is created, it is bootstrapped to leverage the hub and spoke landing account. Automation can immediately start working with the account, and certain power users get the access they need without any IAM intervention. The IAM team can revisit the account when convenient and stand up the direct connection to Okta. New Okta features, currently in beta, will improve this direct connect process.

One final thing I would like to touch on is how we leverage the Okta launcher to get API tokens for AWS. One of the benefits of having local users in AWS is that each user is given their own API key. While this is a benefit to end users, these keys are very rarely rotated and could present a significant security risk (such as an accidental public GitHub upload). To address this, Okta has created a Java-based tool that generates a temporary AWS API key. The repo can be found here. Like many other companies, we have created wrappers for this script to make things as easy as possible for our end users. After cloning the repo, a user can type the command “okta -e $ENV_NAME” and the script will reach out to Okta and generate an API key for that specific AWS account. The user does need to know the exact environment name for this script to work, but most power users that need API access will have this information.

No matter where your company is on the path to leveraging a cloud service provider, IAM is a foundational component that needs to be in place for a successful and secure journey. If possible, try to leverage your existing technologies to help improve user experience and adoption. I hope the principles we shared here help you think through your own requirements.

Tips From the Trenches: Searching Files in the Cloud

In a few of my previous blogs, I shared some examples of ways the Code42 security team uses Code42 Forensic File Search to find interesting files — macro-enabled Microsoft Office files, known malicious MD5 hashes and so on. Now that the search capabilities of our newest product have been extended beyond endpoints to include cloud services, such as Google Drive and Microsoft OneDrive, I’d like to look at how we’re using this broadened visibility in our investigations.

“ Because we can now use Code42 Forensic File Search to search for files and file activity across both endpoints and Google Drive, we can be more certain of the locations of sensitive files when we are doing file movement investigations. ”

Finding files – and tracking file movement – in the cloud

Code42 uses Google Drive as a cloud collaboration platform. Because we can now use Code42 Forensic File Search to search for files and file activity across both endpoints and Google Drive, we can be more certain of the locations of sensitive files when we are doing file movement investigations. We combine Code42 Forensic File Search with the Code42 File Exfiltration Detection solution to execute an advanced search — using a given MD5 hash — to find files that have been moved to a USB drive. This allows us to quickly build a complete picture of where a file exists in our environment — and how it may have moved from someone’s laptop to the cloud and back.

What files are shared externally?

Using the latest version of Code42 Forensic File Search, we can also search files based on their sharing status. For example, in a matter of a few seconds, we can search for all Google Drive documents that are shared with non-Code42 users. This shows us all documents that have been intentionally or inadvertently shared outside of the company. A deeper look at this list helps us identify any information that has been shared inappropriately. As with all searches within Code42 Forensic File Search, these investigations take only a few seconds to complete.

Here’s a hypothetical example: Let’s say the organization was pursuing an M&A opportunity and we wanted to make sure that confidential evaluation documents weren’t being shared improperly. We could use Code42 Forensic File Search to pull up a list of all documents shared externally. Should that list contain one of the confidential M&A evaluation documents, we could look more closely to determine if any inappropriate sharing occurred.

Continually finding new use cases

Code42’s ffs-tools repository on GitHub now includes several new searches that take advantage of our new cloud capabilities. You can find them all here.

Like most organizations, we use many cloud services to perform our day-to-day work. That’s why in the near future, we plan to expand the search capabilities of Code42 Forensic File Search across even more cloud services — giving you even greater visibility into the ideas and data your organization creates, no matter where they live and move.

Happy threat hunting!

Tips From the Trenches: Multi-Tier Logging

Here’s a stat to make your head spin: Gartner says that a medium-sized enterprise creates 20,000 messages of operational data in activity logs every second. That adds up to 500 million messages — more than 150 GB of data — every day. In other words, as security professionals, we all have logs. A lot of logs. So, how do we know if our log collection strategy is effectively meeting our logging requirements? Unfortunately, a one-size-fits-all logging solution doesn’t exist, so many leading security teams have adopted a multi-tier logging approach. There are three steps to implementing a multi-tier logging strategy:

“ A one-size-fits-all logging solution doesn’t exist, so many leading security teams have adopted a multi-tier logging approach. ”

1. Analyze your logging requirements

A multi-tier logging strategy starts with analyzing your logging requirements. Here’s a simple checklist that I’ve used for this:

Who requires access to the organization’s logs?

  • Which teams require access?
  • Is there unnecessary duplication of logs?
  • Can we consolidate logs and logging budgets across departments?

What logging solutions do we currently have in place?

  • What is the current health of our logging systems?
  • Are we receiving all required logs?
  • Have we included all required log source types?
    • Do we need public cloud, private cloud, hybrid cloud and/or SaaS logs?
  • How many events per second (EPS) are we receiving?
  • How much log storage (in gigabytes) are we using now?
  • What are our logs of interest?
    • Create alerts and/or reports to monitor for each.

What time zone strategy will you use for logging? 

  • How many locations are in different time zones across the organization?
  • Will you use a single time zone or multiple time zone logging strategy?

How much storage capacity will be needed for logging for the next 3-5 years?

Do we have a log baseline in place?

  • Where are our logs stored now?
  • Where should they be stored in the future?

Are we collecting logs for troubleshooting, security analysis and/or compliance?

  • What are our compliance requirements?
    • Do we have log storage redundancy requirements?
    • What are our log retention requirements?
    • Do we have log retention requirements defined in official policy?
  • What logs do we really need to keep?
    • Identify those that are useful.
    • Drop those that are not.

2. Digest log information

After all of this information is gathered, it’s time to digest it. It’s important to align your logging infrastructure to log type and retention needs — so you don’t end up, for example, inserting a large amount of unstructured data that you need to search quickly into a SQL database. Most organizations have multiple clouds, many different devices that generate different log types and separate required analysis methods. In other words, one solution usually does not meet all logging needs.

3. Implement multi-tier logging

If, after analyzing your logging requirements, you find that one logging strategy does not meet all of your requirements, consider this tiered logging flow:

Code42 Tiered Logging Flow Example

In this example logging flow, there are three different logging flow types and five different log repositories. The flow types are SIEM logs, application logs and system logs. The repositories are the SIEM database, an ELK (Elasticsearch, Logstash and Kibana) stack, two long-term syslog archival servers and cloud storage. Each repository has a unique role:

  • The SIEM correlates logs with known threats.
  • The ELK stack retains approximately 30-60 days of logs for very fast searching capabilities.
  • The two syslog archival servers store the last three to seven years of syslog and application logs for historical and regulatory purposes. One syslog archival server is used for processing logs; the other is a limited-touch master log repository.
  • Cloud storage also stores the last three to seven years of logs for historical and regulatory purposes.

Simplify your log activity

This is just one quick example of an innovative solution to simplifying log activity. Regardless of whether multi-tier logging is the right solution for your organization, the most critical step is making sure you have a clearly defined logging strategy and an accurate baseline of your current logging state. This basic analysis gives you the understanding and insights you need to simplify log activity — making it easier to accomplish the complex logging goals of your organization.