On Establishing a Cloud Security Program
- The Goal: a Roadmap for Cloud Security Teams
- The North Star
- Building the Roadmap
Congratulations! You have been tasked with establishing a cloud security strategy. Now what?
In this post, I’m going to walk through actionable advice for establishing a cloud security program aimed at protecting a cloud native, service provider agnostic, container-based offering.
The Goal: a Roadmap for Cloud Security Teams
Security strategies focusing on cloud native solutions are becoming prominent within the industry but, due to a lack of shared knowledge, it feels like everyone is reinventing the wheel every time.
In fact, there are not many public resources describing how to approach this topic: although different resources cover specific aspects of specific use cases (e.g., how to do container scanning, or how to deploy Open Policy Agent), there is no single holistic view on how to integrate everything together.
In this post, I will start with the foundations, and go through the different milestones (or maturity levels) required to reach a “best in class” solution to support and secure a product that spans multiple service providers (hence the requirement of not being tied to platform-specific solutions), runs on Kubernetes, and must comply with strict regulations (like the ones that apply to fintech companies).
The North Star
Before jumping into the details, I think it is important to define a “North Star” that can be used as a reference point (and driver) for the definition of your strategy.
These are the high-level goals that will then be reflected within the roadmap and mapped to actual controls that can be implemented. For cloud native solutions, I grouped these main pillars by the five functions of the NIST Cybersecurity Framework: Identify, Protect, Detect, Respond, and Recover.
- Known good state
- Zero Trust model
- Micro blast radius
- Continuous secure baseline validation
- Strong auditability and accountability
Building the Roadmap
As mentioned, these high-level goals provide macro-areas to work against, but they are very general (and open to interpretation). Taking a step further, how can they be applied to a cloud native platform, where multiple cloud service providers and Kubernetes clusters are involved?
Ideally, we would like to use a framework which:
- Allows us to embrace an agile approach (with multiple iterations, which enable continuous improvement).
- Is transparent to other engineering teams (i.e., security teams should be low friction and not be blockers).
- Will ultimately lead to a solution that is compliant with industry regulations (e.g., ISO27001, PCI DSS, etc.) by “default”.
Hence, I took the Cloud Security Alliance (CSA) Cloud Controls Matrix (CCM) and performed a gap analysis and built a RACI matrix to map controls to Security teams, selecting the areas directly applicable to a cloud security team (i.e., excluding controls like physical security of a data center, which are usually not directly applicable to such teams). Then, I enhanced this list by adding cloud-specific controls that I consider essential for a comprehensive program (usually also backed by CNCF) and re-organized them into areas of interest.
In the sections below I will explain in detail these main areas (Domains and Controls) and the actionable Tasks which compose the Roadmap: from the definition of high-level security policies, network architecture, IAM, and assets inventory; to monitoring, code provenance, and policy as code; and up to automatic enforcement of security policies, runtime anomaly detection, and business continuity.
Domains can be considered as “macro-areas” which can be used to group sets of Controls:
| Policies & Standards||Definition of Security Policies and Standards which provide reference documentation on best practices for cloud security, with a particular focus on cloud providers and containerization solutions.|
| Architecture||Definition and review of architectural decisions, with particular focus on network architecture, identity and access management, secrets management, and data classification.|
| Verification||Continuously verify and enforce that all cloud resources abide by the policies and expected baseline configuration.|
| Supply Chain Security||Enforce security controls throughout the pipeline.|
| Monitoring and Alerting||Implement logging, monitoring, and alerting systems to gain visibility into activities and/or changes affecting the environments.|
| Incidents and Remediation||Implement processes for containment, forensics, and automatic remediation of security violations.|
| Business Continuity||Prepare countermeasures for unexpected incidents or disasters.|
These domains can then be fleshed out into a variety of workstreams (or Controls). Before exploring them in detail, it is worth noting that, generally speaking, a cloud security program can be implemented through a series of maturity levels. The sub-sections below provide an overview of the main initiatives that, for each Domain, could be undertaken at each level of maturity.
Maturity Level 1 - The foundations
- Definition of Security Policies: start by defining some overarching policies that will define your overall approach and that the business will have to abide by (e.g., Cloud Security Policy, Vulnerability/Patch Management Standard).
- Architecture: review the network architecture and ensure proper segregation of environments (especially production), review the Identity and Access Management Framework, as well as how secrets management is performed.
- Verification: start with the so-called “low-hanging fruit” by validating that no obvious misconfigurations (both at the CSP and K8s level) are present, and by obtaining a list of public endpoints.
- Supply Chain: deploy container image scanning, and start restricting access to privileged AWS/GCP users.
- Monitoring: start defining a security logging strategy (I provided examples for both AWS and GCP).
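As an illustration of the "no obvious misconfigurations" check at the Kubernetes level, here is a minimal sketch that inspects a Pod manifest (already parsed into a dictionary) for a few well-known risky settings. The function name and the sample Pod are hypothetical; `hostNetwork`, `hostPID`, and `securityContext.privileged` are standard Pod spec fields. A real program would more likely rely on an established scanner, so treat this only as a way to make the idea concrete.

```python
from typing import Any


def find_pod_misconfigurations(pod: dict[str, Any]) -> list[str]:
    """Return human-readable findings for a single Pod manifest (as a dict)."""
    findings: list[str] = []
    name = pod.get("metadata", {}).get("name", "<unnamed>")
    spec = pod.get("spec", {})

    # Sharing host namespaces weakens the isolation between Pod and node.
    for field in ("hostNetwork", "hostPID"):
        if spec.get(field, False):
            findings.append(f"{name}: {field} is enabled")

    # Privileged containers have almost unrestricted access to the host.
    for container in spec.get("containers", []):
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged", False):
            container_name = container.get("name", "<unnamed>")
            findings.append(f"{name}/{container_name}: container runs privileged")

    return findings


# Hypothetical manifest used only for demonstration.
example_pod = {
    "metadata": {"name": "debug-shell"},
    "spec": {
        "hostNetwork": True,
        "containers": [
            {"name": "shell", "image": "busybox",
             "securityContext": {"privileged": True}},
            {"name": "sidecar", "image": "busybox"},
        ],
    },
}

for finding in find_pod_misconfigurations(example_pod):
    print(finding)
```

The same pattern (load a resource description, apply a small set of rules, emit findings) applies to CSP-level resources as well.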
Maturity Level 2
- Definition of Security Standards: continue developing standards covering more “advanced” topics like Key Management/Generation and Data Handling/Labeling.
- Architecture: depending on the current state of IAM and Secrets management (found in Level 1), you might want to tackle processes like credentials management and user access provisioning.
- Verification: start deploying a solution that can continuously provide an up-to-date asset inventory (for example, see “Mapping Moving Clouds: How to stay on top of your ephemeral environments with Cartography”). Improve the validation of the environments by deploying automation that can continuously report misconfigurations and drift.
- Supply Chain: start working on securing the images used (define a list of base images and harden them). Enforce the use of these secure images in the CI/CD pipeline, and add automation able to scan Infrastructure as Code for security issues. Work with your Application Security team to ensure a system to prevent the leaking of secrets through the codebase is integrated into the pipeline.
- Monitoring: deploy the security logging solution designed at Level 1, and ensure logs are collected from all environments. Start defining monitoring and alerting rules to act on indicators of compromise and/or known classes of issues.
Maturity Level 3
- Definition of Security Standards: keep extending standards to cover Identity and Access Management, Encryption, Key Management/Generation, Data Handling/Labeling, Change Management.
- Verification: provide continuous identification of deviations from defined Security Policies and compliance frameworks (e.g., via AWS Security Hub and GCP Security Command Center), with a process integrated within the security pipeline (i.e., your SIEM). Start deploying guardrails (e.g., SCPs and Org Policies) to prevent entire classes of misconfigurations.
- Supply Chain: ensure the configuration of the Kubernetes clusters and running containers is automatically validated, so as to detect any misconfiguration. Address hardening of the AWS/GCP organizations.
- Monitoring: start aggregating and reporting on both logged data and anomalies, and create visualizations/dashboards to facilitate their consumption. Deploy processes and tools to detect cases of credential compromise.
- Remediation: Employ processes to automate the remediation of (at least) the most common types of misconfigurations.
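As an example of a guardrail that prevents an entire class of misconfigurations, the sketch below generates an AWS Service Control Policy (SCP) document that denies tampering with audit logging. The helper name and the `Sid` value are hypothetical; the document structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) and the two CloudTrail actions follow the standard IAM policy grammar. Which actions to deny is an assumption here and should come from your own Security Policies.

```python
import json


def build_deny_scp(sid: str, actions: list[str]) -> dict:
    """Build an SCP document that denies the given actions on all resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Deny",
                "Action": sorted(actions),
                "Resource": "*",
            }
        ],
    }


# Illustrative choice of actions: prevent disabling or deleting audit trails.
scp = build_deny_scp(
    "ProtectAuditLogging",
    ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
)

print(json.dumps(scp, indent=2))
```

Generating guardrails from code keeps them reviewable and versioned alongside the rest of the infrastructure definitions.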
Maturity Level 4
- Definition of Security Standards: start tackling Business Continuity topics (Audit Planning, Business Continuity Planning, Incident Management).
- Monitoring: any changes made to production should be logged and, where appropriate, alerted upon. In addition, file integrity monitoring (host) and network intrusion detection (IDS) tools should be deployed to facilitate timely detection, investigation via root cause analysis, and response to incidents. In particular, processes and tools should be put in place to implement a runtime anomaly detection solution, aligned with MITRE ATT&CK for Cloud.
- Remediation: start creating playbooks to define detailed processes to follow in case of an incident. Timely de-provisioning of user access to data and systems should be implemented.
- Business Continuity: a Disaster Recovery Plan should be outlined, covering the outage/failure of one or more core components of the infrastructure (e.g., failure of an AZ or Region).
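The "timely de-provisioning of user access" item above can be approximated with a periodic check like the following sketch. It flags accounts that either no longer belong to an active employee or have been idle for too long. The `Account` type, the 90-day threshold, and the idea of comparing against an HR-provided list of active employees are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Account:
    username: str
    last_active: datetime


def accounts_to_deprovision(
    accounts: list[Account],
    active_employees: set[str],
    now: datetime,
    max_idle: timedelta = timedelta(days=90),  # assumed threshold
) -> list[str]:
    """Return usernames that should lose access, sorted alphabetically."""
    flagged = []
    for account in accounts:
        has_left = account.username not in active_employees
        is_idle = now - account.last_active > max_idle
        if has_left or is_idle:
            flagged.append(account.username)
    return sorted(flagged)


# Hypothetical data used only for demonstration.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
accounts = [
    Account("alice", datetime(2024, 5, 30, tzinfo=timezone.utc)),
    Account("bob", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    Account("carol", datetime(2024, 5, 31, tzinfo=timezone.utc)),
]

print(accounts_to_deprovision(accounts, {"alice", "bob"}, now))
```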
Maturity Level 5
- Supply Chain: utilize a framework (like TUF, in-toto, providence) to protect the integrity of the Supply Chain.
- Monitoring: a solution should be put in place to detect exfiltration of data, by monitoring egress traffic.
- Remediation: automated processes should be put in place to contain (at least) the most common compromise types, and to collect forensic evidence after a security incident has been declared.
- Business Continuity: tabletop exercises and live tests should be conducted to test the effectiveness of controls put in place to mitigate a possible failure of one or more core components of the infrastructure.
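For the egress monitoring item, a very simple heuristic is to aggregate outbound traffic per source and destination, and flag large transfers to destinations outside an allowlist. The sketch below assumes flow records are available as `(source, destination, bytes)` tuples; the networks and the threshold are hypothetical values. Real exfiltration detection usually needs more context (baselines, DNS, time windows), so this only shows the basic shape.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical allowlist: internal ranges and a known partner network.
ALLOWED_EGRESS = [ip_network("10.0.0.0/8"), ip_network("203.0.113.0/24")]
BYTES_THRESHOLD = 50_000_000  # assumed alerting threshold, in bytes


def suspicious_egress(flows, allowed, threshold):
    """Sum bytes per (source, destination) pair and return the pairs that
    target a non-allowlisted destination and exceed the threshold."""
    totals = defaultdict(int)
    for source, destination, byte_count in flows:
        destination_ip = ip_address(destination)
        if any(destination_ip in network for network in allowed):
            continue  # expected destination, ignore
        totals[(source, destination)] += byte_count
    return {pair: total for pair, total in totals.items() if total > threshold}


# Hypothetical flow records used only for demonstration.
flows = [
    ("10.1.2.3", "198.51.100.7", 40_000_000),
    ("10.1.2.3", "198.51.100.7", 20_000_000),
    ("10.1.2.3", "203.0.113.10", 90_000_000),
    ("10.1.2.4", "10.9.9.9", 500_000_000),
]

print(suspicious_egress(flows, ALLOWED_EGRESS, BYTES_THRESHOLD))
```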
At first glance, the list of initiatives outlined above might seem quite dense (and not super-actionable). That’s why I expanded them into a set of Tasks (94 at the time of writing), which can be individually worked upon.
Having almost a hundred controls in a blog post wouldn’t be practical, though, so I created a micro-website to host them in a spreadsheet-style format.
Each row represents a Task, and has the following attributes:
|Task||The Task name|
|Description||A description of what the Task involves|
|Status||To keep track of progress|
|Maturity||How mature is the deployment/rollout of the Task, once you started working on it|
|Layer||Whether it affects a Cloud Provider, Kubernetes cluster, or both|
|Epic||Link to Jira/Issue Tracker, to keep track of progress|
|Deliverable||Type of deliverable for the Task|
|Artifact||Link to the final deliverable for the Task|
|Useful Resources||Some useful resources that can help during the implementation phase|
|Metrics||Metrics that can be used to track the success of the Task|
|CSA CCM||Reference to the related entry in the CSA CCM, if any|
From there, you’ll have the ability to export it as CSV and tailor it to your needs.
I’d like to stress that you don’t have to follow the tasks in order, but you should use the Priority column to define your own priorities, which can change based on your business priorities and industry.
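Once exported, the CSV can be processed with a few lines of code to build your own view of the roadmap. The sketch below lists the tasks that are still open, ordered by priority. The exact column headers of the export, the status values ("Done", "To do", "In progress"), and the numeric priority scale (lower means more urgent) are assumptions made for this example, so adjust them to match the actual file.

```python
import csv
import io


def prioritized_open_tasks(csv_text: str) -> list[dict[str, str]]:
    """Return the rows not marked as done, sorted by ascending Priority."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    open_rows = [row for row in rows if row["Status"].strip().lower() != "done"]
    # sorted() is stable, so rows with equal priority keep their file order.
    return sorted(open_rows, key=lambda row: int(row["Priority"]))


# Hypothetical export used only for demonstration.
SAMPLE_EXPORT = """\
Task,Status,Layer,Priority
Deploy container image scanning,Done,Kubernetes,1
Define security logging strategy,To do,Cloud Provider,2
Obtain list of public endpoints,In progress,Cloud Provider,1
"""

for row in prioritized_open_tasks(SAMPLE_EXPORT):
    print(row["Priority"], row["Task"])
```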
Putting it all Together: The Roadmap
In this post I outlined some actionable advice for establishing a cloud security program aimed at protecting a cloud native, service provider agnostic, container-based offering.
It represents my perspective and reflects my experience, so it definitely won’t be “one size fits all”, but I hope it can serve as a useful baseline.
I hope you found this post useful and interesting, and I’m keen to get feedback on it! If you found the information useful, if something is missing, or if you have ideas on how to improve it, please let me know on Twitter.