What to look for when reviewing a company's infrastructure

Early last year, I wrote “On Establishing a Cloud Security Program”, outlining some advice that can be followed to establish a cloud security program aimed at protecting a cloud native, service provider agnostic, container-based offering. The result is a micro-website containing the list of controls that can be rolled out to establish such a cloud security program: A Cloud Security Roadmap Template.

Following that post, one question I got asked was: “That’s great, but how do you even know what to prioritize?”


The challenge of prioritization

And that’s true for many teams, from the security team of a small but growing start-up to a CTO or senior engineer at a company with no cloud security team trying to either get things started or improve what they have already.

Here I want to tackle precisely the challenge many (new) teams face: getting up to speed in a new environment (or company altogether) and finding its most critical components.

This post, part of the “Cloud Security Strategies” series, aims to provide a structured approach to reviewing the security architecture of a multi-cloud SaaS company with a mix of workloads (from container-based, to serverless, to legacy VMs). The outcome of this investigation can then be used to inform both subsequent security reviews and prioritization of the Cloud Security Roadmap.


The review process

There are multiple situations in which you might face a (somewhat completely) new environment:

  1. You are starting a new job/team: Congrats! You are the first engineer in a newly spun-up team.
  2. You are going through a merger or acquisition: Congrats (I think?)! You now have a completely new setup to review before integrating it with your company’s.
  3. You are delivering a consulting engagement: This is different from the previous 2, as this usually means you are an external consultant. Although the rest of the post is more tailored towards internal teams, the same concepts can also be used by consultancies.

So, where do you start? What kind of questions should you ask yourself? What information is crucial to obtain?

Luckily, abstraction works in our favour.

Abstraction Levels

We can split the process into different abstraction levels (or “Phases”), from cloud, to workloads, to code:

  1. Start from the Cloud Providers.
  2. Continue by understanding the technology powering the company’s Workloads.
  3. Complete by mapping the environments and workloads to their originating source Code (a.k.a. code provenance).
The 3 Phases of the Review

Going through this exercise allows you to familiarise yourself with a new environment and, as a good side-effect, organically uncover its security risks. These risks will then be essential to put together a well-thought-out roadmap that addresses both short-term goals (a.k.a. extinguishing the fires) and long-term goals (a.k.a. improving the overall security and maturity of the organization).

Once you have a clear picture of what you need to secure, you can then prioritize accordingly.

It is important to stress that you need to understand how something works before securing (or attempting to secure) it. Let me repeat it; you can’t secure what you don’t understand.

A note on breadth vs depth

A word of caution: avoid rabbit holes. Not only can they be a huge time sink, but they are also an inefficient use of limited time.

You should put your initial focus on getting broad coverage of the different abstraction levels (Cloud Providers, Workloads, Code). Only after you complete the review process will you have enough context to determine where additional investigation is required.

Phase 1: Cloud Providers

Cloud Service Providers (or “CSPs” in short) provide the first layer of abstraction, as they are the primary collection of environments and resources.

Hence, the main goal for this phase is to understand the overall organizational structure of the infrastructure.

Stage 1: Identify the primary CSP

In this post, we target a multi-cloud company with resources scattered across multiple CSPs.

Since it wouldn’t be technically feasible to tackle everything simultaneously, identify the primary cloud provider (the one where Production is) and begin with it. In fact, although it is common to have “some” Production services in another provider (maybe a remnant of a past, and incomplete, migration), it is rare for a company to have an equal split of Production services among multiple CSPs (and if someone claims so, verify it is the case). Core services are often hosted in one provider, while other CSPs might host just some ancillary services.

The 3 main Cloud Providers

Once identified, consider this provider as your primary target. Go through the steps listed below, and only repeat them for the other CSPs at the end.

A note on naming conventions

For the sake of simplicity, for the rest of this post I will use the term “Account” to refer both to AWS Accounts and GCP Projects.

Stage 2: Understand the high-level hierarchy

General Organizational Layouts for AWS and GCP
  • How many Organizations does the company have?
  • How is each Organization designed? If AWS, what do the Organizational Units (OUs) look like? If GCP, what about the Folder hierarchy?
  • Is there a clear split between environment types? (i.e., Production, Staging, Testing, etc.)
  • Which Accounts are critical? (i.e., which ones contain critical data or workloads?) If you are lucky, has someone already compiled a risk rating of the Accounts?
  • How are new Accounts created? Are they created manually or automatically? Are they automatically onboarded onto security tools available to the company?

The quickest way to start getting a one-off list of Accounts (as well as getting answers for the questions above) is through the Organizations page of your Management AWS Account, or the Cloud Resource Manager of your GCP Organization.

You can see mine below (yes, my AWS Org is way more organized than my 2 GCP projects 😅):

Sample Organizations for AWS and GCP

Otherwise, if you need some inspiration for more “creative” ways to get a list of all your Accounts, you can refer to the “How to inventory AWS accounts” blog post from Scott Piper.
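Whichever way the list is obtained, it can be turned into a rough first inventory. The following is a minimal sketch, assuming the JSON output of `aws organizations list-accounts` has been saved and loaded as a dictionary. The environment keywords are a hypothetical naming convention (AWS does not enforce one), so treat the result as a hint to verify rather than as ground truth.

```python
from collections import defaultdict

# Hypothetical naming convention: the environment type appears in the Account name.
ENV_KEYWORDS = ("prod", "staging", "test", "dev", "sandbox")


def guess_environment(account_name: str) -> str:
    """Guess the environment type from the Account name (heuristic only)."""
    lowered = account_name.lower()
    for keyword in ENV_KEYWORDS:
        if keyword in lowered:
            return keyword
    return "unknown"


def group_accounts(list_accounts_output: dict) -> dict:
    """Group the IDs of active Accounts by guessed environment type."""
    groups = defaultdict(list)
    for account in list_accounts_output.get("Accounts", []):
        if account.get("Status") != "ACTIVE":
            continue  # skip suspended or closing Accounts
        groups[guess_environment(account["Name"])].append(account["Id"])
    return dict(groups)
```

Accounts that end up under `unknown` are a good first question for the platform team, since they are the ones nobody has labelled.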

Improving from the one-off list, you could later consider adding some automation to discover and document Accounts automatically. One example of such a tool is cartography, which I’ve personally blogged about in my Continuous Visibility into Ephemeral Cloud Environments series.

Stage 3: Understand what is running in the Accounts

Here the goal is to get a rough idea of what kind of technologies are involved:

  • Is the company container-heavy? (e.g., Kubernetes, ECS, etc.)
  • Is it predominantly serverless? (e.g., Lambda, Cloud Function, etc.)
  • Is it relying on “legacy” VM-based workloads? (e.g., vanilla EC2)

If you have access to Billing, this should be enough to understand which services are the biggest spenders. The “Monthly spend by service” view of AWS Cost Explorer, or equivalently “Cost Breakdown” of GCP Billing, can provide an excellent snapshot of the most used services in the Organization. Otherwise, if you don’t (and can’t) have access to Billing, ask your counterpart teams (usually platform/SRE teams).

Monthly spend by service view of AWS Cost Explorer
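The same ranking can be derived programmatically. This is a minimal sketch, assuming you already hold a response from the Cost Explorer `GetCostAndUsage` API grouped by the `SERVICE` dimension (how the response is fetched, and with which credentials, is left out on purpose).

```python
from collections import Counter


def rank_services_by_spend(cost_and_usage: dict, metric: str = "UnblendedCost") -> list:
    """Sum the cost of each service across all returned time periods,
    and return (service, total) pairs sorted from highest to lowest."""
    totals = Counter()
    for period in cost_and_usage.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]  # one key per group when grouping by SERVICE only
            totals[service] += float(group["Metrics"][metric]["Amount"])
    return totals.most_common()
```

The top handful of entries is usually enough to tell whether the estate leans towards containers, serverless, or VMs.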

(If you are interested in cost-related strategies, CloudSecDocs has a specific section on references to best practices around Cost Optimization.)

In addition, both AWS Config (if enabled) and GCP Cloud Asset Inventory (by default) provide an aggregated view of all assets (and their metadata) within an Account or Organization. This view can be beneficial to understand, at a glance, the primary technologies being used.

AWS Config and GCP Cloud Asset Inventory

At the same time, it is crucial to start getting an idea around data stores:

  • What kind of data is identified by the business as the most sensitive and critical to secure?
  • What type of data (e.g., secrets, customer data, audit logs, etc.) is stored in which Account?
  • How is data rated according to the company’s data classification standard (if one exists)?

Stage 4: Understand the network architecture

In this stage, the goal is to understand what the network architecture looks like:

  • What are the main entry points into the infrastructure? What services and components are Internet-facing and can receive unsolicited (a.k.a. untrusted and potentially malicious) traffic?
  • How do customers get access to the system? Do they have any network access? If so, do they have direct network access, maybe via VPC peering?
  • How do engineers get access to the system? Do they have direct network access? Identify how engineering teams can access the Cloud Providers’ console and how they can programmatically interact with their APIs (e.g., via command-line utilities like AWS CLI or gcloud CLI).
  • How are Accounts connected to each other? Is there any Account separation in place?
  • Is there any VPC peering or shared VPCs between different Accounts?
  • How is firewalling implemented? How are Security Groups and Firewall Rules defined?
  • How is the edge protected? Is anything like Cloudflare used to protect against DDoS and common attacks (via a WAF)?
  • How is DNS managed? Is it centralized? What are the principal domains associated with the company?
  • Is there any hybrid connectivity with any on-prem data centres? If so, how is it set up and secured? (For those interested, CloudSecDocs has a specific section on references around Hybrid Connectivity.)
Multi-account, multi-VPC architecture - Courtesy of Amazon
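Some of the entry-point questions above can be partially answered from data. As a minimal sketch, assuming the JSON output of `aws ec2 describe-security-groups` is available as a dictionary, the function below lists the inbound rules that accept traffic from anywhere. It only flags candidates: whether a rule is actually reachable still depends on what the Security Group is attached to.

```python
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def find_world_open_rules(describe_security_groups_output: dict) -> list:
    """Return (group id, protocol, from port, to port) for each inbound
    rule that allows traffic from the whole Internet."""
    findings = []
    for group in describe_security_groups_output.get("SecurityGroups", []):
        for rule in group.get("IpPermissions", []):
            cidrs = {r["CidrIp"] for r in rule.get("IpRanges", [])}
            cidrs |= {r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])}
            if cidrs & WORLD_CIDRS:
                findings.append((
                    group["GroupId"],
                    rule.get("IpProtocol"),  # "-1" means all protocols
                    rule.get("FromPort"),    # None when the rule covers all ports
                    rule.get("ToPort"),
                ))
    return findings
```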

There is no easy way to obtain this information. If you are lucky, your company might have up-to-date network diagrams that you can consult. Otherwise, you’ll have to make an effort to interview different stakeholders/teams to try to extract the tacit knowledge they might have.

Precisely because of this, this stage might become one of the most time-consuming parts of the entire review process (and it is often the one that gets skipped), especially if no up-to-date documentation is available. Nonetheless, this stage is the one that might bring the most significant rewards in the long run: as a (Cloud) Security team, you should be confident in your knowledge of how data flows within your systems. This knowledge will then be critical for providing accurate and valuable recommendations and risk ratings going forward.

Stage 5: Understand the current IAM setup

Identity and Access Management, or IAM in short, is the apex of the headaches for security teams.

In this stage, the goal is to understand how authentication and authorization to cloud providers are currently set up. The problem, though, is twofold, as you’ll have to tackle this not only for human (i.e., engineers) but also for automated (i.e., workloads, CI jobs, etc.) access.

Cross Account Auditing in AWS

Concerning human access, the previous stage started looking at how engineering teams connect to Cloud Providers. Now is the time to expand on this:

  • Where are identities defined? Is an Identity Provider (like G Suite, Okta, or AD) being used?
  • Are the identities being federated in the Cloud Provider from the Identity Provider? Are the identities being synced automatically from the Identity Provider?
  • Is SSO being used?
  • Are named users being used as a common practice, or are roles with short-lived tokens preferred? Here the point is not to do a full IAM audit but to understand the standard practice.
  • For high-privileged accounts, are good standards enforced? (e.g., password policy, MFA - preferably hardware)
  • How is authorization enforced? Is the principle of least privilege generally followed, or are overly permissive (non fine-tuned) policies usually used? For reference, CloudSecDocs has summaries of best practices for IAM in both AWS and GCP.
  • How is Role-Based Access Control (RBAC) used? How is it set up, enforced, and audited?
  • Is there a documented process describing how access requests are managed and granted?
  • Is there a documented process describing how access deprovisioning is performed during offboarding?
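For AWS, a couple of these questions (MFA and named users with static keys) can be approximated from the IAM credential report. The following is a minimal sketch, assuming the report has already been downloaded and decoded into CSV text. It is meant to reveal the standard practice, not to replace an IAM audit, and the root user row is best checked separately.

```python
import csv
import io


def users_without_mfa(credential_report_csv: str) -> list:
    """List users that can log in with a password but have no MFA device."""
    reader = csv.DictReader(io.StringIO(credential_report_csv))
    return [
        row["user"]
        for row in reader
        if row["password_enabled"] == "true" and row["mfa_active"] == "false"
    ]


def users_with_active_keys(credential_report_csv: str) -> list:
    """List users holding at least one active long-lived access key."""
    reader = csv.DictReader(io.StringIO(credential_report_csv))
    return [
        row["user"]
        for row in reader
        if "true" in (row["access_key_1_active"], row["access_key_2_active"])
    ]
```

A long list from either function suggests that named users with static credentials are the norm rather than the exception.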

For automated access:

  • How do Accounts interact with each other? Are there any cross-Account permissions?
  • Are long-running (static) keys and service accounts generally used, or are short-lived tokens (i.e., STS) usually preferred?
  • How is authorization enforced? Is the principle of least privilege generally followed, or are overly permissive (non fine-tuned) policies usually used?
Visualization of Cross-Account Role Access, obtained via Cartography
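If a graph tool is not yet available, cross-Account permissions in AWS can be roughly surfaced from the trust policies of IAM roles. Here is a minimal sketch, assuming a list of roles shaped like the `Roles` entries returned by the IAM `ListRoles` API, with `AssumeRolePolicyDocument` already parsed into a dictionary. It only looks at `AWS` principals, so federated and service principals are deliberately ignored.

```python
def _as_list(value):
    """IAM policies accept either a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def external_trusts(roles: list, own_account_id: str) -> list:
    """Return (role name, principal) pairs where the trusted AWS principal
    belongs to a different Account than the one owning the role."""
    findings = []
    for role in roles:
        policy = role["AssumeRolePolicyDocument"]
        for statement in _as_list(policy.get("Statement", [])):
            if statement.get("Effect") != "Allow":
                continue
            principal = statement.get("Principal", {})
            if principal == "*":
                findings.append((role["RoleName"], "*"))
                continue
            for arn in _as_list(principal.get("AWS", [])):
                # Principals are ARNs (arn:aws:iam::<account>:...) or bare Account IDs.
                account = arn.split(":")[4] if arn.startswith("arn:") else arn
                if account != own_account_id:
                    findings.append((role["RoleName"], arn))
    return findings
```

Each pair is a lead to follow up on: some will be legitimate (for example, a central audit Account), others may be forgotten vendor integrations.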

It is important to stress that you won’t have time to go into reviewing every single policy at this stage, at least for any decently-sized environment. What you should be able to do, though, is get a grasp of the company’s general trends and overall maturity level.

We will tackle a proper security review later.

Stage 6: Understand the current monitoring setup

Next, special attention should be put on the current setup for collecting, aggregating, and analyzing security logs across the entire estate:

  • Are security-related logs collected at all?
  • If so, which services offered by the Cloud Providers are already being leveraged? For AWS, are at least CloudTrail, CloudWatch, and GuardDuty enabled? For GCP, what about Cloud Monitoring and Cloud Logging?
  • What kind of logs are being already ingested?
  • Where are the logs collected? Are logs from different Accounts all ingested in the same place?
  • What’s the retention policy for security-related logs?
  • How are logs analyzed? Is a SIEM being used?
  • Who has access to the SIEM and the raw storage of logs?
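For the CloudTrail part of these questions, a quick sanity check can be scripted. This is a minimal sketch, assuming the JSON output of `aws cloudtrail describe-trails` is available as a dictionary. The choice of which gaps to report is mine and is far from exhaustive.

```python
def review_trails(describe_trails_output: dict) -> list:
    """Return human-readable gaps found in the CloudTrail configuration."""
    trails = describe_trails_output.get("trailList", [])
    if not trails:
        return ["No CloudTrail trail is configured"]
    gaps = []
    if not any(t.get("IsMultiRegionTrail") for t in trails):
        gaps.append("No multi-region trail")
    if not any(t.get("IsOrganizationTrail") for t in trails):
        gaps.append("No organization-wide trail")
    for trail in trails:
        if not trail.get("LogFileValidationEnabled"):
            gaps.append(f"Log file validation disabled on {trail['Name']}")
    return gaps
```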

For some additional pointers, in my Continuous Visibility into Ephemeral Cloud Environments series, I’ve blogged about what could (and should) be logged.

Architecture Diagram - Security Logging in AWS and GCP

Then, focus on the response processes:

  • Are any Intrusion Detection systems deployed?
  • Are any Data Loss Prevention systems deployed?
  • Are there any processes and playbooks to follow in case of an incident?
  • Are there any processes to detect credential compromise situations?
  • Are there any playbooks or automated tooling to contain tainted resources?
  • Are there any playbooks or automated tooling to aid in forensic evidence collection for suspected breaches?

Stage 7: Understand the current secrets management setup

In this stage, the goal is to understand how secrets are generated and managed within the infrastructure:

  • How are new secrets generated when needed? Manually or automatically?
  • Where are secrets stored?
  • Is a secrets management solution (like HashiCorp Vault, AWS Secrets Manager, or GCP Secret Manager) currently used?
  • Are processes around secrets management defined? What about rotation and revocation of secrets?
High-Level Overview of Vault Integrations
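If AWS Secrets Manager turns out to be in use, the rotation question can be answered quickly. A minimal sketch, assuming the JSON output of `aws secretsmanager list-secrets` is available as a dictionary (secrets that never had rotation configured may not carry the `RotationEnabled` key at all, hence the default):

```python
def secrets_without_rotation(list_secrets_output: dict) -> list:
    """Return the names of secrets for which automatic rotation is not enabled."""
    return sorted(
        secret["Name"]
        for secret in list_secrets_output.get("SecretList", [])
        if not secret.get("RotationEnabled", False)
    )
```

Not every secret can be rotated automatically, so the output is a list of things to ask about rather than a list of findings.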

For additional pointers, the Secrets Management section of CloudSecDocs has references to some other solutions in this space.

Once again, resist the urge to start “fixing” things for now and keep looking and documenting.

Stage 8: Identify existing security controls

In this stage, the goal is to understand which controls have already been implemented, maybe by a previous iteration of the Security team, and can already be leveraged.

For additional pointers, CloudSecDocs has sections describing off-the-shelf services for both AWS and GCP.

Stage 9: Get the low-hanging fruits

Finally, if you managed to reach this stage, you should have a clearer picture of the situation in the primary Cloud Provider. As a final stage for the Cloud-related phase of this review, you could then run a tactical scan (or benchmark suite) against your Organization.

The quickest way to get a snapshot of vulnerabilities and misconfigurations in your estate is to reference Security Hub for AWS and Security Command Center for GCP, provided they are already enabled.

Sample Dashboards for AWS Security Hub and GCP Security Command Center
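When the dashboards are too noisy, a severity breakdown is often enough for a first pass. The following is a minimal sketch, assuming findings exported from Security Hub (for example via `aws securityhub get-findings`) and loaded as a dictionary, relying only on the `Severity.Label` and `RecordState` fields of each finding.

```python
from collections import Counter

SEVERITY_ORDER = ("CRITICAL", "HIGH", "MEDIUM", "LOW", "INFORMATIONAL")


def summarize_findings(get_findings_output: dict) -> list:
    """Count active findings per severity label, most severe first."""
    findings = get_findings_output.get("Findings", [])
    counts = Counter(
        finding["Severity"]["Label"]
        for finding in findings
        if finding.get("RecordState") == "ACTIVE"  # ignore archived findings
    )
    return [(label, counts[label]) for label in SEVERITY_ORDER if counts[label]]
```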

Otherwise, on CloudSecDocs you can find some additional tools for both testing (AWS, GCP, Azure) and auditing (AWS, GCP, Azure) environments.

Note that this is not meant to be an exhaustive penetration test but a lightweight scan to identify high exposure and high impact misconfigurations or vulnerabilities already in Production.


Subscribe to CloudSecList

If you found this article interesting, you can join thousands of security professionals getting curated security-related news focused on the cloud native landscape by subscribing to CloudSecList.com.


Phase 2: Workloads

Next, let’s focus on Workloads: the services powering the company’s business offerings.

The main goal for this phase is to understand the current security maturity level of the different tech stacks involved.

Stage 1: Understand the high-level business offerings

As I briefly mentioned in the first part of this post, you can’t secure what you don’t understand. This concept becomes especially true when you are tasked with securing business workloads.

Hence, here the goal is to understand the key functionalities your company offers to its customers:

  • How many key functionalities does the company have? For example, if you are a banking company, these functionalities could be payments, transactions, etc.
  • How are the main functionalities designed? Are they made by micro-services or a monolith?
  • Is there a clear split between environment types? (i.e., Production, Staging, Testing, etc.)
  • Which functionalities are critical? (i.e., both in terms of data and customer needs)
Monolithic vs Microservice Architectures

If you are fortunate, your company might have already undergone the process (and pain) to define a way to keep an inventory of business offerings and respective workloads and, most importantly, keep it up to date. Otherwise, like in most cases, you’ll have to partner with product teams to understand this.

Then, try to map business functionalities to technical workloads, understanding their purpose for the business:

  • Which ones are Internet-facing?
  • Which ones are customer-facing?
  • Which ones are time-critical?
  • Which ones are stateful? Which ones are stateless?
  • Which ones are batch processing?
  • Which ones are back-office support?
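The answers to the questions above can be captured in a very small inventory. This is a minimal sketch of one possible structure. The field names and the weighting used to order workloads are hypothetical and should be adapted to what matters for your business.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    functionality: str  # the business functionality this workload supports
    internet_facing: bool = False
    customer_facing: bool = False
    stateful: bool = False

    def review_priority(self) -> int:
        """Hypothetical weighting: Internet exposure counts double,
        then customer impact and state add one point each."""
        return 2 * self.internet_facing + self.customer_facing + self.stateful


def review_order(workloads: list) -> list:
    """Sort workloads so that the ones deserving attention first come first."""
    return sorted(workloads, key=lambda w: w.review_priority(), reverse=True)
```

For example, an Internet-facing, stateful payments API would be reviewed before a stateless back-office batch job.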

Stage 2: Identify the primary tech stack

Once the key functionalities have been identified, you’ll want to split them by technologies.

If you recall, the goal of Stage 3 (“Understand what is running in the Accounts”) of the Cloud-related phase of this review was to get a rough idea of what kind of technologies the company relies upon. Now is the time to go deeper by identifying the main stack.

For the rest of the post, I’ll assume this can be one of the three following macro-categories (each can then have multiple variations):

  • Container-based (e.g., Kubernetes, ECS, etc.)
  • Serverless (e.g., Lambda, Cloud Function, etc.)
  • VM-based (e.g., vanilla EC2)

Similar to the challenge we faced for Cloud Providers, it won’t be technically feasible to tackle everything simultaneously. Hence, identify the primary tech stack, the one Production mainly relies upon, and begin with it. Your company will probably rely on a mix (if not all) of those macro-areas: nonetheless, divide and conquer.

Kubernetes vs Serverless vs VMs

Once identified, consider this stack as your primary target. Go through the steps listed below, and only repeat them for the other technologies at the end.

Stage 3: Understand the network architecture

While in Stage 4 of the Cloud-related phase of the review you started making global considerations like what are the main entry points into the infrastructure or how do customers and engineers get access to the systems, in this stage the goal is to understand what the network architecture of the workloads looks like.

Depending on the macro-category of technologies involved, you can ask different questions.

Kubernetes:

  • Which (and how many) clusters do we have? Are they regional or zonal?
  • Are they managed (EKS, GKE, AKS) or self-hosted?
  • How do the clusters communicate with each other? What are the network boundaries?
  • Are clusters single or multi-tenant?
  • Are either the control plane or nodes exposed over the Internet?
  • How do engineers connect? How can they run kubectl? Do they use a bastion or something like Teleport?
  • What are the Ingresses?
  • Are there any Stateful workloads running in these clusters?
Kubernetes Trust Boundaries - Courtesy of CNCF
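Besides Ingresses, Services of type `LoadBalancer` or `NodePort` are another way traffic can enter a cluster. As a minimal sketch, assuming the output of `kubectl get services --all-namespaces -o json` has been loaded as a dictionary, the function below lists them. A `LoadBalancer` can still be internal-only, so each result needs to be confirmed against the Cloud Provider.

```python
EXPOSED_TYPES = {"LoadBalancer", "NodePort"}


def externally_reachable_services(service_list: dict) -> list:
    """Return "namespace/name" for Services whose type can expose them
    outside the cluster network."""
    return sorted(
        f"{item['metadata']['namespace']}/{item['metadata']['name']}"
        for item in service_list.get("items", [])
        if item["spec"].get("type", "ClusterIP") in EXPOSED_TYPES
    )
```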

Serverless:

  • Which type of data stores are being used? For example, SQL-based (e.g., RDS), NoSQL (e.g., DynamoDB), or Document-based (e.g., DocumentDB)?
  • Which type of application workers are being used? For example, Lambda or Cloud Functions?
  • Is an API Gateway (incredibly named in the same way by both AWS and GCP! 🤯) being used?
  • What is used to de-couple the components? For example, SQS or Pub/Sub?
A high-level architecture diagram of an early version of CloudSecList.com

VMs:

  • What Virtual Machines are directly exposed to the Internet?
  • Which Operating Systems (and versions) are being used?
  • How are hosts hardened?
  • How do engineers connect? Do they SSH directly into the hosts, or is a remote session manager (e.g., SSM or OS Login) used?
  • What’s a pet, and what’s cattle?

Stage 4: Understand the current IAM setup

In this stage, the goal is to understand how authentication and authorization to workloads are currently set up.

These questions are general, regardless of the technology type:

  • How are engineers interacting with workloads? How do they troubleshoot them?
  • How is authorization enforced? Is the principle of least privilege generally followed, or are overly permissive (non fine-tuned) policies usually used?
  • How is Role-Based Access Control (RBAC) used? How is it set up, enforced, and audited?
  • Are workloads accessing any other cloud-native services (e.g., buckets, queues, databases)? If yes, how are authentication and authorization to Cloud Providers set up and enforced? Are they federated, maybe via OpenID Connect (OIDC)?
  • Are workloads accessing any third party services? If yes, how are authentication and authorization set up and enforced?
OIDC Federation of Kubernetes in AWS - Courtesy of @mjarosie
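On EKS, one common form of this federation is IAM Roles for Service Accounts, where a ServiceAccount is linked to an IAM role through the `eks.amazonaws.com/role-arn` annotation. The following is a minimal sketch, assuming the output of `kubectl get serviceaccounts --all-namespaces -o json` has been loaded as a dictionary, that maps ServiceAccounts to the roles they can assume.

```python
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def service_account_roles(service_account_list: dict) -> dict:
    """Map "namespace/name" of each annotated ServiceAccount to its IAM role ARN."""
    mapping = {}
    for item in service_account_list.get("items", []):
        metadata = item["metadata"]
        annotations = metadata.get("annotations") or {}
        role_arn = annotations.get(IRSA_ANNOTATION)
        if role_arn:
            mapping[f"{metadata['namespace']}/{metadata['name']}"] = role_arn
    return mapping
```

An empty mapping on a cluster whose workloads do talk to AWS services is itself a finding: it suggests credentials reach the workloads some other way, such as node instance profiles or static keys.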

For additional pointers, CloudSecDocs has sections describing how authentication and authorization in Kubernetes work.

Stage 5: Understand the current monitoring setup

The goal of this stage is to understand which logs are collected (and how) from workloads.

Some of the questions to ask are general, regardless of the technology type:

  • Are security-related logs collected at all?
  • What kind of logs are being already ingested?
  • How are logs collected?
  • Where are the logs forwarded?

These can then be adapted and tailored to the relevant tech type.

Kubernetes:

  • Are audit logs collected?
  • Are System Calls and Kubernetes Audit Events collected via Falco?
  • Is a data collector like fluentd used to collect logs?
  • Is the data collector deployed as a Sidecar or Daemonset?
High-Level Architecture of Falco

For additional pointers, CloudSecDocs has a section focused on Falco and its usage.

Serverless:

  • How are applications instrumented?
  • Are metrics and logs collected via a data collector like X-Ray or Datadog?
Sample X-Ray Dashboard

VMs:

  • For AWS, is the CloudWatch Logs agent used to send log data to CloudWatch Logs from EC2 instances automatically?
  • For GCP, is the Ops Agent used to collect telemetry from Compute Engine instances?
  • Is an agent like OSQuery used to provide endpoint visibility?

Stage 6: Understand the current secrets management setup

In this stage, the goal is to understand how secrets are made available to workloads to consume.

  • Where are workloads fetching secrets from?
  • How are secrets made available to workloads? Via environment variables, filesystem, etc.?
  • Do workloads also generate secrets, or are they limited to consuming them?
  • Is there a practice of hardcoding secrets?
  • Assuming a secrets management solution (like HashiCorp Vault, AWS Secrets Manager, or GCP Secret Manager) is being used, how are workloads authenticating? What about authorization (RBAC)?
  • Are secrets bound to a specific workload, or is any workload able to potentially fetch any other secret? Is there any separation or boundary?
  • Are processes around secret management defined? What about rotation and revocation of secrets?
Vault's Sidecar for Kubernetes - Courtesy of HashiCorp
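For Kubernetes, the practice of hardcoding secrets often shows up as literal environment variables in Pod specs. Here is a minimal sketch, assuming the output of `kubectl get pods --all-namespaces -o json` has been loaded as a dictionary. The name pattern is a hypothetical heuristic, so expect both false positives and misses.

```python
import re

# Hypothetical heuristic: variable names that hint at sensitive content.
SUSPICIOUS_NAME = re.compile(r"(password|passwd|secret|token|api_?key)", re.IGNORECASE)


def inline_secret_candidates(pod_list: dict) -> list:
    """Return (pod, container, variable) for env vars with a sensitive-looking
    name whose value is written directly in the Pod spec."""
    findings = []
    for pod in pod_list.get("items", []):
        pod_id = f"{pod['metadata']['namespace']}/{pod['metadata']['name']}"
        for container in pod["spec"].get("containers", []):
            for env in container.get("env", []):
                # A literal "value" sits in the manifest itself, whereas
                # "valueFrom" points at a Secret, ConfigMap, or field.
                if "value" in env and SUSPICIOUS_NAME.search(env["name"]):
                    findings.append((pod_id, container["name"], env["name"]))
    return findings
```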

Stage 7: Identify existing security controls

In this stage, the goal is to understand which controls have already been implemented.

It is unfeasible to list them all here, as they can depend highly on your actual workloads.

Nonetheless, try to gather information about what has already been deployed. For example:

  • Any admission controllers (e.g., OPA Gatekeeper) or network policies in Kubernetes?
  • Any third party agent for VMs?
  • Any custom or third party solution?
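For the network policies question, a coverage check is straightforward. A minimal sketch, assuming the outputs of `kubectl get namespaces -o json` and `kubectl get networkpolicies --all-namespaces -o json` have been loaded as dictionaries:

```python
def namespaces_without_network_policy(namespace_list: dict, network_policy_list: dict) -> list:
    """Return the namespaces in which no NetworkPolicy object is defined."""
    covered = {
        item["metadata"]["namespace"]
        for item in network_policy_list.get("items", [])
    }
    return sorted(
        item["metadata"]["name"]
        for item in namespace_list.get("items", [])
        if item["metadata"]["name"] not in covered
    )
```

Note that the presence of a policy says nothing about its quality, and that policies are only enforced if the cluster's network plugin supports them.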

Specifically for Kubernetes, CloudSecDocs has a few sections describing focus areas and security checklists.

Stage 8: Get the low-hanging fruits

As a final stage for the Workloads-related phase of this review, you could then run a tactical scan (or benchmark suite) against your key functionalities.

As for the previous stage, this is highly dependent on the actual functionalities and workloads developed within your company.

Nonetheless, on CloudSecDocs you can find additional tools for testing and auditing Kubernetes clusters.

Note that this is not meant to be an exhaustive penetration test but a lightweight scan to identify high exposure and high impact misconfigurations or vulnerabilities already in Production.


Phase 3: Code

Those paying particular attention might have noticed that I haven’t mentioned the words source code a single time so far. That’s not because I think code is not essential. Quite the opposite, as I reckon it deserves a phase of its own.

Thus, the main goal for this phase is to complete the review by mapping the environments and workloads to their originating source code and understanding how code reaches Production.

Stage 1: Understand the code’s structure

The goal of this stage is to understand how code is structured, as it will significantly affect the strategies you will have to use to secure it.

Conway’s Law does a great job in capturing this concept:

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.

This means that software usually ends up “shaped like” the organizational structure it is designed in or designed for.

Visualization of Conway's Law - Courtesy of Manu Cornet

In technical terms, the primary distinction to make is about monorepo versus multi-repos philosophy for code organization:

  • The monorepo approach uses a single repository to host all the code for the multiple libraries or services composing a company’s projects.
  • The multi-repo approach uses several repositories to host the multiple libraries or services of a project developed by a company.

I will probably blog more in the future about how each of these philosophies affects security, but for now, try to understand how the company is designed and where the code is.

Then, look for which security controls are already added to the repositories:

  • Are CODEOWNERS being utilized?
  • Are there any protected branches?
  • Are code reviews via Pull Requests being enforced?
  • Are linters automatically run on the developers’ machines before raising a Pull Request?
  • Are static analysis tools (for the relevant technologies used) automatically run on the developers’ machines before raising a Pull Request?
  • Are secrets detection tools (e.g., git-secrets) automatically run on the developers’ machines before raising a Pull Request?
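Two of these checks can be approximated on a local checkout. The following is a minimal sketch, assuming repositories hosted on GitHub (which looks for a CODEOWNERS file in `.github/`, the repository root, or `docs/`). The regular expression only covers AWS access key IDs and is no substitute for a dedicated secrets detection tool.

```python
import re
from pathlib import Path

# Locations where GitHub looks for a CODEOWNERS file.
CODEOWNERS_LOCATIONS = (".github/CODEOWNERS", "CODEOWNERS", "docs/CODEOWNERS")

# AWS access key IDs: "AKIA" (long-lived) or "ASIA" (temporary) plus 16 characters.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def has_codeowners(repo_root: str) -> bool:
    """Check whether a CODEOWNERS file exists in one of the known locations."""
    return any((Path(repo_root) / location).is_file() for location in CODEOWNERS_LOCATIONS)


def find_aws_key_ids(text: str) -> list:
    """Return all strings in the text that look like AWS access key IDs."""
    return AWS_ACCESS_KEY_ID.findall(text)
```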

Stage 2: Understand the adoption of Infrastructure as Code

The goal of this stage is to understand which resources are defined as code, and which are created manually.

  • Which Infrastructure as Code (IaC) frameworks are being used?
  • Are Cloud environments managed via IaC? If not, what is being excluded?
  • Are Workloads managed via IaC? If not, what is being excluded?
  • How are third party modules sourced and vetted?
Adoption of different IaC frameworks - Courtesy of The New Stack

Stage 3: Understand how CI/CD is set up

In this stage, the goal is to understand how code is built and deployed to Production.

  • What CI/CD platform (e.g., GitHub, GitLab, Jenkins, etc.) is being used?
  • Is IaC, for both Cloud environments and Workloads, automatically deployed via CI/CD?
    • How is Terraform applied?
    • How are container images built?
  • Is IaC automatically tested and validated in the pipeline?
Pipeline for container images - Courtesy of Sysdig
  • How is code provenance guaranteed?
  • Have other security controls been embedded in the pipeline?
  • Are there any documented processes for infrastructure and configuration changes?
  • Is there a documented Secure Software Development Life Cycle (SSDLC) process?

Stage 4: Understand how the CI/CD platform is secured

Finally, the goal of this stage is to understand how the chosen CI/CD platform itself is secured. In fact, as time passes, CI/CD platforms are becoming focal points for security teams, as a compromise of such systems might mean a total compromise of the overall organization.

A few questions to ask:

  • How is access control defined?
    • Who has access to the CI/CD platform?
    • How does the CI/CD platform authenticate to code repositories?
    • How does the CI/CD platform authenticate to Cloud Providers and their environments?
    • How are credentials to those systems managed? Is the blast radius of a potential compromise limited?
    • Is the principle of least privilege followed?
    • Are long-running (static) keys generally used, or are short-lived tokens (e.g., via OIDC) usually preferred?
Credentials management in a pipeline - Courtesy of WeaveWorks
  • What do the security monitoring and auditing of the CI/CD platform look like?
  • Are CI runners hardened?
  • Is there any isolation between CI and CD?
  • How are 3rd party workflows sourced and vetted?
Security mitigations for CI/CD pipelines - Courtesy of Mercari
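As an illustration of the third-party workflows question, and assuming GitHub Actions is the CI/CD platform, the sketch below flags actions that are referenced by a tag or branch instead of a full commit SHA. It uses a regular expression rather than a YAML parser to stay dependency-free, so unusual formatting may escape it.

```python
import re

USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)", re.MULTILINE)
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list:
    """Return the action references in a workflow file that are not pinned
    to a full 40-character commit SHA."""
    findings = []
    for reference in USES_LINE.findall(workflow_text):
        if reference.startswith("./") or reference.startswith("docker://"):
            continue  # local actions and container images are out of scope here
        _, _, ref = reference.partition("@")
        if not FULL_SHA.match(ref):
            findings.append(reference)
    return findings
```

Tags and branches can be moved by whoever controls the action's repository, which is why a commit SHA is the only reference that stays fixed over time.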

For additional pointers, CloudSecDocs has a section focused on CI/CD Providers and their security.


Let’s put it all together

Thanks for making it this far! 🎉

Hopefully, going through this exercise will allow you to have a clearer picture of your company’s current security maturity level. Plus, as a side-effect, you should now have a rough idea of the most critical areas which deserve more scrutiny.

The goal from here is to use the knowledge you uncovered to put together a well-thought-out roadmap that addresses both short-term goals (a.k.a. extinguishing the fires) and long-term goals (a.k.a. improving the overall security and maturity of the organization).

Useful summaries

For convenience, here are all the stages of the review, grouped by phase, gathered in one handy chart:

The Stages of the Review, by Phase

In addition, I’ve also created a micro-website to host all the questions you should ask in a spreadsheet-style format.

Micro-website hosting the questions in a spreadsheet-style format
The list of questions can be found at: roadmap.cloudsecdocs.com/infrastructure-review

Document as you go

As a last recommendation, it is crucial not to lose the invaluable knowledge you gathered throughout this review. My suggestion is to document as you go:

  1. Keep a journal (or wiki) of what you discover. Bonus points if it’s structured and searchable, as it will simplify keeping it up to date.
  2. Start creating a risk registry of all the risks you discover as you go. You might not be able to tackle them all now, but it will be invaluable to drive the long term roadmap and share areas of concern with upper management.
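A risk registry does not need dedicated tooling to get started. This is a minimal sketch of what an entry could look like. The fields and the 1 to 5 likelihood/impact scale are hypothetical choices, to be replaced by whatever risk methodology your company already follows.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Risk:
    title: str
    phase: str       # "Cloud Providers", "Workloads", or "Code"
    likelihood: int  # 1 (rare) to 5 (almost certain)
    impact: int      # 1 (negligible) to 5 (severe)
    discovered: date = field(default_factory=date.today)

    @property
    def score(self) -> int:
        """Simple likelihood x impact score used for ordering."""
        return self.likelihood * self.impact


def top_risks(registry: list, limit: int = 5) -> list:
    """Return the highest-scoring risks, to be shared with upper management."""
    return sorted(registry, key=lambda risk: risk.score, reverse=True)[:limit]
```

Recording the phase in which each risk was discovered makes it easier to map entries back to the stages of the review when building the roadmap.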

Conclusions

In this post, part of the “Cloud Security Strategies” series, I provided a structured approach to reviewing the security architecture of a multi-cloud SaaS company and finding its most critical components.

It does represent my perspective and reflects my experiences, so it definitely won’t be a “one size fits all”, but I hope it could be a helpful baseline.

I hope you found this post valuable and interesting, and I’m keen to get feedback on it! If you find the information shared was helpful, or if something is missing, or if you have ideas on how to improve it, please let me know on 🐣 Twitter or at 📢 feedback.marcolancini.it.

Thank you! 🙇‍♂️

Marco Lancini
Hi, I'm Marco Lancini. I'm a Security Engineer, mainly interested in cloud native technologies and security...