Security Logging in Cloud Environments - GCP
- Problem Statement
- Which Services Can We Leverage?
- State of the Art Security Logging Platform in GCP
If you had to architect a multi-account security logging strategy, where should you start?
This blog, part of the “Continuous Visibility into Ephemeral Cloud Environments” series, will describe a design for a state of the art multi-account security-related logging platform in GCP.
A previous post covered a similar setup for AWS, hence I tried to follow the same structure here. A later post will cover a setup for Kubernetes instead.
One of the usual requirements for Security teams is to improve visibility into (production) environments. In this regard, it is often necessary to design and roll out a strategy around security-related logging. This entails defining the scope for logging (resources, frequency, etc.), as well as providing an integration with existing monitoring and alerting systems.
The end goal is to deploy a security logging and monitoring solution with well established metrics and integrations with a SIEM of choice (Elasticsearch in this case). In particular, the solution should be able to:
- Collect security-related logs from all environments.
- Ingest those logs into a SIEM (e.g., Elasticsearch).
- Parse those logs and use them to generate dashboards in Kibana.
- Create alerts on anomalies.
This post is composed of two main parts. The first introduces the logging-related services that GCP makes available to its customers, along with their main features. The second describes a state-of-the-art design for a security-related logging platform, and provides the high-level architecture and best practices to follow during the implementation phase.
Which Services Can We Leverage?
Similar to AWS, GCP offers multiple services around logging and monitoring. Cloud Operations (formerly known as StackDriver) has been defined as a suite of products to monitor, troubleshoot, and operate services at scale. It now includes Cloud Logging, Cloud Monitoring, Cloud Trace, Cloud Debugger, and Cloud Profiler.
In the remainder of this section I’ll provide a summary of the main services we will need to design our security logging platform.
Cloud Logging receives, indexes, and stores log entries from many sources, including GCP, AWS, VM instances running the fluentd agent, and user applications:
- Cloud Audit Logs
- Access Transparency Logs
As briefly mentioned above, Google Cloud Audit Logs record the who, where, and when for activity within your environment, and ultimately help security teams maintain audit trails in GCP.
With them, it is possible to attain the same level of transparency over administrative activities and accesses to data in GCP as in on-premises environments. Every administrative activity is recorded on a hardened, always-on audit trail, which cannot be disabled by any rogue actor.
Cloud Audit Logs provides the following audit logs for each Project, Folder, and Organization within a resource hierarchy:
- Admin Activity Audit Logs
- System Event Audit Logs
- Data Access Audit Logs
- Policy Denied Audit Logs
For more information, see Best practices for Cloud Audit Logs.
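To make the four log types concrete, here is a minimal Python sketch that classifies a log entry by the suffix of its `logName`. The suffixes (`activity`, `system_event`, `data_access`, `policy`) are the ones used by Cloud Audit Logs; the function name `audit_log_type` and the sample log names are hypothetical and only serve as an illustration.

```python
from urllib.parse import unquote

# Log ID suffix (after "cloudaudit.googleapis.com/") -> audit log type.
AUDIT_LOG_TYPES = {
    "activity": "Admin Activity",
    "system_event": "System Event",
    "data_access": "Data Access",
    "policy": "Policy Denied",
}

PREFIX = "cloudaudit.googleapis.com/"


def audit_log_type(log_name: str):
    """Return the audit log type for a logName, or None if it is not an audit log."""
    # A logName looks like:
    #   projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity
    _, sep, log_id = log_name.partition("/logs/")
    if not sep:
        return None
    log_id = unquote(log_id)  # "%2F" -> "/"
    if not log_id.startswith(PREFIX):
        return None
    return AUDIT_LOG_TYPES.get(log_id[len(PREFIX):])


# Hypothetical log names, for illustration only.
print(audit_log_type("projects/my-project/logs/cloudaudit.googleapis.com%2Factivity"))
print(audit_log_type("projects/my-project/logs/syslog"))
```

A classifier like this can be handy when routing entries to different indices or retention tiers, since Data Access logs are typically far more voluminous than Admin Activity logs.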
Cloud Monitoring collects metrics, events, and metadata from GCP, AWS, hosted uptime probes, and application instrumentation. It also provides dashboards, alerts, and uptime checks that can be used to ensure systems are running reliably.
In addition, Cloud Monitoring allows you to create custom alerting policies: whenever events trigger the conditions of one of the defined alerting policies, Cloud Monitoring creates and displays an incident in the console. If you set up notifications, Cloud Monitoring can also send them to people or third-party notification services.
Cloud Identity is Google’s Identity as a Service (IDaaS) product, which can be used to provision, manage, and authenticate users across GCP environments. Cloud Identity is how people in an organization gain a Google identity, and it’s these identities that are granted access to Google Cloud resources.
In this regard, Cloud Identity logs track events that may have a direct impact on a GCP environment. Relevant logs include:
- Admin Audit Logs
- Login Audit Logs
- Groups Audit Logs
- OAuth Token Audit Logs
- SAML Audit Logs
Security Command Center
Security Command Center is defined by Google as a risk dashboard and analytics system for surfacing, understanding, and remediating Google Cloud security and data risks across an organization.
Security Command Center enables the generation of insights that provide a unique view of incoming threats and attacks to Google Cloud resources (called “assets”), by displaying possible security risks (called “findings”) that are associated with each asset. Findings can come from security sources that include Security Command Center’s built-in services, third-party partners (like Cloudflare, CrowdStrike, Prisma Cloud, and Qualys), or even custom sources.
Security Command Center currently focuses on asset inventory, discovery, search, and management:
- Asset discovery and inventory
Alerts triggered by Security Command Center can be turned into real-time notifications via integrations with Pub/Sub.
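As a rough sketch of what a consumer of those Pub/Sub notifications might do, the Python snippet below parses a notification payload and applies a simple triage rule. I am assuming the payload is a JSON document with a `notificationConfigName` field and a nested `finding` object (with `category`, `severity`, `state`, and `resourceName`); the helper names, the paging rule, and the sample values are hypothetical.

```python
import json


def summarize_finding(data: bytes) -> dict:
    """Extract a few triage fields from a notification payload (assumed JSON layout)."""
    message = json.loads(data.decode("utf-8"))
    finding = message.get("finding", {})
    return {
        "config": message.get("notificationConfigName"),
        "category": finding.get("category"),
        "severity": finding.get("severity", "SEVERITY_UNSPECIFIED"),
        "state": finding.get("state"),
        "resource": finding.get("resourceName"),
    }


def should_page(summary: dict) -> bool:
    """Hypothetical triage rule: page only on active HIGH or CRITICAL findings."""
    return summary["state"] == "ACTIVE" and summary["severity"] in {"HIGH", "CRITICAL"}


# Hypothetical payload, for illustration only.
sample = json.dumps({
    "notificationConfigName": "organizations/123/notificationConfigs/example",
    "finding": {
        "category": "PUBLIC_BUCKET_ACL",
        "severity": "HIGH",
        "state": "ACTIVE",
        "resourceName": "//storage.googleapis.com/example-bucket",
    },
}).encode("utf-8")

summary = summarize_finding(sample)
print(summary["category"], should_page(summary))
```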
Access Logs deserve a particular mention, since they are generated by a variety of services:
- VPC Flow Logs
- Cloud Load Balancing
State of the Art Security Logging Platform in GCP
So how could we design a multi-account security-related logging platform in GCP?
Let’s start with a high-level architecture diagram of a solution with multiple “projects” (or customers), each with production and non-production environments (note how every project/customer will have the same setup). Here I will assume the workloads run predominantly in a Kubernetes cluster (managed GKE), but with some stateful services involved as well (e.g., Cloud SQL).
Starting from collection, Cloud Logging should be enabled in every GCP project, so as to collect logs from every environment (whether it is production or not).
In particular, the following information should be collected:
- Application Event Logs
- Audit and Access Transparency Logs
- DNS Query Logs
In conjunction, Cloud Monitoring is going to be enabled in order to ingest events, metrics, and metadata and generate insights (through dashboards, charts, and alerts). In addition, Cloud Monitoring should also be used to create and manage custom alerting policies (more on this later).
On top of this, it could be useful to also collect findings coming from Security Command Center. Security Command Center, enabled at the GCP Organization level, ingests findings from Security Health Analytics, as well as Event Threat Detection and Container Threat Detection. Once ingested, Notification Configs can be used to dispatch each finding to a Pub/Sub topic hosted in the relevant GCP project (the one the finding is associated with).
Finally, Cloud Identity Logs (at least the Groups Audit Logs) should be collected, as described in the Cloud Identity section.
Since the integrity, completeness, and availability of the collected logs are crucial for forensic and auditing purposes, a queueing system like Pub/Sub should be used to receive and buffer all the logs collected.
Since Cloud Logging retains application and audit logs only for a limited period of time, export sinks should be configured to store logs for extended periods, both to meet compliance obligations and for historical analysis. Pub/Sub receives and buffers all the logs forwarded by Cloud Logging, so that they can be exported to any external monitoring service. The “Design patterns for exporting from Logging” guide, together with the “Aggregated Exports” feature (which allows you to set up a sink at the organization level and export logs from all the projects inside the organization), can be used as a reference for the export strategy.
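To give an idea of what such an export sink boils down to, here is a small Python sketch that assembles the three pieces of a sink definition: a filter, a Pub/Sub destination, and the `includeChildren` flag used by aggregated sinks. The field names mirror the LogSink resource of the Cloud Logging API, while the sink, project, and topic names are hypothetical. The snippet only builds the definition; creating the sink (via the console, `gcloud`, or a client library) is left out.

```python
def audit_sink_filter() -> str:
    """Filter matching Cloud Audit Logs entries, whatever their parent resource."""
    # ":" is the "has" (substring) operator of the Logging query language.
    return 'logName:"cloudaudit.googleapis.com"'


def pubsub_destination(project_id: str, topic_id: str) -> str:
    """Destination string for a sink that exports to a Pub/Sub topic."""
    return f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}"


def sink_definition(name: str, project_id: str, topic_id: str,
                    include_children: bool = True) -> dict:
    """Assemble a sink definition; include_children=True makes it an aggregated sink."""
    return {
        "name": name,
        "destination": pubsub_destination(project_id, topic_id),
        "filter": audit_sink_filter(),
        "includeChildren": include_children,
    }


# Hypothetical names, for illustration only.
print(sink_definition("security-logs", "logging-project", "security-logs-topic"))
```

Keeping the filter narrow (audit logs only, in this sketch) is a deliberate choice: everything matched by the sink ends up in the queue and, later, in the SIEM, so a broad filter directly translates into cost and noise.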
This not only improves the resiliency of the platform, by queueing (rather than discarding) messages when a downstream component meant to consume logs fails, but also decouples log ingestion from log consumption.
Long-Term Storage and Audit Trail
A dedicated and highly restricted Project should also be created for each project/customer for long-term (immutable) storage of the logs.
In that Project, a Logstash Agent can be used to pull logs directly from Pub/Sub topics and to store them into a bucket where they will be treated as immutable files. This can be achieved via Bucket Retention Policies and Retention Policy Locks (see “Retention policies and retention policy locks”), to ensure that nobody would be able to delete the objects during a pre-defined retention period.
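The effect of a retention policy can be illustrated with a bit of date arithmetic. Retention periods on buckets are expressed in seconds, and an object can only be deleted or replaced once its age exceeds that period. The sketch below is plain Python (no Cloud Storage API calls), and the 365-day period is a hypothetical value chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def earliest_deletion(created: datetime, retention_seconds: int) -> datetime:
    """Point in time at which the object's age equals the retention period."""
    return created + timedelta(seconds=retention_seconds)


def can_delete(created: datetime, retention_seconds: int, now: datetime) -> bool:
    """True once the object's age is greater than the retention period."""
    return now > earliest_deletion(created, retention_seconds)


# Hypothetical retention period: 365 days, expressed in seconds.
RETENTION_SECONDS = 365 * 24 * 60 * 60

created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(earliest_deletion(created, RETENTION_SECONDS))
print(can_delete(created, RETENTION_SECONDS, datetime(2024, 6, 1, tzinfo=timezone.utc)))
```

Once the policy is locked, the period can no longer be shortened or removed, which is what makes the bucket suitable as an audit trail.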
In addition, a Data Loss Prevention (DLP) solution could be employed to prevent and detect cases of attempted data exfiltration. It should be noted that, to ensure the integrity of the logs stored in such projects, IAM controls should be put in place to limit access to these buckets (see “Access control guide for Cloud Logging”).
Monitoring and Alerting
Finally, a centralized Account/Project (here called Centralized Monitoring Account, and hosted in another cloud provider) can then be used to aggregate logs collected from the different Projects.
In this account, another Logstash Agent will have dedicated subscriptions to pull logs from each Pub/Sub topic defined in every Project, and will forward them to an Elasticsearch instance used by a Security Operations (SOC) team to monitor and respond to threats in (near) real time.
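In the design above Logstash does this work, but the transformation it performs can be sketched in a few lines of Python: take the JSON log entry carried by a Pub/Sub message and turn it into the two newline-delimited lines expected by the Elasticsearch Bulk API. Since Pub/Sub delivers messages at least once, the sketch derives a deterministic document ID from `logName`, `timestamp`, and `insertId`, so that a redelivered entry overwrites itself instead of creating a duplicate. The function name, the index name `gcp-audit`, and the sample entry are hypothetical.

```python
import hashlib
import json


def to_bulk_lines(data: bytes, index: str) -> str:
    """Convert one exported log entry (JSON bytes) into a Bulk API action/source pair."""
    entry = json.loads(data)
    # Deterministic ID: the same entry always maps to the same document.
    key = "|".join(str(entry.get(k, "")) for k in ("logName", "timestamp", "insertId"))
    doc_id = hashlib.sha256(key.encode("utf-8")).hexdigest()
    action = {"index": {"_index": index, "_id": doc_id}}
    # The Bulk API expects newline-delimited JSON, terminated by a newline.
    return json.dumps(action) + "\n" + json.dumps(entry) + "\n"


# Hypothetical log entry, for illustration only.
sample_entry = json.dumps({
    "logName": "projects/my-project/logs/cloudaudit.googleapis.com%2Factivity",
    "timestamp": "2024-01-01T00:00:00Z",
    "insertId": "abc123",
    "severity": "NOTICE",
}).encode("utf-8")

print(to_bulk_lines(sample_entry, "gcp-audit"))
```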
As mentioned previously, Cloud Monitoring could also be used to create and manage alerting policies. This way, whenever events trigger the conditions of one of the alerting policies, Cloud Monitoring creates and displays an incident in the Monitoring console. Notifications can be set up, so that Cloud Monitoring alerts the relevant staff members.
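The logic behind a condition that "triggers" can be sketched as a sliding-window threshold: fire when at least N matching events are seen within a given time window. The Python class below is purely illustrative and is not Cloud Monitoring's API; the class name, the threshold, and the window size are hypothetical.

```python
from collections import deque


class ThresholdAlert:
    """Fire when at least `threshold` events occur within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of events still inside the window

    def observe(self, ts: float) -> bool:
        """Record an event at time `ts` (seconds); return True if the condition is met."""
        self._events.append(ts)
        # Evict events that have fallen out of the window.
        while self._events and self._events[0] <= ts - self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold


# Hypothetical policy: 3 failed logins within 60 seconds.
alert = ThresholdAlert(threshold=3, window_seconds=60)
for ts in (0, 10, 20, 200):
    print(ts, alert.observe(ts))
```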
In this blog post, part of the “Continuous Visibility into Ephemeral Cloud Environments” series, I described a possible approach for designing a multi-account security-related logging platform in GCP.
A previous post covered a similar setup for AWS, while a later post will cover Kubernetes instead.
I hope you found this post useful and interesting, and I’m keen to get feedback on it! If you found the information useful, if something is missing, or if you have ideas on how to improve it, please let me know on Twitter.