Semgrep for Cloud Security
- What is Semgrep?
- Semgrep for Infrastructure as Code
Semgrep is an emerging static analysis tool that is gaining traction within the AppSec community. Its broad support for multiple programming languages, together with the ease with which rules can be created, makes it a powerful tool that can help AppSec teams scale their efforts to eliminate entire classes of vulnerabilities from their codebases.
But what about cloud security? In the era of Infrastructure as Code, where tools like Terraform, CloudFormation, Pulumi (and many others) are used to provision infrastructure from what is de facto source code, can we apply the same approach to eradicate classes of cloud-related vulnerabilities from a codebase?
I decided to spend part of my weekend experimenting with this, to get an idea of what Semgrep can offer to cloud/platform security teams.
What is Semgrep?
Before jumping into the details, it is worth explaining what Semgrep actually is. As per their website, Semgrep is:
A fast, open-source, static analysis tool that excels at expressing code standards — without complicated queries — and surfacing bugs early at editor, commit, and CI time.
Precise rules look like the code you’re searching; no more traversing abstract syntax trees or wrestling with regexes.
The Semgrep Registry has 1,000+ rules written by the Semgrep community covering security, correctness, and performance bugs. No need to DIY unless you want to.
At a high level, Semgrep leverages Abstract Syntax Trees (ASTs) to build a model of the code you are analyzing. Unlike other AST-based tools, though, Semgrep lowers the barrier to entry by abstracting away the AST syntax itself.
Explaining how to use Semgrep is out of scope for this blog post, but the official documentation is really well made, and the online playground is an excellent place to start experimenting with it (without having to spend time installing anything).
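To make the contrast concrete, here is roughly what hand-written AST traversal looks like, using Python's standard ast module to find direct calls to eval. The sample source and the find_calls helper are made up for illustration; this is not how Semgrep is implemented, only a sketch of the kind of code it lets you avoid writing.

```python
import ast

# Hypothetical snippet to analyze; it is only parsed, never executed.
SOURCE = '''
import os
user_input = input()
result = eval(user_input)
print(result)
'''


def find_calls(source, func_name):
    """Return the line numbers of direct calls to func_name, found by walking the AST."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    ]


print(find_calls(SOURCE, "eval"))  # prints [4]
```

With Semgrep, the same search is expressed as the pattern eval(...), which looks like the code being searched for.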
Semgrep for Infrastructure as Code
What I was curious to try was how well the same approach could fit a codebase made of Terraform (HCL) and YAML files. Since those languages are not currently supported by Semgrep, I relied on its Generic Pattern Matching engine.
The official semgrep-rules repository already contains a folder dedicated to Terraform.
Within this folder, we can see 7 rules already made open source, mainly focusing on Terragoat scenarios and S3 buckets.
Unencrypted EBS Volumes
Let’s start wrapping our heads around it by picking the rule for unencrypted EBS volumes. The repo includes a sample Terraform file for this rule. It is quite straightforward: an aws_ebs_volume resource declares an EBS volume with encryption disabled (as can be seen from encrypted = false).
So what we want to grep for here is an occurrence of encrypted = false (or the lack of encrypted = true).
You can try this rule in the Semgrep playground: https://semgrep.dev/s/ZWrA/.
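The condition itself can be approximated in a few lines of plain Python. This is a minimal sketch of the matching logic only, not the registry rule and not how Semgrep's generic engine works. The sample Terraform, the resource names, and the helper functions are all assumptions made for illustration, and the brace matching is naive (it ignores braces inside strings or comments).

```python
import re

# Hypothetical sample input; resource names are illustrative only.
SAMPLE_TF = '''
resource "aws_ebs_volume" "plain" {
  availability_zone = "us-west-2a"
  size              = 40
  encrypted         = false
}

resource "aws_ebs_volume" "missing" {
  availability_zone = "us-west-2a"
  size              = 40
}

resource "aws_ebs_volume" "safe" {
  availability_zone = "us-west-2a"
  size              = 40
  encrypted         = true
}
'''

RESOURCE_HEADER = re.compile(r'resource\s+"aws_ebs_volume"\s+"([^"]+)"\s*\{')
ENCRYPTED_TRUE = re.compile(r'^\s*encrypted\s*=\s*true\b', re.MULTILINE)


def block_body(text, open_brace):
    """Return the text between the brace at open_brace and its matching close."""
    depth = 0
    for i in range(open_brace, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[open_brace + 1:i]
    return text[open_brace + 1:]  # unbalanced input: take the rest


def unencrypted_ebs_volumes(text):
    """Names of aws_ebs_volume resources that do not set encrypted = true."""
    findings = []
    for match in RESOURCE_HEADER.finditer(text):
        # The header regex ends on the opening brace of the block.
        body = block_body(text, match.end() - 1)
        if not ENCRYPTED_TRUE.search(body):
            findings.append(match.group(1))
    return findings


print(unencrypted_ebs_volumes(SAMPLE_TF))  # prints ['plain', 'missing']
```

Checking for the absence of encrypted = true covers both cases at once: a volume that sets encrypted = false and a volume that omits the attribute entirely.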
Open Security Groups
As a second test, I wanted to create my first Semgrep rule to detect a Security Group open to the world (0.0.0.0/0).
What we want to grep for here is any occurrence of 0.0.0.0/0 within a security group definition.
You can try this rule in the Semgrep playground: https://semgrep.dev/s/ne51/.
Of course this is a very basic case, where the offending string (0.0.0.0/0) is directly hardcoded within the security group definition. The rule will have to be extended if we want to take into account cases where the CIDR can be specified, for example, via variables.
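As a rough illustration of both the check and its limitation, here is a plain-Python sketch that scans for a hardcoded 0.0.0.0/0 inside security group blocks. It is not the Semgrep rule itself. The sample Terraform, the resource names, and the assumption that security groups are declared as aws_security_group resources are mine, and brace counting is done naively, line by line.

```python
import re

# Hypothetical sample input; resource names are illustrative only.
SAMPLE_TF = '''
resource "aws_security_group" "open" {
  name = "open"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "internal" {
  name = "internal"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }
}
'''

HEADER = re.compile(r'^\s*resource\s+"aws_security_group"\s+"([^"]+)"\s*\{')
OPEN_CIDR = "0.0.0.0/0"


def open_security_groups(text):
    """Return (resource name, line number) for each hardcoded 0.0.0.0/0."""
    findings = []
    current = None  # name of the security group we are inside, if any
    depth = 0       # brace depth relative to the start of that resource
    for lineno, line in enumerate(text.splitlines(), start=1):
        header = HEADER.match(line)
        if current is None and header:
            current, depth = header.group(1), 0
        if current is not None:
            if OPEN_CIDR in line:
                findings.append((current, lineno))
            depth += line.count("{") - line.count("}")
            if depth <= 0:
                current = None  # closing brace of the resource reached
    return findings


print(open_security_groups(SAMPLE_TF))  # prints [('open', 9)]
```

A block that sets cidr_blocks = [var.allowed_cidr] would pass this check untouched, which is exactly the gap described above: resolving variables requires more than string matching.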
Next, I wanted to create a rule more focused on Kubernetes (or, more precisely, YAML files).
Let’s take as a sample the case where you might want to enforce that all your Kubernetes Ingresses are private.
In this example, we want to grep for the relevant Ingress annotation and ensure it is set to the approved value.
You can try this rule in the Semgrep playground: https://semgrep.dev/s/ErGE/.
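The same idea can be sketched in plain Python as a line-based scan of a manifest. The annotation key and approved value below (example.com/ingress-class and internal) are hypothetical placeholders, not the ones used in the rule above, and the sample manifest and helper name are made up as well. This is a text match, not a YAML parser, in the same spirit as generic pattern matching.

```python
import re

# Hypothetical annotation key and approved value, for illustration only.
ANNOTATION = "example.com/ingress-class"
APPROVED = "internal"

# Hypothetical sample manifest.
SAMPLE_YAML = '''
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: public-app
  annotations:
    example.com/ingress-class: external
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: private-app
  annotations:
    example.com/ingress-class: internal
'''


def unapproved_annotations(text, key, approved):
    """Return (line number, value) wherever key is set to something other than approved."""
    line_re = re.compile(r'^\s*' + re.escape(key) + r'\s*:\s*(.*?)\s*$')
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = line_re.match(line)
        if match:
            value = match.group(1).strip('"\'')  # tolerate quoted values
            if value != approved:
                findings.append((lineno, value))
    return findings


print(unapproved_annotations(SAMPLE_YAML, ANNOTATION, APPROVED))  # prints [(7, 'external')]
```

Note that this only flags an annotation with the wrong value; an Ingress that omits the annotation altogether would need a separate check.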
I have to say that Semgrep's extensibility and simple syntax make it very promising for cloud security teams as well. In a few hours, thanks to the official documentation and the Playground, I was able to go from absolute zero to writing my first rules.
The main open question I can think of at the moment is: how much does Semgrep overlap with OPA Conftest? Although Conftest was created with cloud resources in mind, and benefits from the synergies with the rest of the OPA offering (like Gatekeeper), basically everyone in the industry has at some point complained about how cumbersome the Rego language is. In my opinion, this could be a defining factor that might help expand the adoption of Semgrep among platform teams.
I’m quite curious to hear other people’s opinions on this, so please feel free to reach out to me on Twitter.