So I Heard You Want to Learn Kubernetes
- Why What You Think You Know is Probably Wrong
- Start From Here
- If You Want to be Production Ready
- What About Security?
Kubernetes is getting more popular by the day, and is probably one of the hottest buzzwords of 2018.
With names like eBay, Goldman Sachs, Huawei, ING, SAP, and many others listed as corporate users, it has clearly earned an established place in our industry.
At the same time, Kubernetes has gained a reputation for being hard to understand. This is mostly due to rumors, but at times it has indeed proven easy to get wrong, as the Monzo team experienced a few months back.
I still remember the sense of confusion when I decided I wanted to get a better understanding of Kubernetes, as I felt like I didn’t know where to start, or what to tackle first.
In this post I will try to dispel the perception that Kubernetes is too hard to even get started with, by walking through the journey I undertook to learn the basics first, and later to focus on the security aspects.
Hopefully this will help a little in your own journey to understand Kubernetes.
Why What You Think You Know is Probably Wrong
I would like to start with an article (from which I also borrowed the title for this section) which introduces the three main players in this space: “Docker vs. Kubernetes vs. Apache Mesos: Why What You Think You Know is Probably Wrong”.
There are countless articles, discussions, and lots of social chatter comparing Docker, Kubernetes, and Mesos. If you listen to the partially-informed, you’d think that the three open source projects are in a fight-to-the death for container supremacy. You’d also believe that picking one over the other is almost a religious choice; with true believers espousing their faith and burning heretics who would dare to consider an alternative.
That’s all bunk.
While all three technologies make it possible to use containers to deploy, manage, and scale applications, in reality they each solve for different things and are rooted in very different contexts. In fact, none of these three widely adopted toolchains is completely like the others.
The article continues by reviewing each project’s goal, architecture, and how they can complement and interact with each other. Definitely a lightweight introduction to the topic.
Next, I want to tackle the elephant in the room, namely how Kubernetes is perceived to be too hard to even get started with:
Sunday read: “Cloud Native Comes With New Challenges”. Nicely summarised by this line: “It’s damn near impossible for someone new to Kubernetes to figure out how to get started […] and it shouldn’t be that way.” https://t.co/tooP6lPNXC (Marco Lancini, @lancinimarco, August 26, 2018)
What I would like to stress here is that this is just a misconception, as nicely highlighted in “Kubernetes is NOT Scary, Complex or Even Confusing”:
Unlike the platform itself, the routine pre-requisite tasks needed to build a Kubernetes cluster are complex and hard. […]
It’s developers lack of exposure and patience in common cluster operations tasks like creating a secure communication (TLS), configuring load balancers, running daemon services and other environment prep tasks that lead to this perception. […]
There are few short-cuts for multi-node operational tasks like building a public key infrastructure (PKI), load balancer configuration, installing Docker correctly, configuring services in systemd or upstart, and creating a functional software defined network. And that does not even consider sequence sensitive tasks like expanding or upgrading a cluster. Since Kubernetes will not work without all this heavy lifting, it’s no wonder that a simple three tier platform gets a reputation as complex.
Building on this, you can see how many components need to be set up properly before a cluster is “production ready”. Julia Evans made a very nice drawing to illustrate them:
kubernetes components pic.twitter.com/ACsUmZjscs (🔎Julia Evans🔍, @b0rk, June 8, 2017)
Start From Here
If you have stuck around this far, you are probably committed to learning Kubernetes. Here is where you can start.
First of all, you will need a way to run Kubernetes (after all, how can you learn to master something if not by doing it?). Especially at the beginning, a local installation is going to be enough: if you run macOS you are lucky because Kubernetes is now bundled in Docker for Mac, otherwise you can rely on Minikube.
Then, you will have to familiarize yourself with the different moving parts in a Kubernetes installation, like pods, deployments, services, ingress, and so on. The diagrams in “Kubernetes & Traefik 101— When Simplicity Matters” can really save you hours of headaches trying to fit everything together (you can stop reading at “Let’s Start Putting Everything Together!” if you are not interested in Traefik).
After this, there are additional components worth getting accustomed to, as they are at the core of Kubernetes’ inner workings:
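To make one of these relationships concrete: a Service finds the pods it should send traffic to by matching labels. The snippet below is only a minimal sketch of that idea in plain Python, not actual Kubernetes code. The function names and the sample pods are hypothetical, and it models equality-based selectors only.

```python
def selector_matches(selector, labels):
    """Return True when every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_pods(selector, pods):
    """Return the names of the pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if selector_matches(selector, pod["labels"])]


# Hypothetical pods, as a couple of Deployments might have created them.
PODS = [
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "db-1", "labels": {"app": "db", "tier": "backend"}},
]

if __name__ == "__main__":
    # A Service with the selector app=web would only pick up web-1 and web-2.
    print(select_pods({"app": "web"}, PODS))
```

The point of the sketch is that nothing ties a Service to specific pods by name: any pod carrying the right labels is picked up, which is what lets a Deployment replace pods freely.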
- kubelet: the primary “node agent” that runs on each node. It takes a set of PodSpecs (YAML or JSON objects that each describe a pod) and ensures that the containers described in those PodSpecs are running and healthy.
- etcd: a consistent and highly-available key value store used as Kubernetes’ backing store for all cluster data, like service tokens, secrets and service configurations.
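To give an idea of what a PodSpec looks like, here is a minimal sketch that builds a Pod object as a Python dictionary and prints it as JSON. The helper name `make_pod_manifest` and the values passed to it (pod name, image, port) are placeholders I picked for illustration; a real manifest would usually carry many more fields.

```python
import json


def make_pod_manifest(name, image, port):
    """Build a minimal Pod object; its "spec" field is the PodSpec the kubelet acts on."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                }
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder values, chosen only for illustration.
    print(json.dumps(make_pod_manifest("hello-web", "nginx:1.15", 80), indent=2))
```

Saved to a file, the printed JSON should be accepted by `kubectl apply -f`, since kubectl reads JSON as well as YAML.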
Once you have a grasp of the different resource types, it is time to learn how to interact with them:
“Deploying and Scaling Microservices with Docker and Kubernetes” from Jérôme Petazzoni is probably the best entry-level tutorial freely available and perfect for self-study (it also has a complementary GitHub repository).
The training starts by introducing main concepts around Kubernetes and its network model, then it discusses kubectl before delving into setting up a cluster and gradually rolling out some Docker containers.
I personally found the first 2 chapters really helpful; you can then skip chapter 3 (from slide 201) and go back to chapter 4 (from slide
Building on this knowledge, you will probably want to go a bit deeper into Kubernetes networking with a series of 3 blog posts on the topic: “Understanding kubernetes networking: pods”, “Understanding kubernetes networking: services”, and “Understanding kubernetes networking: ingress”.
All these resources will give you everything you need to get started with Kubernetes. I personally recommend Katacoda to complement this knowledge with practical exercises. In addition, “Write a Kubernetes-ready service from zero step-by-step” is an end-to-end guide on how to deploy a Go service from scratch.
If You Want to be Production Ready
The section above will hopefully give you all you need to get a decent understanding of Kubernetes but, as I mentioned at the beginning of this article, this is not nearly enough to be considered “production ready”.
Here is a list of more advanced resources that can offer better insight into how to run Kubernetes in production:
- Cloud Native DevOps with Kubernetes: although not fully released yet, I had the privilege of being a technical reviewer for this book by John Arundel and Justin Domingus. It shows hands-on how to build and develop an example cloud native application with Kubernetes, explaining how to apply each of the concepts (such as authentication or reliability) one at a time to develop a non-trivial, production-ready cloud native application, complete with a development environment and deployment pipeline that can be used for real workloads. Definitely a must read.
- Kubernetes The Hard Way: a tutorial optimized for learning, which means taking the long route to ensure you understand each task required to bootstrap a Kubernetes cluster. The target audience for this tutorial is someone who plans to support a production Kubernetes cluster and wants to understand how everything fits together.
- Kubernetes: Up and Running: explains how Kubernetes fits into the life cycle of a distributed application.
- Kubernetes Best Practices: lists best practices for building an app to run on Kubernetes, using cloud-native technologies.
- Operating a Kubernetes network: while there’s a reasonable amount written about how to set up your Kubernetes network, there isn’t much about how to operate your network and be confident that it won’t create a lot of production incidents for you down the line. This blog from Julia Evans addresses exactly this.
What About Security?
We can’t talk about the security of a Kubernetes cluster without talking about the security of Docker containers, so ensure you have a clear grasp of the threats facing containerized systems.
The Security Section of the official Docker documentation and “5 security concerns when using Docker” are a good start for understanding container threats. Next, NCC Group’s whitepaper on “Understanding and Hardening Linux Containers” examines attack surfaces, threats, and related hardening features in order to properly evaluate container security. Note that this paper covers not only Docker, but Linux containers as a whole, as implemented by LXC, Docker, and CoreOS Rkt among others.
Then, to round off the Docker security prerequisites, I would also recommend some additional resources:
- Docker Security CheatSheet: a nice infographic showing different types of security threats and how to avoid them.
- Docker Engine Security Cheat Sheet: another cheatsheet, this time containing links to additional resources and tools.
- Docker Security Best-Practices: an article that aims to provide a list of common security mistakes and security best-practices/recommendations.
- Docker Secure Deployment Guidelines: suggests hardening actions that can be undertaken to improve the security posture of the containers within their respective environment.
- CIS Docker Benchmark: as for any other CIS Benchmark, this document provides prescriptive guidance for establishing a secure configuration posture for Docker.
Threat Modelling Orchestrator Systems
Moving on, the next step is to understand the threats faced by an orchestrator (like Swarm or Kubernetes). The “Least Privilege Container Orchestration” post on the Docker Blog offers a threat model for Docker Swarm, focusing on key areas of orchestration. Although the topic of discussion in this post is Swarm, the same concepts can be applied to Kubernetes as well:
- Joining the cluster: Preventing malicious nodes from joining the cluster.
- Organizing hosts into security zones: Preventing lateral movement by attackers.
- Scheduling tasks: Issuing tasks only to designated and allowed nodes.
- Allocating resources: Preventing a malicious node from “stealing” workload or resources belonging to another node.
- Storing secrets: Preventing storing in plaintext and writing to disk on worker nodes.
- Communicating with the workers: Using mutually authenticated TLS.
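The “storing secrets” point is a good example of how these concepts carry over to Kubernetes: the values in a Kubernetes Secret manifest are base64-encoded, which is an encoding and not encryption. The sketch below illustrates why that distinction matters; the helper names and the sample value are hypothetical.

```python
import base64


def encode_secret_value(plaintext):
    """Encode a value the way it appears in the data field of a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded):
    """Recover the plaintext; no key is needed, so this offers no confidentiality."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_secret_value("s3cr3t-password")  # hypothetical secret value
    print(encoded)
    print(decode_secret_value(encoded))
```

Anyone who can read the manifest, or the backing store it ends up in, can reverse the encoding, which is why encryption at rest and tight access control still matter.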
Another very useful high-level threat model for an orchestrator system can be found in “Datacenter Orchestration Security and Insecurity” by Dino Dai Zovi et al. The presentation starts with Docker, but from slide 22 it shifts to Kubernetes and discusses its most important security mechanisms:
- Role-Based Access Control (RBAC).
- PodSecurityPolicy Admission Controller.
- NodeRestriction Admission Controller.
- NetworkPolicy resources.
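To make two of these mechanisms a bit less abstract, here is a minimal sketch (not taken from the presentation) that builds an RBAC Role limited to reading pods and a NetworkPolicy that denies all ingress traffic in a namespace. The function names, the object names, and the `demo` namespace are my own placeholders.

```python
import json


def read_only_pod_role(namespace):
    """RBAC Role that can only read pods in a single namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": "pod-reader"},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group, where pods live
                "resources": ["pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def default_deny_ingress(namespace):
    """NetworkPolicy that selects every pod in the namespace and allows no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"namespace": namespace, "name": "default-deny-ingress"},
        "spec": {
            "podSelector": {},  # an empty selector matches all pods in the namespace
            "policyTypes": ["Ingress"],
        },
    }


if __name__ == "__main__":
    # "demo" is a placeholder namespace used only for illustration.
    for manifest in (read_only_pod_role("demo"), default_deny_ingress("demo")):
        print(json.dumps(manifest, indent=2))
```

Keep in mind that a Role has no effect until a RoleBinding grants it to a user or service account, and that NetworkPolicy objects are only enforced when the cluster's network plugin supports them.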
“Kubernetes security: Consider your threat model” from NCC Group is a high-level introduction to the main groups of threats specific to Kubernetes:
- External attackers: People who have no access to a cluster apart from being able to reach the applications running on it and/or the management port(s) over a network.
- Malicious containers: Where an attacker has access to a single container (likely through some application vulnerability) and would like to expand their access to take over the whole cluster.
- Malicious user/stolen credentials: Where an attacker has valid credentials to execute commands against the Kubernetes API, as well as network access to the port.
After having understood the main threats, the “Securing a Cluster” section of the Kubernetes documentation offers some insight on topics related to protecting a cluster from accidental or malicious access, and provides recommendations on overall security. In particular, this section describes how to:
- Control access to the Kubernetes API (TLS, API Authentication and Authorization).
- Control access to the Kubelet.
- Control the capabilities of a workload or user at runtime (limiting resource usage, controlling privileges, restricting network access).
- Protect cluster components from compromise (restricting access to etcd, enabling audit logging, encrypting secrets at rest).
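As an illustration of the third point, the sketch below checks a pod manifest for a few basic runtime restrictions: privileged mode, privilege escalation, and resource limits. It is a simplified example of my own, not a tool from the documentation. The function name and both sample manifests are hypothetical, and it only inspects container-level settings.

```python
def audit_pod(manifest):
    """Return a list of findings for containers missing basic runtime restrictions."""
    findings = []
    for container in manifest.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        security = container.get("securityContext", {})
        limits = container.get("resources", {}).get("limits", {})
        if security.get("privileged", False):
            findings.append(f"{name}: runs privileged")
        # Treated as enabled unless the manifest explicitly sets it to False.
        if security.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: privilege escalation not disabled")
        if "cpu" not in limits or "memory" not in limits:
            findings.append(f"{name}: missing CPU or memory limits")
    return findings


# Hypothetical manifests used only to exercise the checks.
HARDENED = {
    "spec": {
        "containers": [
            {
                "name": "web",
                "securityContext": {"privileged": False, "allowPrivilegeEscalation": False},
                "resources": {"limits": {"cpu": "500m", "memory": "128Mi"}},
            }
        ]
    }
}
WEAK = {"spec": {"containers": [{"name": "web", "securityContext": {"privileged": True}}]}}

if __name__ == "__main__":
    print(audit_pod(HARDENED))
    for finding in audit_pod(WEAK):
        print(finding)
```

In a real cluster, checks like these are better enforced centrally, for example through admission controllers, rather than by a script run by hand.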
Still from the Kubernetes documentation, “Security Best Practices for Kubernetes Deployment” goes into more detail with some practical examples.
I also recommend digging deeper into a few other topics by reading the following:
- Securing Kubernetes Cluster Networking.
- Using RBAC Authorization, from the Kubernetes documentation.
- Authenticating, from the Kubernetes documentation.
- Using Admission Controllers, from the Kubernetes documentation.
Finally, the last two recommended readings are both titled “Kubernetes Security Best-Practices” (1, 2), which cover some common security mistakes and offer some general best-practices around securing Kubernetes clusters and workloads.