This blog series is based on a talk I gave at the 2019 DevOpsUnicorns conference in Riga, Latvia. It was heavily influenced by the work of Timothy Appnel; you should follow him on Twitter here. There are three parts:

  • Part 1: What are Kubernetes operators and why should we build them with Ansible?
  • Part 2: Developing your first operator with Ansible.
  • Part 3: What's next for the community and for you?

Part 1: What are Kubernetes operators and why should we build them with Ansible?

Two of the most exciting things happening in tech right now are containers and automation. Two open source projects at the epicenter of that are Kubernetes and Ansible. This blog series discusses an intersection of those two projects, the Ansible Operator. There's a lot of background information surrounding something this specific, and it would be impossible to cover it all in a few blog posts. Hopefully, this will provide enough information to decide whether or not it is something you would like to learn more about, and the third post in the series will talk about how to do that. But first let's look at a little background info on the major concepts that we're going to be dealing with.

kubernetes

According to the documentation, Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. Kubernetes comes from a Greek word, κυβερνήτης, meaning helmsman, commander, or pilot, which might help to explain the logo.

It was originally developed by Google, and was open sourced in 2014. Since then Red Hat and VMware have consistently been the number two and three contributors respectively after Google. The project was later donated to the Cloud Native Computing Foundation. At this point it's fair to say that it is the default container management platform in the industry.

ansible

Ansible, according to its documentation, is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates. Along with Kubernetes, Ansible is one of the most popular and high velocity open source projects of all time. If you're interested in how something like that is calculated, there is an excellent blog post here, and an interesting GitHub project here. Ansible was acquired by Red Hat in 2015, and is commonly referred to as the language of automation.

a natural fit

There are a lot of similarities between Ansible and Kubernetes that make this combination make sense. Both projects, as noted above, have vibrant and active communities. We'll talk more about that later. Both projects have an architectural commitment to "desired state" and are heavily declarative rather than imperative. A good example of the difference can be found in Kubernetes: Up and Running:

"...declarative configuration is an alternative to imperative configuration, where the state of the world is defined by the execution of a series of instructions rather than a declaration of the desired state of the world. While imperative commands define actions, declarative configurations define state."[1]

So instead of saying something like, "add the first replica, then the second, then the third", we would say "make sure there are three replicas".
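As a rough illustration of that last sentence, here is a minimal sketch of what "make sure there are three replicas" might look like as a Kubernetes Deployment. The name web, the app label, and the nginx image are placeholders chosen for this example, not something from the talk.

```yaml
# Declarative: we state the outcome (three replicas), not the steps to get there.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                # hypothetical name
spec:
  replicas: 3              # "make sure there are three replicas"
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web           # must match the selector above
    spec:
      containers:
        - name: web
          image: nginx     # placeholder image
```

Nothing in the file says how to get from zero pods to three, or from five down to three. That part is left to Kubernetes.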

In an effort to make things as declarative as possible, both technologies are heavy users of YAML. Recently there was an exchange on Twitter where a user asked: "Can you imagine being at a whole conference about yaml?" and Kelsey Hightower responded "I've been. It's called KubeCon." He went on to make the claim that everything in the world of Kubernetes ends up in a YAML file. Let's take a look at two different YAML files. First we have a ConfigMap object definition for Kubernetes. We would use something like this with kubectl (or the oc command for OpenShift).

apiVersion: v1
kind: ConfigMap
metadata:
  name: yourMap
  namespace: default
data:
  color: red

Next, we have an Ansible task that does the exact same thing:

- name: create yourMap ConfigMap
  k8s:
    # state defaults to "present": make sure the object exists as defined
    definition:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: yourMap
        namespace: default
      data:
        color: "{{ color }}"

The most important difference here is that "{{ color }}" at the bottom there. This is a templating parameter, and it's using Jinja2 templating functionality. Jinja is a "modern and designer-friendly templating language for Python, modelled after Django’s templates. It is fast, widely used and secure." Jinja is heavily used in Ansible, and instead of templating this one parameter, Jinja gives us the power to create an entire parameterized template, and pull that template into our Ansible task with something like this:

definition: "{{ lookup('template', '/yourTemplate.yml') | from_yaml }}"
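To make that a little more concrete, here is a sketch of what such a template and the task that consumes it might look like. The contents of yourTemplate.yml and the variables map_name and map_namespace are assumptions made up for this example; only color appears in the task above.

```yaml
# --- /yourTemplate.yml (hypothetical contents) ---
# Every value in double braces is filled in by Jinja2 before the
# result is parsed as YAML and handed to the k8s module.
apiVersion: v1
kind: ConfigMap
metadata:
  name: "{{ map_name }}"                                   # assumed variable
  namespace: "{{ map_namespace | default('default') }}"    # assumed variable
data:
  color: "{{ color }}"
---
# --- the Ansible task that renders and applies the template ---
- name: create a ConfigMap from a template
  k8s:
    definition: "{{ lookup('template', '/yourTemplate.yml') | from_yaml }}"
```

The task itself no longer changes from one ConfigMap to the next. Only the variables do.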

If the potential power of this feature isn't clear already, it will become more evident as we go on.

operators

The concept of operators stems from a simple truth: Kubernetes is really great at dealing with stateless apps, but not very good with stateful ones. This is, in many ways, due to the very thing that has made Kubernetes so successful: its simplicity. In order for Kubernetes, or any platform for that matter, to handle stateful applications well, it would need to be aware of domain specific knowledge. This is the information that is specific to an individual application, and it's often what differentiates that application. In other words, domain specific knowledge is often closely tied to an application's business value.

The conversation around operators started in November 2016, with a blog post from CoreOS. The post begins by discussing site reliability engineers. These engineers were people who got tired of doing the same thing over and over again, and started to automate things with code. CoreOS proposed that we recreate this pattern in the Kubernetes world, and called this concept "operators". Here is how they defined it in that post:

“An Operator is an application-specific controller that extends the Kubernetes API to create, configure, and manage instances of complex stateful applications on behalf of a Kubernetes user. It builds upon the basic Kubernetes resource and controller concepts but includes domain or application-specific knowledge to automate common tasks.”

The post goes on to highlight that operators are built on two Kubernetes concepts: Resources and Controllers. Other Kubernetes objects already make use of these, such as ReplicaSets and Deployments. One could even argue that these objects are the original operators.

The entire blog post is absolutely worth a read.
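For readers who have not met the "Resources" half of that pair, the sketch below shows roughly how an operator might extend the Kubernetes API: a CustomResourceDefinition registers a new kind, and a custom resource then declares one instance of it. The group example.com, the kind Database, and the size field are all invented for illustration, and the CRD uses the apiextensions.k8s.io/v1 API found in current Kubernetes releases.

```yaml
# 1. Teach the API server about a new resource type (hypothetical "Database").
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com      # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Database
    plural: databases
    singular: database
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: integer
---
# 2. Declare an instance of it. The operator's controller watches for
#    objects like this one and does the application-specific work.
apiVersion: example.com/v1alpha1
kind: Database
metadata:
  name: example-db
spec:
  size: 3
```

The "Controllers" half is the code that reacts to these objects, which is where the domain specific knowledge lives.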

Operators work using a pattern that is very common in Kubernetes, something called a reconciliation loop. This is how it is defined in Kubernetes: Up and Running:

"The central concept behind a reconciliation loop is the notion of desired state and observed or current state. Desired state is the state you want...In contrast, current state is the currently observed state of the system...The reconciliation loop is constantly running, observing the current state of the world and taking action to try to make the observed state match the desired state."[2]

A nice graphical representation of this pattern is provided in Managing Kubernetes:[3]
[Figure: reconciliation loop]

Different operators will utilize this pattern in radically different ways, but it is fundamental to every operator architecture.
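The split between desired and observed state is visible in the objects themselves: most Kubernetes resources carry a spec section (what you asked for) and a status section (what the system currently sees). The trimmed Deployment below is a sketch of that idea, with a made-up name and made-up numbers rather than output from a real cluster.

```yaml
# Trimmed view of a Deployment as the API server might return it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web              # hypothetical name
spec:
  replicas: 3            # desired state: what we asked for
status:
  replicas: 3            # pods that currently exist for this Deployment
  readyReplicas: 2       # observed state: only two are ready right now
```

As long as the status falls short of the spec, the controller keeps acting to close the gap. An operator applies the same idea to its own custom resources.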

Often, when people first start discussing operators, an initial criticism is raised; it goes something like this:

"We already have technical debt trying to modernize our existing applications, and now you're telling us that we need to write an additional application for every application?"

Yes. That's exactly what we're saying. This is the direction we believe the Kubernetes community is moving. But there's a flaw in the way this question is posed. It makes it sound like there is a tradeoff between existing applications and operators. That's not the case. Operators are deploying the application and responding to events. They check for anomalies and mitigate them using reconciliation loops. This is work you're already doing; you're just doing it manually. The move toward operators isn't about taking resources away from your applications; it's about doing work you're already doing in a more efficient and Kubernetes native manner. The point of operators is to reduce work, specifically the repetitive, mundane work that we might call "toil". Once this becomes clear, the next problem is simple: not everyone knows Go or has time to learn it, and operators are written in Go, right?

the operator framework


Nearly two years after that first blog post discussing the concept of operators, and three months after being acquired by Red Hat, CoreOS announced the Operator Framework. The original announcement can be read here. The whole point of the framework is to reduce the barrier to entry to anyone who wants to develop and use a Kubernetes operator. The framework has three parts:

  • Operator SDK: Enables developers to build Operators based on their expertise without requiring knowledge of Kubernetes API complexities.
  • Operator Lifecycle Manager (OLM): OLM extends Kubernetes to provide a declarative way to install, manage, and upgrade operators and their dependencies in a cluster.
  • Operator Metering: Operator metering is responsible for collecting metrics and other information about what's happening in a Kubernetes cluster, and providing a way to create reports on the collected data.

OLM and metering are both exciting elements of the framework that deserve their own attention, but for the remainder of this blog series we will be focusing on the Operator SDK.

The Operator SDK has three types, all with different capabilities. The three types are Helm, Ansible, and Go. The following graphic demonstrates the relative capability profile of each.

[Figure: capability model for the Helm, Ansible, and Go operator types]

This graphic was originally called the operator maturity model, but it was later changed to the capability model. This is because the word maturity carries the connotation that the operators are expected to evolve. That may not necessarily be the case. The whole point is that the functionality the operator contains is specific to your application, and if your application only requires Phase II capability, that's something only you would know.

That being said, this graphic is a great overview of the capability profiles that are available for each operator type in the SDK, and it is updated regularly as they evolve. You'll see that the Helm type is limited to Phase II. This is quite simply because the Helm type is based on Helm, which is a deployment tool. We'll be looking into this type more in depth in future posts. It's no surprise that the Go type is full featured. The Operator SDK, and Kubernetes itself, are written in Go.

The Ansible type is also full featured. This wasn't the case even as recently as spring of 2019. The change came with the development of the k8s module in Ansible. (k8s stands for Kubernetes for the same reason i18n stands for internationalization. You take the first letter, the number of letters in the middle, and the last letter.) There have been a few attempts to create modules to interact with Kubernetes, but the success of the k8s module allows us to access the full range of Kubernetes APIs, which in turn renders the Ansible operator type in the SDK a first class citizen.

why ansible?

The reasons many people will choose to use the Ansible operator are the same reasons so many people choose to use Ansible: reduced barriers to entry and an established community.

The barrier to entry for Ansible is unbelievably low. It's the only project I've ever played with that I was able to start using the same day I started to learn it. It strikes a perfect balance between power and simplicity and I believe that it is truly the language of automation for that reason. In part three of this blog series we'll discuss some of the many resources available that can help you get started with Ansible if you haven't already had the pleasure. From the perspective of operators, this is important because the learning curve for Ansible is much lower than the learning curve for Go, and since the SDK gives you so much out of the box, you can concentrate your focus on your playbooks or roles, and the watch file that ties them to Kubernetes events. We will discuss the watch file and the rest of the architecture in the next post. For those of you who are already using Ansible to manage your Kubernetes infrastructure, this transition is going to be extraordinarily simple.

The second reason people will choose Ansible for their operators is because of the community. A thriving community and an established ecosystem mean a lot of things, including more eyes to look for bugs as well as more informal support in forums and more contributions to the project. It also means more smart people working on the same problems, allowing you access to an existing body of work and examples to build on, improve, and cater to your specific use cases. It also provides an opportunity to contribute back to a thriving community, improving life for others as well as the quality of your CV.

the whole point

In the next post, we'll drill deeper into the architecture of the Ansible operator, take a look at what the SDK provides, and actually build an operator together. As we move forward it will become more and more clear that the Ansible type in the Operator SDK achieves exactly what it set out to do: to make operators more operable. Or...Opera Bull, if you will...

[Image: Opera Bull]

...and you will.


  1. Kubernetes: Up and Running: Dive into the Future of Infrastructure, 1st Edition, 22.

  2. ibid., 155.

  3. Managing Kubernetes: Operating Kubernetes Clusters in the Real World, 23.