Puppet Without Masters

2011-01-08 10-minute read

Puppet Labs’s use of the term puppetmaster is rather clever (in contrast to other unnecessarily offensive uses of “master” in the software world).

While I appreciate the clever name, I’m less impressed with the concept.

At May First/People Link we’ve spent the last several years (including the last couple months in earnest) working to transition management of our 90-some servers from a collection of hand-written bash scripts to puppet.

Over the years, we’ve worked hard to keep our servers as secure as possible. We have a team of about a half dozen people who all have root access on all servers. It’s all key-based access. To help mitigate a disaster if one person’s keys were compromised, we’ve implemented monkeysphere on all servers, allowing us to easily revoke access.

After spending so much time thinking through our root-access strategy and fully implementing the monkeysphere to reduce our exposure to a single point of vulnerability, I was disappointed by puppet’s use of a puppet master. For those less familiar with puppet, it goes something like this:

One server (or, god forbid, multiple servers) runs an externally accessible daemon. Each and every server on your network runs a daemon as root that periodically communicates with the puppet master, receives new instructions, and then (again, as root) executes those instructions.

In other words, if your puppet master is compromised, I’m not sure exactly what you would need to do, short of rebuilding every server in your network.

To make matters worse, it seems as though some users generate and store all server ssh keys (private and public) on the puppet master and then push the private keys to their respective nodes. In that case an intruder doesn’t even need write access to the puppet master: just reading those keys would be enough to compromise all the servers in your network.

There must be a better way.

Puppet without masters

After some web-searching, I found a promising thread on the puppet list asking what’s lost without a puppet master. This thread led to a couple of other blog posts by people who have worked out a system for using puppet without a master.

It turns out that there are two distinct points of centralization with puppet. One is the puppet master (as described above). In addition, there is a concept called storeconfigs - which allows each node in the network to store information in a central database. For example, one server can store a request for an account to be set up on a backup server. The next time puppet runs on the backup server, it checks the storeconfigs, finds the request, and creates the user.

It’s possible to run puppet with storeconfigs but without running a puppet master (that avoids the hassle of running the puppet daemons, while providing the convenience of centralization). For our purposes, however, we decided to forego both the puppet master and storeconfigs. We did not want any form of centralization that would provide an additional point of vulnerability.

As is common with puppet, we are storing our puppet recipes in a git repository, and we are publishing to a single, canonical git repository on the Internet. On each node, we have two git repositories - one is a bare repo (that we can push to) and the other is a checked out repo (in /etc/puppet) that is read by puppet. The bare repo has a post-update hook that changes into the /etc/puppet directory, pulls in the changes from the bare repository, and runs puppet against the newly checked out files. Therefore, we can apply new puppet recipes to any server on the network with

git push <server>
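For illustration, a minimal post-update hook might look something like this (the paths, branch name, and puppet invocation are assumptions for the sketch, not our exact script):

#!/bin/sh
# post-update hook in the bare repository; runs as root after each push
set -e
# hooks run with GIT_DIR pointing at the bare repo, so clear it before
# operating on the checked out repo in /etc/puppet
unset GIT_DIR
cd /etc/puppet
# the checked out repo's origin is assumed to be the local bare repo
git pull origin master
# apply the newly checked out recipes
puppet apply /etc/puppet/manifests/site.pp
# (older puppet releases: puppet /etc/puppet/manifests/site.pp)

The details will vary, but the idea is simply: update the working copy, then run puppet against it.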

No daemons: neither a master daemon nor a puppet daemon running on the node, using up memory or providing a potential security hole. The git push happens over an ssh connection, and since all system administrators already have root-level ssh access on every server, there is no need to grant any access beyond what we already have.

Pushing works great - but with 90 nodes we don’t want to have to push to 90 servers every time we want a change made. That’s where the canonical git repository comes in. Once an hour, a cron job on each node runs a script that does a git remote update from /etc/puppet. The script then checks the time stamp on the most recent gpg-signed tag and compares it with the time stamp of the current commit. If the most recent gpg-signed tag is newer, it verifies that the tag was signed by one of a list of authorized gpg keys (the very same gpg keys used by the monkeysphere to grant root-level ssh access). If the gpg signature of the tag can be properly verified, the changes are merged and puppet is run against the new recipes.
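Sketched out, the hourly script does something along these lines (the branch, tag handling, and keyring details are simplified assumptions rather than our production code):

#!/bin/sh
# run hourly from cron; only applies changes carried by a newer, properly signed tag
set -e
cd /etc/puppet
git remote update

# most recent tag reachable from the canonical branch
tag=$(git describe --abbrev=0 --tags origin/master)

# compare the tagged commit's time stamp with our current commit's time stamp
tag_time=$(git log -1 --format=%ct "$tag")
our_time=$(git log -1 --format=%ct HEAD)
[ "$tag_time" -gt "$our_time" ] || exit 0

# verify the tag's gpg signature; root's keyring is assumed to hold only
# the administrators' keys, so a good signature implies an authorized signer
git tag -v "$tag" || exit 1

git merge "$tag"
puppet apply /etc/puppet/manifests/site.pp

The key property is that a node only ever merges changes it can trace to a tag signed by one of our administrators.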

What about privacy?

One of the benefits of a puppet master setup is that nodes get configuration details on a need-to-know basis. The puppet master doesn’t share the entire puppet repo - only the compiled manifest for the node with which it’s communicating.

Our solution to this problem was to go screaming in the other direction. As you might notice from our support wiki and ticket system, we generally favor transparency. Since we are publishing our entire puppet git repo publicly, there seems little point in trying to hide one node’s configuration details from another node.

That also means each node carries around about 4MB of extra weight in the form of disk space for the git repo. That seems like a small price to pay for the resource savings of not running a puppetd process all the time.

More differences

As I’ve read the puppet lists, faqs and documentation, I’ve found yet more ways our use of puppet diverges from the norm.

The first is a little thing really - most people seem to store all their node configurations in a single nodes.pp file. I’m not sure why. Fortunately, puppet’s import syntax allows globbing, so we’ve created a directory and given each server its own .pp file. This arrangement makes it much easier to parse the configuration with tools other than puppet (like, Q. How many servers do we have? A. ls | wc -l).

Backup and Nagios monitoring without storeconfigs

More significantly - there are some things we can’t do since we are not using storeconfigs. Many puppet users set a variable like $nagios = true before including their sshd class, which causes that class to store a configuration telling the nagios server to monitor ssh on the node in question; without storeconfigs, we were forced to come up with alternatives.

My first solution was to simply list all the servers that needed to be monitored in the nagios server’s node configuration file (and ditto for the backup server). This approach, however, proved cumbersome and error prone: adding a new node meant editing three files instead of one, and there was no easy way to tell whether every node had its nagios and/or backup configuration set. The solution was rather simple - there’s more than one way to store a config for another node. Our nagios server is called jojobe.mayfirst.org and our backup server is luisa.mayfirst.org. A typical node .pp file looks like this:

node "pietri.mayfirst.org" {
  # node config goes here
}

if ( $fqdn == "jojobe.mayfirst.org" ) {
  nagios_monitor { "pietri": }
}

if ( $fqdn == "luisa.mayfirst.org" ) {
  backup_access { "pietri": }
}

This way all configuration related to pietri stays in a single file.

Host keys and granting access between servers

storeconfigs is commonly used to distribute ssh host keys. Every node that is added to puppet has its ssh host key stored centrally and then re-distributed to every other node. That way, you can ssh from node to node without ever being prompted to verify the ssh fingerprint. Avoiding that prompt is particularly important when backing up from one server to another via automated scripts. storeconfigs can additionally be used to copy users’ public ssh keys - thus granting user access between servers.

Our solution to this problem: monkeysphere. Rather than maintaining our own private data store of keys, we publish (and sign) our ssh keys via the web of trust. In addition to the server host keys, the root user on each of our servers has an ssh-enabled gpg key (also publicly signed by us). By configuring each server to trust our system administrators’ gpg keys for verifying other keys, we can both avoid the manual ssh fingerprint verification step and grant the root user on one server access to another server by simply dropping root@$server.mayfirst.org into an authorized_user_ids file on the target server.
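As a hypothetical example (assuming a stock Debian monkeysphere layout; your file locations may differ), granting root on pietri access to luisa might look roughly like this on luisa:

# on luisa, as root: authorize pietri's root identity
echo "root@pietri.mayfirst.org" >> /root/.monkeysphere/authorized_user_ids
# rebuild root's authorized_keys from keys that validate through the web of trust
monkeysphere-authentication update-users root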

There’s no question the setup was rather tedious (we’re using runit to maintain an ssh-agent for each root user); however, now that it’s in place (and configured via puppet), it’s a breeze to add new servers. The only extra step we have to take is to confirm and sign each new server’s keys. This “extra” step not only allows our servers to verify each other, but also allows our users to verify the servers, so it’s hardly an extra step at all.
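For the curious, the runit piece might be little more than a run script along these lines (the service name and socket path are made-up examples; loading root’s gpg authentication subkey into the agent is a separate step):

#!/bin/sh
# e.g. /etc/sv/ssh-agent-root/run: supervise a long-lived agent on a fixed socket
exec 2>&1
# remove any stale socket left over from a previous run
rm -f /var/run/ssh-agent-root.sock
# -d keeps ssh-agent in the foreground so runit can supervise it
exec ssh-agent -d -a /var/run/ssh-agent-root.sock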

Shared modules

There’s a vibrant community of third party module developers for puppet. Rather than figure out the intricacies of having puppet configure sshd, for example, you can install a contributed sshd module and then simply add:

include sshd

And you get a default sshd setup. Many of these modules are fairly well developed, offering the ability to easily customize your setup in a number of different ways.

Unfortunately, most of the modules assume you are using storeconfigs and if you are not, they will either fail to work right or you will get noisy errors. At first, this seemed like a problem. However, as I built our puppet recipes, I found myself increasingly frustrated with the third party modules that we could use.

Configuring servers is hard - it requires constant debugging and troubleshooting. puppet already provides a layer of abstraction between you and the server you are setting up. Given the benefits of puppet, I’m willing to spend the time learning the puppet syntax and asking the rest of our system administrators to do the same. This layer of abstraction is further compounded by our use of git to store the configurations (not a problem if you are a git hero - but most of us are already struggling to get a handle on using git). Again, it all seems worth it for the payoff.

Now enter the puppet module. In addition to learning puppet syntax (and struggling with git) you now need to understand how the third party module works. With software programming, I typically don’t need or want to learn how a library or class does what it does - that’s the beauty of object-oriented programming: it hides the complexity. But when it comes to configuring the servers that I will be responsible for debugging and maintaining, I really need to know exactly what is happening.

To further compound the problem, I found myself wading through third party module code designed to work on Debian, Ubuntu, CentOS, Red Hat, Gentoo… and more. We run entirely on Debian - we don’t need any of this extra code. And, once I got rid of all the other operating systems’ code, I was still left with a complex module that allows you to configure software in ways we’ll never need.

In the end, we tore out most of these third party modules and replaced them with file and exec puppet resources that did exactly what we needed them to do. Our code base is now much smaller and simpler.

Not just a whiner

I have a lot more to whine about (like: why provide native resources for things like nagios that are so easily handled with the file resource?).

However - the remarkable thing about puppet is that it’s flexible. Despite some fairly substantial problems with the “typical” use of puppet, the program provides enough flexibility for us to use it in a way that fully meets our needs. After having built my own bash-based set of configuration scripts and deeply exploring puppet, I have a great appreciation for the difficulty of building system configuration software (we considered and rejected cf-engine and chef as not being any better).

And, if you are still not convinced that puppet will work for you … you might consider a package I learned about after going down the puppet route: slack.