Revision controlling system configuration

Many system administrators use a revision control tool to track configuration files. I'm no exception. There are various howtos out there, but most don't go beyond the absolute basics. This document isn't yet another HOWTO. It is a compilation of a few weblog entries I wrote as a by-product of my work on IsiSetup. IsiSetup is a set of scripts providing a revision control interface designed to handle configuration files. This document isn't solely about IsiSetup, but uses it as an example where appropriate.

SCM for /etc? What's in it for me?

Here's the shortlist:

What's needed to manage /etc?

When I began using an SCM to manage my hosts' /etc, CVS was the best-known free revision control tool. It worked, but was far from a perfect fit. In the meantime, we’ve seen the growth of a diverse SCM-biosphere. This shows not only in the number of alternatives: the new competitors bring new qualities, too.

Darcs has long been the SCM of choice for IsiSetup. Darcs has a very nice command line interface and provides some unique interactive features. It is also truly changeset based and distributed. During development, however, more and more code was needed to work around its shortcomings, which led to a reconsideration. In the following sections I'll lay out my thoughts on how modern SCMs may handle your system configuration.

Distributed vs. Centralized

While CVS and Subversion are centralized by design, most new SCMs can be used in a distributed environment. In CVS and Subversion there is a clear distinction between the repository and working copies. Every change in a working copy has to be submitted to the central repository server, from where others are able to pull it. In CVS this dependency was so strong that almost any command needed a connection, so working offline was nearly impossible. While this centralization is good for coordinating a few people working on a common project, it doesn’t fit when many groups are working on dedicated branches. A distributed setup also needs much less infrastructure: no central repository is required, and everything can be done using only the working directory.

When managing /etc, the distributed nature of today's SCMs comes in very handy. It lets you just initialize /etc and start tracking changes. No central repository setup is needed. It's also very easy to replicate such changes between hosts, as you may pull changes from the /etc on one host to the /etc on another. One thing you must be aware of is that every tracked file is represented in the SCM's repository directory (called .git, _darcs etc.). You need to protect this directory (chmod 700, so that only its owner can enter it) in order not to leak the content of otherwise protected files!
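
A minimal sketch of that protection step, assuming a POSIX system and assuming owner-only access is what is wanted. The helper name protect_repo_dir is made up for illustration, and the demo runs on a throwaway directory rather than a real /etc. It sets mode 0700 because a directory needs the execute bit before a non-root owner can enter it.

```python
import os
import stat
import tempfile


def protect_repo_dir(repo_dir):
    """Restrict repo_dir to its owner and return the resulting permission bits."""
    # S_IRWXU is 0o700: read, write and enter for the owner,
    # no access at all for group and others.
    os.chmod(repo_dir, stat.S_IRWXU)
    return stat.S_IMODE(os.stat(repo_dir).st_mode)


if __name__ == "__main__":
    # Stand-in for /etc so the sketch is safe to run anywhere.
    with tempfile.TemporaryDirectory() as fake_etc:
        repo_dir = os.path.join(fake_etc, ".git")
        os.mkdir(repo_dir)
        print(oct(protect_repo_dir(repo_dir)))
```

With the top-level repository directory closed to group and others, they can no longer reach the files below it by path, whatever the modes of those individual files are.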

Branching and Merging

In CVS, branching was only for the enlightened. It was hard to do, and merging was a pain in the ass. SVN improved this somewhat, but branching only became an everyday thing once Darcs and others removed the distinction between a branch and a working directory. In Darcs, Git etc., working directories are full-featured repositories. They contain the whole history. It’s because of this feature that every working directory can pull in changes from any other. It also lets you branch by simply copying a working directory.

While branching and merging aren’t needed when simply revision controlling /etc, they give the whole exercise a new dimension. You can not only track changes host by host, but also group changes into modules that implement some kind of service or feature.

Changesets vs. Revisions

Most SCMs use some kind of revision number to track the history of changes. Every revision describes the state of the project at a specific point in time. Each committed change increments the revision number. There is a clear timeline. This system seems intuitive and simple. But it doesn’t fit well into a distributed design, where different branches can move forward independently and be merged back later. While it is possible to handle this case, as shown by Subversion, a radically different approach is implemented in e.g. Darcs: a revision is not identified by a number, but simply by the set of changes it contains. This lets you cherry-pick some changes while ignoring others when pulling. You may want to learn more about the underlying theory of patches.
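
To make the "set of changes" idea concrete, here is a toy model in Python. It is only a sketch and not how Darcs works internally: it treats every change as an independent name and ignores dependencies between patches entirely. The function pull and the change names are invented for illustration.

```python
def pull(target, source, wanted):
    """Return the target state plus the wanted changes taken from source."""
    missing = wanted - source
    if missing:
        raise ValueError(f"source lacks: {sorted(missing)}")
    # A state is nothing but a set of changes, so pulling is a set union.
    return target | wanted


# Hypothetical hosts, each identified only by the changes applied to it.
web_host = frozenset({"base-system", "ssh-hardening", "apache-vhosts"})
mail_host = frozenset({"base-system"})

if __name__ == "__main__":
    # Cherry-pick one change and ignore the rest of web_host's history.
    mail_host = pull(mail_host, web_host, {"ssh-hardening"})
    print(sorted(mail_host))
```

Because a state is just a set, the order in which changes are pulled makes no difference in this model. A numbered timeline cannot offer that.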

In software development most changes are here to stay. They come one after the other and contribute small features or bugfixes to a single application. You mostly aren’t interested in splitting the project into many, and you don’t want to be able to reapply changes in a different order. The changes belong together and depend on previous ones.

The changes applied to /etc are of a different nature. They may well depend on each other but can often be grouped into features or services. These changesets (features’n’services) don’t need to be applied in any particular order. You’d like to just choose a changeset and apply it to a given host.

User interface

The CVS legacy

I’ve used quite a few SCM implementations. When CVS was the only viable tool, we hated its interface, but weren’t able to dismiss it… Things have changed for the better. There’s healthy competition nowadays.

Subversion sees itself as a successor to CVS. It tries to build on the experience developers and admins gained with CVS. Its command line mimics that of CVS. As a tribute to CVS’ legacy, IsiSetup provides the following CVS-style command aliases: add, remove, import, commit, update, diff, history, log.

Darcs

Darcs was the first distributed, changeset-based SCM I got to know. It was also the tool of choice in early IsiSetup versions. But some restrictions (mainly performance, missing support for tracking links and permissions, and the fact that some commands couldn’t be run non-interactively) made the switch to Git/Cogito necessary.

But I’m still using Darcs to manage the source code of IsiSetup. Now, what exactly do I like about Darcs’ command line interface?

And here’s the list of commands supported by IsiSetup: initialize, add, remove, mv, whatsnew, record, rollback, changes, pull, and get.

Git/Cogito

Git is promoted as providing only a low-level SCM command line interface. While Git comes with a few commands that let you manage your sources/configs, it isn’t designed to give a user-friendly experience. In Git-speak, this is left to a porcelain. One such wrapper is Cogito. IsiSetup is another one.

IsiSetup

IsiSetup is designed with configuration files in mind. This is what separates it from other revision control tools. But this post is about the classic, low-level SCM operations. These are provided by the isisetup-module script.

Interactivity

The feature I miss most when working with one of Darcs’ competitors is the interactivity of recording commits, moving around patches and dropping changesets. I’ll try to provide some interactivity with IsiSetup, too.

Here's an interactive session of isisetup-module rollback as an example:

shuerlimann@lappi:/etc$ sudo isisetup-module rollback
Author: root
Date:   Sun Jun 11 16:50:29 2006 +0200

    Mark package 'exim4-base' to install

Do you want to rollback this change? [y/N]n

Author: root
Date:   Sun Jun 11 16:50:29 2006 +0200

    Imported package configuration version 4.60-3ubuntu3 for 'exim4-base'

Do you want to rollback this change? [y/N]y
First trying simple merge strategy to revert.
Simple revert fails; trying Automatic revert.
Removing cron.daily/exim4-base
Removing init.d/exim4
Removing logrotate.d/exim4-base
Finished one revert.
shuerlimann@lappi:/etc$ isisetup-module history
Author: root
Date:   Sun Jun 11 17:53:46 2006 +0200

    Revert "Imported package configuration version 4.60-3ubuntu3 for 'exim4-base'"

    This reverts 0be946f6be2f5b79574a6dafe1eb9ef2abd7b6d0 commit.

    000000 D    cron.daily/exim4-base
    000000 D    init.d/exim4
    000000 D    logrotate.d/exim4-base
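
The prompting part of such a session can be pictured with a short Python sketch. This is not IsiSetup's code: the function names are hypothetical, and only the wording of the question and the [y/N] default (anything but "y" or "yes" means no) are taken from the session shown above.

```python
def ask_yes_no(question, read_answer=input):
    """Ask a [y/N] question; only 'y' or 'yes' counts as agreement."""
    answer = read_answer(f"{question} [y/N]").strip().lower()
    return answer in ("y", "yes")


def select_for_rollback(changes, read_answer=input):
    """Show each change in turn and collect the ones the user confirms."""
    selected = []
    for description in changes:
        prompt = f"    {description}\n\nDo you want to rollback this change?"
        if ask_yes_no(prompt, read_answer):
            selected.append(description)
    return selected


if __name__ == "__main__":
    # Scripted answers stand in for a person typing at the terminal.
    scripted = iter(["n", "y"])
    log = [
        "Mark package 'exim4-base' to install",
        "Imported package configuration version 4.60-3ubuntu3 for 'exim4-base'",
    ]
    for change in select_for_rollback(log, lambda prompt: next(scripted)):
        print("would roll back:", change)
```

Passing the reader in as a parameter keeps the same code usable interactively (with input) and from scripts, which matters given that commands that cannot run non-interactively were one of the problems mentioned earlier.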

Merging

If a config file is managed by more than one module, conflicts can arise. Different revision control systems provide different merging capabilities. IsiSetup started out using Darcs as the underlying SCM. I had to switch to Git, not least because Darcs doesn't provide adequate merging capabilities.

Context matters

Merging gets better the more context the merger understands. This is true for both humans and computers working in the merging business. In an ideal world there would be a merging tool which is aware of things like

...and this in many configuration formats.

Config4GNU & Co

Configuration frameworks like Config4GNU and UniConf could help. They provide translation layers between configuration formats. It may thus be possible to select a format for which efficient mergers are available. This might be XML, INI-style, ISC-style etc.
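
As a rough illustration of what a format-aware merger could look like once a file is available in INI style, here is a small three-way merge in Python. It is a sketch under assumptions: it uses Python's standard configparser rather than Config4GNU or UniConf, the function names and sample values are invented, and it merges per option instead of per line.

```python
import configparser


def parse_ini(text):
    """Flatten INI text into a {(section, option): value} mapping."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        (section, option): value
        for section in parser.sections()
        for option, value in parser.items(section)
    }


def merge_three_way(base, ours, theirs):
    """Merge two edited mappings against their common ancestor."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o      # both sides agree, including "both deleted"
        elif o == b:
            value = t      # only their side touched this option
        elif t == b:
            value = o      # only our side touched this option
        else:
            conflicts.append(key)   # both changed it, differently
            continue
        if value is not None:       # None means the option was removed
            merged[key] = value
    return merged, conflicts


if __name__ == "__main__":
    base = parse_ini("[mail]\nhost = mx1.example.org\nport = 25\n")
    ours = parse_ini("[mail]\nhost = mx2.example.org\nport = 25\n")
    theirs = parse_ini("[mail]\nhost = mx1.example.org\nport = 587\n")
    print(merge_three_way(base, ours, theirs))
```

Here the two sides edit neighbouring lines, a situation where purely line-based mergers tend to report a conflict. Because this merger knows that host and port are separate options, it can combine both edits and only flags the case where both sides change the same option to different values.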

The control flow in a merger would be something like:

Sounds easy, might be hell to implement...

Our choice

Why Git/Cogito?

Some features:

Advantages over competitors:

Shortcomings:

Why not Darcs?

The main reasons against Darcs are:

Resources

Simon Huerlimann has written a few articles about SCMs and their use in IsiSetup. This site is mostly a compilation.

Others have blogged, too:

Other SCMs

There's a nice comparison at the Better SCM website. It compares the feature sets of more than a dozen SCMs!

LogintasPublicWiki: IsiSetupRevisionControl (last edited 2010-04-01 14:58:00 by localhost)