Revision controlling system configuration
Many system administrators use a revision control tool to track configuration files. I'm no exception. There are various howtos out there, but most don't go beyond the absolute basics. This document isn't yet another HOWTO. It is a compilation of a few weblog posts I wrote as a by-product of my work on IsiSetup, a set of scripts providing a revision control interface designed to handle configuration files. This document isn't solely about IsiSetup, but uses it as an example where appropriate.
SCM for /etc? What's in it for me?
Here's the shortlist:
You can rollback changes.
You can explore the history of changes.
You can replicate your configuration.
You can backup your configuration.
You can attribute ("blame") changes to individual admins.
What's needed to manage /etc?
When I began using an SCM to manage my hosts’ /etc, CVS was the best-known free revision control tool. It worked, but was far from a perfect fit. In the meantime, we’ve seen a diverse SCM biosphere grow. It shows not only in the number of alternatives; the new competitors bring new qualities, too.
Darcs was long the SCM of choice for IsiSetup. Darcs has a very nice command line interface and provides some unique interactive features. It is also truly changeset-based and distributed. As development went on, more and more code was needed to work around its shortcomings, which led me to reconsider the choice. In the following sections I'll lay out my thoughts on how modern SCMs may handle your system configuration.
Distributed vs. Centralized
While CVS and Subversion are centralized by design, most new SCMs can be used in a distributed environment. In CVS and Subversion there is a clear distinction between the repository and working copies. Every change in a working copy has to be submitted to the central repository server, from where others can pull it. In CVS this dependency was so strong that almost every command needed a connection to the server; working offline was nearly impossible. While this centralization is good for coordinating a few people working on a common project, it doesn’t fit when many groups are working on dedicated branches. A distributed setup also needs much less infrastructure: no central repository is needed, and everything can be done in the working directory alone.
When managing /etc, the distributed nature of today’s SCMs comes in very handy. You can simply initialize a repository in /etc and start tracking changes, with no central repository to set up. It’s also very easy to replicate changes between hosts, as you may pull changes from the /etc on one host into the /etc on another. One thing you must be aware of is that every tracked file is also stored in the SCM’s repository directory (called .git, _darcs etc.). You need to protect this directory (chmod 700; a directory needs its x bit to be entered) so as not to leak the content of otherwise protected files!
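The permission advice can also be scripted. Below is a minimal Python sketch, not part of IsiSetup, assuming the repository directory sits directly in the tracked directory (as with /etc/.git). The helper names `protect_repo_dirs` and `mode_of` are hypothetical. Note that 0700 is used rather than 0600, because a directory without its x bit cannot be entered.

```python
import os
import stat

# Metadata directory names mentioned in the text; other SCMs use other names.
REPO_DIRS = (".git", "_darcs")


def protect_repo_dirs(root):
    """Restrict each SCM metadata directory directly below `root` to its owner.

    Returns the paths whose mode was set to 0700.
    """
    protected = []
    for name in REPO_DIRS:
        path = os.path.join(root, name)
        if os.path.isdir(path):
            # Owner gets rwx, group and others get nothing. Directories need
            # the x bit to be entered, hence 0700 rather than 0600.
            os.chmod(path, stat.S_IRWXU)
            protected.append(path)
    return protected


def mode_of(path):
    """Return only the permission bits of `path`, e.g. 0o700."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

Run as root, `protect_repo_dirs("/etc")` would restrict /etc/.git and /etc/_darcs, whichever exist, to their owner.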
Branching and Merging
In CVS, branching was only for the enlightened. It was hard to do and merging was a pain in the ass. SVN somewhat changed this, but only when Darcs and others removed the distinction between a branch and a working directory did branching become an everyday thing. In Darcs, Git etc., working directories are full-featured repositories that contain the whole history. Because of this, every working directory can pull in changes from any other, and you can branch by simply copying a working directory.
While branching and merging aren’t needed for simply revision controlling /etc, they give the whole exercise a new dimension. Not only can you track changes host by host, you can also group changes into modules that implement a particular service or feature.
Changesets vs. Revisions
Most SCMs use some kind of revision number to track the history of changes. Every revision describes the state of the project at a specific point in time, and each committed change increments the revision number. There is a clear timeline. This system seems intuitive and simple, but it doesn’t fit well into a distributed design, where different branches can move forward independently and be merged back later. While it is possible to handle this case, as Subversion shows, Darcs for example implements a radically different approach: a revision is not identified by a number, but simply by the set of changes it contains. This lets you cherry-pick some changes while ignoring others when pulling. You may want to learn more about the underlying theory of patches.
In software development most changes are here to stay. They come one after the other and contribute small features or bugfixes to a single application. You usually aren’t interested in splitting the project into many smaller ones, and you don’t need to reapply changes in a different order. The changes belong together and depend on previous ones.
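A toy model may make the difference concrete. In this Python sketch (a simplification, not how Darcs actually stores patches), a repository's state is identified by the set of changeset names it holds rather than by a revision number. Two hosts that recorded the same changes in a different order are therefore in the same state, and pulling can cherry-pick a subset. The class `Repo` and the changeset names are hypothetical; a real changeset-based tool also has to respect dependencies between patches, which this model ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """A toy changeset-based repository: its state is the set of changesets it holds."""

    # Maps a changeset name to a short description.
    patches: dict = field(default_factory=dict)

    def record(self, name, description):
        self.patches[name] = description

    def state(self):
        # Identity is the set of changes, independent of recording order.
        return frozenset(self.patches)

    def pull(self, other, wanted=None):
        """Copy changesets that `other` has and this repository lacks.

        If `wanted` is given, only those changesets are pulled (cherry-picking);
        all others are ignored. Returns the sorted names that were pulled.
        """
        missing = set(other.patches) - set(self.patches)
        if wanted is not None:
            missing &= set(wanted)
        for name in sorted(missing):
            self.patches[name] = other.patches[name]
        return sorted(missing)
```

With a revision counter, the two recording orders would produce different histories; here only the set matters.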
The changes applied to /etc are of a different nature. They may well depend on each other but can often be grouped into features or services. These changesets (features’n’services) don’t need to be applied in any particular order. You’d like to just choose a changeset and apply it to a given host.
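A small Python sketch can illustrate why such feature changesets can be applied in any order. It assumes, purely for illustration, that a changeset simply writes whole files and that a host configuration is a mapping from paths below /etc to file contents. The changesets `NTP`, `MAIL` and `SSH` and all helper names are hypothetical. Changesets that touch disjoint files give the same result in every order; two that touch the same file do not.

```python
from itertools import permutations

# Hypothetical feature changesets: each maps a path below /etc to new file content.
NTP = {"ntp.conf": "server pool.ntp.org\n"}
MAIL = {"mailname": "lappi.example.org\n"}
SSH = {"ssh/sshd_config": "PermitRootLogin no\n"}


def apply_changeset(config, changeset):
    """Return a copy of `config` with every file in `changeset` written."""
    result = dict(config)
    result.update(changeset)
    return result


def build_host(base, changesets):
    """Apply the chosen changesets, in the given order, to a base configuration."""
    config = base
    for changeset in changesets:
        config = apply_changeset(config, changeset)
    return config


def independent(*changesets):
    """True if no two changesets touch the same file, so order cannot matter."""
    seen = set()
    for changeset in changesets:
        paths = set(changeset)
        if seen & paths:
            return False
        seen |= paths
    return True


def same_result_in_any_order(base, changesets):
    """Apply the changesets in every possible order and compare the results."""
    results = {
        tuple(sorted(build_host(base, order).items()))
        for order in permutations(changesets)
    }
    return len(results) == 1
```

For example, `build_host({"hostname": "lappi\n"}, [NTP, SSH])` picks two features for one host and leaves out the mail setup.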
User interface
The CVS legacy
I’ve used quite a few SCM implementations. When CVS was the only viable tool, we hated its interface, but weren’t able to dismiss it… Things have changed for the better. There’s healthy competition nowadays.
Subversion sees itself as a successor to CVS. It tries to build on the experience developers and admins gained with CVS, and its command line mimics CVS’. As a nod to CVS’ legacy, IsiSetup provides the following CVS-style command aliases: add, remove, import, commit, update, diff, history, log.
Darcs
Darcs was the first distributed, changeset-based SCM I got to know. It was also the tool of choice in early IsiSetup versions. But some restrictions (mainly performance, missing support for tracking links and permissions, and the fact that some commands couldn’t be run non-interactively) made the switch to Git/Cogito necessary.
But I’m still using Darcs to manage the source code of IsiSetup. Now, what exactly do I like about Darcs’ command line interface?
It makes me feel good
The command set feels coherent
For most actions, there exists an ‘undo’ version
It is interactive where it makes sense
And here’s the list of commands supported by IsiSetup: initialize, add, remove, mv, whatsnew, record, rollback, changes, pull, and get.
Git/Cogito
Git is promoted as providing only a low-level SCM command line interface. While Git comes with a few commands that let you manage your sources/configs, it isn’t designed to give a user-friendly experience. In Git-speak, this is left to a porcelain. One such wrapper is Cogito. IsiSetup is another one.
IsiSetup
IsiSetup is designed with configuration files in mind. This is what sets it apart from other revision control tools. But this section is about the classic, low-level SCM operations, which are provided by the isisetup-module script.
Interactivity
The feature I miss most when working with one of Darcs’ competitors is the interactivity of recording commits, moving around patches and dropping changesets. I’ll try to provide some interactivity with IsiSetup, too.
Here's an interactive session of isisetup-module rollback as an example:
shuerlimann@lappi:/etc$ sudo isisetup-module rollback
Author: root
Date: Sun Jun 11 16:50:29 2006 +0200
Mark package 'exim4-base' to install
Do you want to rollback this change? [y/N]n
Author: root
Date: Sun Jun 11 16:50:29 2006 +0200
Imported package configuration version 4.60-3ubuntu3 for 'exim4-base'
Do you want to rollback this change? [y/N]y
First trying simple merge strategy to revert.
Simple revert fails; trying Automatic revert.
Removing cron.daily/exim4-base
Removing init.d/exim4
Removing logrotate.d/exim4-base
Finished one revert.
shuerlimann@lappi:/etc$ isisetup-module history
Author: root
Date: Sun Jun 11 17:53:46 2006 +0200
Revert "Imported package configuration version 4.60-3ubuntu3 for 'exim4-base'"
This reverts 0be946f6be2f5b79574a6dafe1eb9ef2abd7b6d0 commit.
000000 D cron.daily/exim4-base
000000 D init.d/exim4
000000 D logrotate.d/exim4-base
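The prompting part of such a session is easy to reproduce. The Python sketch below is hypothetical and not IsiSetup's code: it walks a list of commits, shows author, date and message, and asks the same [y/N] question, treating anything that does not start with "y" as No. The `Commit` type and `choose_rollbacks` function are illustrative names, and the actual revert of the selected commits is left out.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    author: str
    date: str
    message: str


def choose_rollbacks(commits, ask=input):
    """Ask, one commit at a time, whether it should be rolled back.

    `commits` is expected newest first, as in the session above. `ask` is the
    prompt function (input by default) so it can be replaced in tests. An empty
    answer or anything not starting with 'y' means No, matching [y/N].
    Returns the selected commits in the order they were offered.
    """
    selected = []
    for commit in commits:
        print(f"Author: {commit.author}")
        print(f"Date: {commit.date}")
        print(commit.message)
        answer = ask("Do you want to rollback this change? [y/N]")
        if answer.strip().lower().startswith("y"):
            selected.append(commit)
    return selected
```

Injecting `ask` keeps the interactive loop testable without a terminal.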
Merging
If a config file is managed by more than one module, conflicts can arise. Different SCMs provide different merging capabilities. IsiSetup started out using Darcs as the underlying SCM. I had to switch to Git not least because Darcs doesn't provide adequate merging capabilities.
Context matters
Merging gets better the more context the merger understands. This is true for both humans and computers working in the merging business. In an ideal world there would be a merging tool that is aware of things like
comments,
variable assignments,
variable groups,
cardinality...
...and this in many configuration formats.
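To illustrate what awareness of comments and variable assignments buys, here is a minimal Python sketch for a hypothetical `key = value` format, not any particular tool's. It parses each version into a dictionary, ignoring comments, and then performs a three-way merge against the common ancestor. Edits to comments on either side no longer cause conflicts; only a variable changed differently on both sides is reported. The function names are illustrative.

```python
def parse_assignments(text):
    """Parse 'key = value' lines into a dict, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def merge_assignments(base, ours, theirs):
    """Three-way merge of variable assignments.

    A key changed on only one side takes that side's value (a missing key
    counts as deleted); a key changed differently on both sides is reported
    as a conflict and keeps our value. Returns (merged, conflicts).
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o
        elif o == b:
            value = t
        elif t == b:
            value = o
        else:
            conflicts.append(key)
            value = o  # keep the local value until someone resolves it
        if value is not None:
            merged[key] = value
    return merged, conflicts
```

A plain line-based merge of the same three files could flag the changed comment lines next to modified settings; the assignment-aware version does not see them at all.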
Config4GNU & Co
Configuration frameworks like Config4GNU and UniConf could help. They provide translation layers between configuration formats, so it may be possible to select a format for which efficient mergers are available. This might be XML, INI-style, ISC-style etc.
The control flow in a merger would be something like:
- Translate current configs into mergeable format
- Translate new configs into mergeable format
- Merge current and new config
- Translate result back into native formats
Sounds easy, might be hell to implement...
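The four steps can be tried out on a single format. The sketch below uses Python's standard configparser module as a stand-in for a translation layer such as Config4GNU or UniConf (it is neither of them) and applies a simple, assumed merge policy: values already present in the current config win, while sections and keys that only exist in the new config are added. The function names are hypothetical. configparser drops comments when writing, which shows why the translation back to the native format is the hard part.

```python
import configparser
import io


def to_mergeable(ini_text):
    """Steps 1 and 2: translate an INI-style config into nested dicts."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of option names
    parser.read_string(ini_text)
    return {section: dict(parser[section]) for section in parser.sections()}


def merge(current, new):
    """Step 3: start from the new config, then let current values override it."""
    merged = {section: dict(options) for section, options in new.items()}
    for section, options in current.items():
        merged.setdefault(section, {}).update(options)
    return merged


def to_native(config):
    """Step 4: translate the merged result back into INI text (comments are lost)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read_dict(config)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def merge_configs(current_text, new_text):
    """Run the whole translate, merge, translate-back pipeline."""
    return to_native(merge(to_mergeable(current_text), to_mergeable(new_text)))
```

A real merger would need a three-way merge and a lossless translation for each native format; this only shows the shape of the control flow.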
Our choice
Why Git/Cogito?
Some features:
- Repositories can be accessed via HTTP, the local filesystem, rsync, SSH or a Git-specific protocol
- Every working directory is a repository, too
- Replicating a branch is a simple copy
- Commits can be merged between branches
- Changes can be pushed or pulled between branches
- Good performance
- Distributed
- Links are tracked
- Executable bits are tracked
- Good merge capabilities
- Only one top-level .git directory
Advantages over competitors:
- gitk can be used to visualize branch trees
- Commits and tags can be signed
- Performance
Shortcomings:
- Not all permissions are tracked
- There's no --quiet argument
- Empty directories are not tracked
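Git records file contents, symbolic links and the executable bit, but no other permission bits and no empty directories. One workaround, sketched here under our own assumptions, is to keep a plain-text manifest of that metadata and commit it along with the files. The manifest format and the function names below are hypothetical; owner and group could be recorded the same way from `st_uid` and `st_gid`.

```python
import os
import stat

# Repository metadata is not part of the tracked configuration.
SKIP = {".git"}


def metadata_manifest(root):
    """List 'mode path' lines for every file and directory below `root`.

    Modes are the full permission bits in octal; paths are relative to `root`.
    lstat is used so symbolic links are described themselves, not their targets.
    """
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP)
        for name in dirnames + sorted(filenames):
            path = os.path.join(dirpath, name)
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            lines.append(f"{mode:04o} {os.path.relpath(path, root)}")
    return lines


def empty_directories(root):
    """Return directories below `root` that contain nothing, which Git would not track."""
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP]
        if dirpath != root and not dirnames and not filenames:
            empty.append(os.path.relpath(dirpath, root))
    return sorted(empty)
```

Writing `metadata_manifest("/etc")` to a file before each commit would let a restore script reapply modes and recreate empty directories.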
Why not Darcs?
The main reasons against Darcs are:
- Bad performance
- Endless loops under some scenarios
- Links are not tracked
- File permissions are not tracked
- There's no --quiet argument
- Merge capabilities do not match our needs
Resources
Simon Huerlimann has written a few articles about SCMs and their use in IsiSetup. This site is mostly a compilation.
Other SCMs
There's a nice comparison at the Better SCM website. It compares the feature sets of more than a dozen SCMs!