One of the big headaches in implementing Chef, or any automated configuration management tool, is migration. We usually have to face a large legacy configuration that is poorly designed and implemented, which makes things worse.
I want to share some of our own experience in migrating a configuration that was relatively large in terms of both scale and complexity.
The tools and techniques
They are:
1. A decent, fast and rock-solid version control system: I recommend Git.
2. Semantic diff tools: a plain text diff is OK but not enough. Depending on the format of your configuration (XML, properties, YAML, JSON), you may need to write your own semantic diff that can compare these formats intelligently (see the sketch after this list).
3. The source code of the applications that use the configuration: the migration involves legacy applications, and access to their source code is an absolute need in order to refactor anything.
4. Collaboration: with the new tool, developers need to change and clean up the code that accesses configuration data, operations people need to change their practices, and we all need to learn how to use it, what to trust, and where to pay attention.
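To illustrate point 2, here is a minimal sketch of a semantic diff for JSON configuration files (the script name and the flattening scheme are my own invention, not an existing tool): two files are treated as equal when they parse to the same data, so whitespace, formatting and key order are ignored and only real value changes are reported.

#!/usr/bin/env ruby
# semantic_diff.rb old.json new.json -- report only semantic differences
require 'json'

# Flatten nested hashes/arrays into "a.b.0.c" => value pairs for easy comparison.
def flatten_keys(value, prefix = '')
  case value
  when Hash
    value.each_with_object({}) { |(k, v), acc| acc.merge!(flatten_keys(v, "#{prefix}#{k}.")) }
  when Array
    value.each_with_index.each_with_object({}) { |(v, i), acc| acc.merge!(flatten_keys(v, "#{prefix}#{i}.")) }
  else
    { prefix.chomp('.') => value }
  end
end

old_conf = flatten_keys(JSON.parse(File.read(ARGV[0])))
new_conf = flatten_keys(JSON.parse(File.read(ARGV[1])))

(old_conf.keys | new_conf.keys).sort.each do |key|
  next if old_conf[key] == new_conf[key]
  puts "#{key}: #{old_conf[key].inspect} -> #{new_conf[key].inspect}"
end

For XML or properties files the idea is the same: parse both sides into a normalized structure and compare that, instead of comparing the raw text.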
Shadow configuration file
In order to minimize the impact on the existing system, at the beginning we let Chef create shadow configurations, which are then verified against the existing configuration using the diff tools. The shadow configuration generated by Chef replaces the actual one only after we have made sure that the two are semantically the same from the point of view of the program using them.
A shadow configuration can be a file with a different prefix/suffix or in a different directory. E.g. if the actual configuration file is
/etc/hosts
then the shadow one will be
/etc/hosts.chef
or
/etc-chef/hosts
or
/root-chef/etc/hosts
I personally prefer the complete shadow directory because it is easier to check and to remove when needed.
The cookbook needs one attribute (it can be the name of the directory where the configuration files are generated) saying whether we generate the shadow or the actual configuration, as in the sketch below.
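For example, the switch can look like this (the cookbook and attribute names are hypothetical, and hosts.erb is assumed to live in the cookbook's templates directory): the recipe prefixes every path with a configurable root, so setting the attribute to /root-chef produces the shadow tree and setting it to an empty string writes the real files.

# attributes/default.rb -- '' means write the real files, '/root-chef' the shadow tree
default['myapp']['config_root'] = '/root-chef'

# recipes/default.rb
config_root = node['myapp']['config_root']

directory "#{config_root}/etc" do
  recursive true
  not_if { config_root.empty? }   # /etc already exists on the real system
end

template "#{config_root}/etc/hosts" do
  source 'hosts.erb'
  owner  'root'
  group  'root'
  mode   '0644'
end

Once the diff tools report no semantic difference, switching from the shadow to the real file is just a matter of flipping that attribute.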
Conflicting changes
In an ideal world we would have just one configuration management system and one administrator who modifies the system. But reality is different: there are usually more actors making changes, so there is potential for conflict. E.g. Chef modifies a file, and then an administrator, a script or another configuration system, not knowing this, modifies it as well. The worst case is that everyone assumes the file has the content they expect and things get broken: scripts that have worked for many years suddenly fail.
How to minimize the conflicts? Communicate well with the other team members what is managed by Chef, so that they modify it through Chef and not manually or with other tools. Various methods can be used, e.g. a) wiki pages, b) putting a clear notice as a comment at the beginning of every file managed by Chef, such as "This file is generated by Chef, you shall not modify it as it will be overwritten", c) notifying everyone each time a file is modified by Chef.
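Option b) costs almost nothing because Chef writes the files anyway. A sketch, with a made-up file and banner text:

# the banner goes at the top of every Chef-managed file
banner = <<~'BANNER'
  # This file is generated by Chef, you shall not modify it manually:
  # any change will be overwritten on the next chef-client run.
  # Make your change in the Chef repository instead.
BANNER

file '/etc/sysctl.d/99-chef.conf' do
  content banner + "vm.swappiness = 10\n"
  owner 'root'
  group 'root'
  mode  '0644'
end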
Top-down vs. bottom-up
There are basically two approaches to starting the migration project: 1) top-down and 2) bottom-up.
1. In the top-down approach, you start by creating a cookbook/recipe for one configuration item (e.g. syslog-ng; the cookbook is fairly generic so it is usable in all environments). Then you extract the specific data (e.g. the IP of the syslog server, the log sources) from the configuration file of each server and put it as attributes into a role. After finishing one configuration item, you continue with the next until you have everything you need in Chef.
2. With the bottom-up approach, you first create one generic cookbook/recipe per server role (e.g. middleware), then you add to it all the configuration files from the servers of all environments. The structure of the cookbook files can look like
middleware/files/default # files that are the same for all hosts of all environments
middleware/files/default/dev # files that are the same for all hosts of the development environment
middleware/files/default/dev/host1 # files that are specific to host1 of the development environment
middleware/files/default/dev/host2
middleware/files/default/qa
middleware/files/default/pro
The recipe can easily be written to copy these configuration files to a server depending on its environment and hostname (node[:hostname]); a sketch of such a recipe follows below. After everything is in Chef and works well, we start refactoring and splitting this big generic cookbook into many smaller ones, adding more roles and making things easier to reuse. This step is a kind of continual improvement process aimed at making our Chef repository better and better.
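A minimal sketch of such a recipe (the managed file list is made up, and it assumes a Chef version where the source property of cookbook_file accepts an ordered list of fallback files):

# middleware/recipes/default.rb
env  = node.chef_environment
host = node[:hostname]

# For each managed file, take the most specific copy shipped in the cookbook:
# files/default/<env>/<host>/..., then files/default/<env>/..., then files/default/...
%w(hosts resolv.conf ntp.conf).each do |conf|
  cookbook_file "/etc/#{conf}" do
    source ["#{env}/#{host}/#{conf}", "#{env}/#{conf}", conf]
    owner 'root'
    group 'root'
    mode  '0644'
  end
end

Splitting this cookbook later is then mostly a matter of moving files and resources around without changing what ends up on the servers.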
In our project we first took the top-down approach because it seemed more natural at first sight; later, however, we changed to the bottom-up one, which I think has many benefits.
With the bottom-up approach, after we put everything into Chef (which can be done pretty quickly), we can announce that from now on all changes have to be made through Chef. The very first results are seen immediately: a) there is a single point for making changes, b) changes are well audited and communicated because the Chef repository is under version control, c) conflicting changes are minimized.