Wednesday, February 20, 2013

Making chef-client safer

One of main issues we are facing when using chef in our environment is that a configuration file generated by either chef template, cookbook_file or file may exists in incomplete state during short period of time when chef-client modifying it. Processes accessing the configuration file during this period may fail strangely leaving no clear trace.

Let look at a very simple hypothetical example. We create chef template that generates /etc/hosts using data specified in role. At one time we add more machines into this /etc/hosts. What happens behind scene is that chef-client create new temporary file with required content, compare checksum of it with one of actual /etc/hosts and overwrite the /etc/hosts if it is not equals to the recent generated temporary file. During period of overwriting, any processes may see incomplete /etc/hosts, thus may not be able to resolve a hostname even though it already exists in the original file.

Luckily there is a fix for this problem. We can monkey patch chef template, cookbook_file and file providers to follow the well known pattern, create and write to temporary file then use File.rename to rename the temporary file to the final file.

The ruby File.rename uses syscall rename, which guarantees that any processes accessing the configuration file at any point of time always see either complete old or complete new version of the file.

The File.rename requires both old name and new name being in the same mounted filesystem. So oneway to make sure that File.rename will success is to create temporary file in the same directory as one of an overwritten file, which can archive easily by passing the directory of the overwritten file as tmpdir to Tempfile.new(basename, [tmpdir = Dir.tmpdir], options).