Wednesday, August 10, 2011

Craft a solid chef-server

The out of box chef server installation is NOT enough to support large scale deployment . To handle hundreds of nodes we need to craft it a bit. I highlight here some configuration tweaks
1. Run multi instances with Apache or Nginx http server as front end.
chef-server-api and chef-server-webui is single process, the only way to scale them are run multi instances of them behind the front end reverse http proxy server.
Also the front end http server can be configured to perform ssl encryption making communication between nodes and chef-server-api secure. Note that chef-server-webui is also chef-client, so use URL of the reverse proxy server to communicate with chef-server-api.
2. Use ruby 1.9
chef server components are hungry to CPU, use ruby 1.9 can boost performance as much as twice comparing to ruby 1.8.
3. Couchdb and big hard disk
Avoid un necessary headache by installing latest version of Couchdb. The version that comes with OS is usually old, buggy, often crashes under high load with hundred GB database. Hard disk is cheap, keep space for Couchdb data files, log files always free around 30%.
Also learn Erlang a bit so you can look and understand crash dump file from Couchdb as well as change its configuration.
4. Solr server
Chef-solr is de facto Solr server for full text search, The default config is pretty small for any serious operation. As it is java application based on Lucene, we need increase java heap size. You also need modify the Solr config (see this tip) to make full text index run faster. Believe me , you will have to rebuild solr index more than you think and proper solr config will save days of frustrated waiting.
5. Rabbitmq-server
Follow the same advice as for Couchdb. Rabbitmq-server is also Erlang application. It need enough memory to work reliably. Give it few GB of RAM if you do not want it crash and then will have to rebuild solr index.
6. Not enough 
Distribute load by running chef-client at different deterministic time avoiding run chef-client of all nodes at the same time.
Consider running each of chef-server component on separate box even one component in many boxes. But don't do it if you have not try all above options as the more complex configuration, the more time you have to spend on it and you end up to serve the chef instead of it serves your infrastructure.
7. Put chef-server configuration in a cookbook
Create your own cookbook to automate all the changes you make on chef-server so you can recreate exact new chef-server using chef-solo in matter of minute (when needed). At the end, who will believe that you will be able to automate the infrastructure if you can not automate your own stuff.