Chef Cookbooks for HBase on CentOS Released
At Palomino, we've been hard at work building the Palomino Cluster Tool. Its goal is to let you build realistically-sized[1] and functionally-configured[2] distributed databases in a matter of hours, instead of the days or weeks it takes at present. Today marks another milestone toward that goal as we release our Chef Cookbook for building HBase on CentOS!
Background
Riot Games was kind enough to open source their Chef Cookbook for building a Hadoop cluster. Although the code wasn't in a state that would produce a functional cluster, and it was almost entirely undocumented, it was a great start.
Recently I was tasked with building an HBase cluster on CentOS using Chef. Although I've written a Cookbook (three times!) to do so, my code was never fully optimized. It could build a cluster, but only with hard-coded configuration parameters, or only in an unrealistic, non-production configuration.
Using the Riot Games Cookbook and the lessons I'd learned in the past, I whipped it into shape. I not only modified it to produce a functional cluster in a non-Riot environment, but also to build HBase on top of that! The diff contains over 800 changes, and there is now documentation on how to use the Cookbook.
Source Code
Here you can find the newest Chef Cookbook for HBase on CentOS. Here you can find the original Ansible Playbooks for HBase on Ubuntu. If you would like to use this code to build your own cluster, you are encouraged to join the mailing list to get help and advice from your peers.
Notes
[1] A distributed database can be tested functionally by installing on a single machine, but when it comes time to run benchmarks, or to discover the other 90% of functionality that only appears in a distributed setup, you will want to have the database installed on many machines, preferably dozens.
[2] Many projects seem to stop short of installing the database in a way that would let you benchmark it. Perhaps shortcuts are taken, such as putting all database files into /tmp, disabling logging, or removing tricky or subtle components in the interest of simplicity. The Palomino Cluster Tool provides you with a cluster that's actually ready for production. Sure, you still have to edit the configurations a little, but a good generic base configuration is provided.
Comments
This was very helpful indeed. I would love to learn more about CentOS.
Nice, thanks. You use both Ansible and Chef and have recommended Ansible (July 31). I would be curious to hear more about experiences with it.