September 2012

Bulk Loading Options for Cassandra

Cassandra's bulk loading interfaces are most useful in two use cases: initial migration or restore from another datastore/cluster and regular ETL. Bulk loading assumes that it's important to you to avoid loading via the thrift interface, for example because you haven't integrated a client library yet or because throughput is critical. 

There are two alternative techniques used for bulk loading into Cassandra: "copy-the-sstables" and sstableloader. Copying the sstables is a filesystem level operation, while sstableloader utilizes Cassandra's internal streaming system. Neither is without disadvantages; the best choice depends on your specific use case. If you are using Counter columnfamilies, neither method has been extensively tested and you are safer writing via thrift.

The key to understanding bulk-loading throughput is that it depends heavily on the nature of the operation, on the configuration of the source and target clusters, and on factors such as the number of sstables, sstable size and your tolerance for potentially duplicate data. A smaller point worth noting is that sstableloader in 1.1 is slightly improved over the (freshly rewritten) version in 1.0. [1]

Below are good cases for and notable aspects of each strategy.

Copy-the-sstables/"nodetool refresh" can be useful if:

  1. Your target cluster is not running, or if it is running, is not sensitive to latency from bulk loading at "top speed" and associated operations.
  2. You are willing to de-duplicate sstable names manually (or have a tool to do it) and are willing to figure out where to copy them to in any non copy-all-to-all case. You are willing to run cleanup and/or major compaction, and understand that some disk space is wasted until you do. [2]
  3. You don't want to deal with the potential failure modes of streaming, which are especially bad in non-LAN deploys including EC2.
  4. You are restoring in a case where RF=N, because you can just copy one node's data to all nodes in the new RF=N cluster and start the cluster without bootstrapping (auto_bootstrap: false in cassandra.yaml).
  5. The sstables you want to import are a different version than the target cluster currently creates. Example: trying to sstableload -hc- (1.0) sstables into a -hd- (1.1) cluster is reported to not work. [3]
  6. You have your source sstables in something like S3, which can easily parallelize copies to all target nodes. S3-to-EC2 transfer is fast and free, which is close to the best case for the inefficiency of the copy stage.
  7. You want to increase RF on a running cluster, and are ok with running cleanup and/or major compaction after you do.
  8. You want to restore from a cluster with RF=[x] to a cluster whose RF is the same or smaller and whose size is a multiple of [x]. Example: restoring a 9 node RF=3 cluster to a 3 node RF=3 cluster, you copy 3 source nodes worth of sstables to each target node.
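The name de-duplication mentioned in item 2 can be sketched as a rename plan. This is a minimal illustration, not an existing tool: it assumes sstable component files are named `<prefix>-<version>-<generation>-<Component>` (for example `Users-hc-12-Data.db`), which you should verify against your own data directory, and the function name `plan_renames` and the node labels are hypothetical. It only computes new names; it does not touch the filesystem.

```python
import re
from collections import defaultdict

# ASSUMPTION: component files look like "<prefix>-<version>-<generation>-<Component>",
# e.g. "Users-hc-12-Data.db". Check this against your own files before relying on it.
SSTABLE_RE = re.compile(r"^(?P<prefix>.+-[a-z]+)-(?P<gen>\d+)-(?P<component>[^-]+)$")


def plan_renames(files_by_source, first_generation=1):
    """Return {(source, old_name): new_name} with collision-free generations.

    All components of one sstable (Data, Index, Filter, ...) share a
    generation number, so they are renumbered together.
    """
    next_gen = defaultdict(lambda: first_generation)  # next free generation per prefix
    assigned = {}  # (source, prefix, old generation) -> new generation
    plan = {}
    for source in sorted(files_by_source):
        for name in sorted(files_by_source[source]):
            m = SSTABLE_RE.match(name)
            if m is None:
                continue  # not recognized as an sstable component; leave it alone
            key = (source, m["prefix"], int(m["gen"]))
            if key not in assigned:
                assigned[key] = next_gen[m["prefix"]]
                next_gen[m["prefix"]] += 1
            plan[(source, name)] = f'{m["prefix"]}-{assigned[key]}-{m["component"]}'
    return plan


if __name__ == "__main__":
    # Two source nodes that both produced generation 1 of the same column family.
    files = {
        "node1": ["Users-hc-1-Data.db", "Users-hc-1-Index.db"],
        "node2": ["Users-hc-1-Data.db", "Users-hc-1-Index.db"],
    }
    for (source, old), new in sorted(plan_renames(files).items()):
        print(source, old, "->", new)
```

If the target directory already contains sstables, pass a `first_generation` higher than any generation already present there so the copied files cannot collide with existing ones.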

sstableloader/JMX "bulkload" can be useful if:

  1. You have a running target cluster, and want the bulk loading to respect, for example, streaming throttle limits.
  2. You don't have access to the data directory on your target cluster, and/or JMX to call "refresh" on it.
  3. Your replica placement strategy on the target cluster is so different from the source that the overhead of understanding where to copy sstables to is unacceptable, and/or you don't want to call cleanup on a superset of sstables.
  4. You have limited network bandwidth between the source of sstables and the target(s). In this case, copying a superset of sstables around is especially inefficient.
  5. Your infrastructure makes it easy to temporarily copy sstables to a set of sstableloader nodes or nodes on which you call "bulkLoad" via JMX. These nodes are either non-cluster-member hosts which are otherwise able to participate in the cluster as a pseudo-member from an access perspective or cluster members with sufficient headroom to bulkload. 
  6. You can tolerate the potential data duplication and/or operational complexity which results from the fragility of streaming. LAN is the best case here. A notable difference between "bulkLoad" and sstableloader is that "bulkLoad" does not have sstableloader's "--ignore" option, which means you can't tell it to ignore replica targets on failure. [4]
  7. You understand that, because it uses streaming, streams on a per-sstable basis, and streaming respects a throughput cap, your performance is bounded in terms of ability to parallelize or burst, despite "bulk" loading.
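The bound described in item 7 can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: that the streaming cap is expressed in megabits per second, that it applies per streaming node, and that nothing else (disk, network, compaction on the target) is the bottleneck. The function name and the example figures are hypothetical.

```python
def min_stream_seconds(total_bytes, cap_megabits_per_sec, streaming_nodes=1):
    """Lower bound on streaming time when every node is held to a throughput cap.

    ASSUMPTION: the cap is per streaming node and is measured in megabits
    per second (1 megabit = 1,000,000 bits).
    """
    total_bits = total_bytes * 8
    bits_per_second = cap_megabits_per_sec * 1_000_000 * streaming_nodes
    return total_bits / bits_per_second


if __name__ == "__main__":
    # Example: 100 GB streamed from one loader node capped at 400 megabits/s.
    seconds = min_stream_seconds(100_000_000_000, 400)
    print(f"at least {seconds / 60:.1f} minutes")
```

Real loads will take longer than this figure; its use is to show that adding streaming nodes or raising the cap are the only levers, since bursting past the cap is not possible.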

Jetpants Usage and Installation Gotchas

If you regularly manage enormous data sets, you've probably heard about Tumblr's exciting new toolkit called Jetpants, which automates common processes for MySQL data management, most notably in the area of rebalancing shards for more efficient scaling. In evaluating and using Jetpants in various environments, we identified some interesting installation and usage gotchas, and we've documented those here - let us know if you find other issues worth exploring.  

Jetpants Installation/Usage Gotchas for Ubuntu 12.04:

Jetpants has to date only been tested on RHEL/CentOS distributions, though by working with the author (Evan Elias of Tumblr, thank you much!) I was able to get it running on Ubuntu 12.04 and Mint Maya/13. There will be more work to do, since RHEL/CentOS report back a service status ("service mysql status" output, see later in this post) differently than Ubuntu/Mint, and there may be other differences as well.

For your Jetpants Console workstation, install Ruby 1.9.3 (Ruby 1.9.1 will also be installed), and check that /etc/alternatives/ruby points at ruby1.9.3. Also be sure you have installed the MySQL client libraries.

# apt-get install ruby1.9.3 rubygems libmysqlclient-dev
# ln -sf /usr/bin/ruby1.9.3 /etc/alternatives/ruby
# ln -sf /usr/bin/gem1.9.3 /etc/alternatives/gem

After that, run gem install jetpants. You may get scary warnings during documentation installation about being unable to convert ASCII to UTF-8 characters, as below:

# gem install jetpants

Building native extensions.  This could take a while...
Fetching: sequel-3.39.0.gem (100%)
Fetching: net-ssh-2.5.2.gem (100%)
Fetching: coderay-1.0.7.gem (100%)
Fetching: slop-3.3.3.gem (100%)
Fetching: method_source-0.8.gem (100%)
Fetching: pry-0.9.10.gem (100%)
Fetching: highline-1.6.15.gem (100%)
Fetching: colored-1.2.gem (100%)
Fetching: jetpants-0.7.4.gem (100%)
Successfully installed mysql2-0.3.11
Successfully installed sequel-3.39.0
Successfully installed net-ssh-2.5.2
Successfully installed coderay-1.0.7
Successfully installed slop-3.3.3
Successfully installed method_source-0.8
Successfully installed pry-0.9.10
Successfully installed highline-1.6.15
Successfully installed colored-1.2
Successfully installed jetpants-0.7.4
10 gems installed
Installing ri documentation for mysql2-0.3.11...
Installing ri documentation for sequel-3.39.0...
Installing ri documentation for net-ssh-2.5.2...
unable to convert "\xE7" from ASCII-8BIT to UTF-8 for lib/net/ssh/authentication/pageant.rb, skipping
Installing ri documentation for coderay-1.0.7...
Installing ri documentation for slop-3.3.3...
Installing ri documentation for method_source-0.8...
Installing ri documentation for pry-0.9.10...
Installing ri documentation for highline-1.6.15...
Installing ri documentation for colored-1.2...
Installing ri documentation for jetpants-0.7.4...
Installing RDoc documentation for mysql2-0.3.11...
Installing RDoc documentation for sequel-3.39.0...
Installing RDoc documentation for net-ssh-2.5.2...
unable to convert "\xE7" from ASCII-8BIT to UTF-8 for lib/net/ssh/authentication/pageant.rb, skipping
Installing RDoc documentation for coderay-1.0.7...
Installing RDoc documentation for slop-3.3.3...
Installing RDoc documentation for method_source-0.8...
Installing RDoc documentation for pry-0.9.10...
Installing RDoc documentation for highline-1.6.15...
Installing RDoc documentation for colored-1.2...
Installing RDoc documentation for jetpants-0.7.4...

Be sure you've created /var/www writeable by the user you'll execute Jetpants as, and be sure you have an /etc/jetpants.yaml that is similar to the template provided on the Jetpants website.

And now you're ready to run Jetpants!

$ jetpants

Tasks:
  jetpants activate_slave            # turn a standby slave into an active slave
  jetpants clone_slave               # clone a standby slave
  jetpants console                   # Jetpants interactive console
  jetpants destroy_slave             # remove a standby slave from its pool
  jetpants help [TASK]               # Describe available tasks or one specific task
  jetpants pools                     # display a full summary of every pool in the topology
  jetpants pools_compact             # display a compact summary (master, name, and size) of every pool in the topology
  jetpants promotion                 # perform a master promotion, changing which node is the master of a pool
  jetpants pull_slave                # turn an active slave into a standby slave
  jetpants rebuild_slave             # export and re-import data set on a standby slave
  jetpants regen_config              # regenerate the application configuration
  jetpants shard_cutover             # truncate the current last shard range, and add a new shard after it
  jetpants shard_offline             # mark a shard as offline (not readable or writable)
  jetpants shard_online              # mark a shard as fully online (readable and writable)
  jetpants shard_read_only           # mark a shard as read-only
  jetpants shard_split               # shard split step 1 of 4: spin up child pools with different portions of data set
  jetpants shard_split_child_reads   # shard split step 2 of 4: move reads to child shards
  jetpants shard_split_child_writes  # shard split step 3 of 4: move writes to child shards
  jetpants shard_split_cleanup       # shard split step 4 of 4: clean up data that replicated to wrong shard
  jetpants summary                   # display information about a node in the context of its pool
  jetpants weigh_slave               # change the weight of an active slave

As mentioned before, Jetpants is currently only well-tested for RHEL/CentOS distributions. RHEL/CentOS report back a service status ("service mysql status" output) differently than Ubuntu/Mint. The following is what Jetpants requires, followed by the actual output:

/sbin/service mysql status # check if mysql daemon running; output must include the string 'not running' if mysql is not running

This is how it works on both Ubuntu 12.04 and Mint Maya/13:

# service mysql stop
mysql stop/waiting

# service mysql status
mysql stop/waiting
# service mysql start
mysql start/running, process 31346
# service mysql status
mysql start/running, process 31346

I decided to check the return code, but it's always 0 regardless of the service state, so we must parse the output in Jetpants. See the following output:

# service mysql status ; echo $?
mysql stop/waiting
0
# service mysql start
mysql start/running, process 31629
# service mysql status ; echo $?
mysql start/running, process 31629
0

To support Debian-alikes, Jetpants must also check for the "stop/waiting" string, since the strings output by "service" make it impossible to check for a single substring across distros.
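The check described above amounts to testing the status output against more than one marker string. The following is a minimal sketch of that idea only; it is not the actual Jetpants patch (Jetpants itself is written in Ruby), and the function name and marker tuple are hypothetical. The two markers come from this post: 'not running' is what Jetpants expects, and "stop/waiting" is what Ubuntu 12.04 and Mint print.

```python
# Substrings that indicate a stopped MySQL daemon in "service mysql status" output.
# "not running" is the string Jetpants expects; "stop/waiting" is the
# Ubuntu 12.04 / Mint form shown in the transcripts above.
STOPPED_MARKERS = ("not running", "stop/waiting")


def mysql_is_stopped(status_output):
    """Return True if the status text contains any known 'stopped' marker."""
    text = status_output.lower()
    return any(marker in text for marker in STOPPED_MARKERS)


if __name__ == "__main__":
    print(mysql_is_stopped("mysql stop/waiting"))
    print(mysql_is_stopped("mysql start/running, process 31346"))
```

Because the exit code is always 0 on these systems, the sketch looks only at the text. Output it does not recognize is treated as "not stopped", so a stricter version might want to flag unknown output instead.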

Over the coming days, I will be identifying other CentOS/RHEL-specific code and writing patches to make it work with at least Ubuntu 12.04 as well. In the future, if you're trying to port Jetpants to another distribution, you'll be able to use my pull request as a template for the areas that are distribution-specific.

When running Jetpants in EC2, there are some more gotchas (or perhaps they're just "be sure to follow best-practices"):

1. You must allow root login to your cluster. Generate a privkey/pubkey pair and allow root logins. 

     a. Edit ~root/.ssh/authorized_keys and get rid of the part where it has the "command=..." that prints an error and logs you out.

     b. Edit /etc/cloud/cloud.cfg and set disable_root: 0

     c. Edit /etc/ssh/sshd_config and set PermitRootLogin without-password

2. You must be sure your master/slave replication topology uses only IP addresses. If your slaves have, e.g., "Master_Host: ec2-5-2-3-8.compute.amazonaws.com" then Jetpants will fail. IP addresses are how you want your cluster configured in production, so do it, but it may surprise you if you're trying to build a small test cluster.

3. You must allow root login from localhost on all your MySQL instances. In other words, this command must return valid output on your slaves: mysql -e 'show slave status\G' - note that you can put root login credentials in ~root/.my.cnf rather than allow passwordless MySQL root logins.
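The IP-only requirement in item 2 can be checked before running Jetpants by inspecting the Master_Host field of the slave status output. The sketch below is a hypothetical helper, not part of Jetpants: it assumes you have already captured the text printed by mysql -e 'show slave status\G' and that the field appears on its own line as "Master_Host: value".

```python
import ipaddress
import re

# Matches the "Master_Host: <value>" line of vertical (\G) slave status output.
MASTER_HOST_RE = re.compile(r"^\s*Master_Host:\s*(\S+)\s*$", re.MULTILINE)


def master_host(slave_status_text):
    """Return the Master_Host value, or None if the field is absent."""
    match = MASTER_HOST_RE.search(slave_status_text)
    return match.group(1) if match else None


def is_ip_address(host):
    """Return True if host parses as an IPv4 or IPv6 address literal."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    sample = (
        "        Slave_IO_State: Waiting for master to send event\n"
        "           Master_Host: ec2-5-2-3-8.compute.amazonaws.com\n"
        "           Master_User: repl\n"
    )
    host = master_host(sample)
    print(host, "is an IP address:", is_ip_address(host))
```

A slave whose Master_Host fails this check would need to be repointed at its master's IP address before Jetpants is run against the pool.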

In general, it's good to remember that Jetpants is a set of utilities for controlling an actual large MySQL cluster, so it will not work so well for tiny little test deploys where you've cut corners, and this is how it should be. Here are more Jetpants Environment Requirements/Limitations, from the documentation:

1. Need pigz installed on your cluster. If you use the Palomino Cluster Tool to build your shards, pigz will be installed for you.

2. No auto_increment on sharded tables. Must have ID generator.

3. PK of all sharded tables should start with the shard key. This will vastly improve performance of long-running processes like shard splits.

4. Port :3306 everywhere. One instance per node.

5. (Probably) doesn't work with MyISAM.

Couchbase rebalance freeze issue

We came across a Couchbase bug during a rebalance while upgrading online to 1.8.1 from 1.8.0.  

Via the UI, we upgraded our first node, re-added it to the cluster, and then started the rebalance. It progressed fine, then stopped at around 48% for all nodes. The tap and disk queues were quiet and there were no servers pending rebalance. The upgraded node was able to service requests, but held only a small percentage of the items relative to the other nodes. The cluster as a whole did not suffer in performance during this issue, though there are some CPU spikes during any rebalance.

We decided to stop the rebalance, wait a few minutes, then rebalance again. It started moving and progressed beyond where it had stalled, then stopped again, this time at 75%. We let it sit for 7 minutes, then hit Stop Rebalance and Rebalance. This time it did not progress at all.

Couchbase support pointed to a bug where rebalancing can hang if there are empty vbuckets. This is fixed in 2.0. The workaround is to populate buckets with a minimum of 2048 items that have a short time to live (TTL >= (10 minutes per upgrade + (2 x rebalance_time)) x num_nodes) so that all vbuckets have something in them. We populated all buckets successfully and were then able to restart the rebalance process, which completed fine.
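As a worked example of the TTL formula above, the sketch below plugs in hypothetical numbers (a 30-minute rebalance and a 4-node cluster). The function name is ours, and the assumption that rebalance_time is measured in minutes, like the 10 minutes per upgrade, is also ours.

```python
def min_ttl_minutes(rebalance_minutes, num_nodes, upgrade_minutes=10):
    """TTL >= (10 minutes per upgrade + (2 x rebalance_time)) x num_nodes.

    ASSUMPTION: rebalance_time is in minutes, matching the upgrade allowance.
    """
    return (upgrade_minutes + 2 * rebalance_minutes) * num_nodes


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes, each rebalance takes about 30 minutes.
    ttl = min_ttl_minutes(rebalance_minutes=30, num_nodes=4)
    print(ttl, "minutes =", ttl * 60, "seconds")  # 280 minutes = 16800 seconds
```

The placeholder items therefore need to live for at least 280 minutes in this example, long enough to cover upgrading and rebalancing every node in turn.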

Reference:

http://www.couchbase.com/docs/couchbase-manual-1.8/couchbase-getting-started-upgrade-online.html
