Benchmarking NDB vs Galera

Inspired by the benchmark in this post, we decided to run some NDB vs Galera benchmarks for ourselves.

We confirmed that NDB does not perform well on m1.large instances. In fact, its performance there is totally unacceptable: no setup should ever have a minimum latency of 220 ms, so m1.large instances are not an option. The instances appear to be CPU-bound, yet CPU utilization never goes above ~50%. Perhaps top and vmstat cannot be trusted in this virtualized environment?
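
One way to test the suspicion that top and vmstat are misleading on these instances is to look at CPU steal time: the share of time the hypervisor gave the physical CPU to another guest while this VM was ready to run. vmstat reports it in its st column, and it is the eighth counter on the aggregate cpu line of /proc/stat. Steal time was not measured in this benchmark. The sketch below, built around a hypothetical helper named steal_pct, only shows how it could be computed from two samples.

```shell
#!/bin/sh
# Sketch: percentage of CPU time stolen by the hypervisor between two samples
# of the aggregate "cpu" line in /proc/stat.
# Counters after the "cpu" label: user nice system idle iowait irq softirq steal ...
# so awk field $9 is steal, and fields $2..$9 are summed as the total.
# steal_pct is a hypothetical helper, not part of any existing tool.
steal_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) t1 += $i; s1 = $9 }
    NR == 2 { for (i = 2; i <= 9; i++) t2 += $i; s2 = $9 }
    END {
      dt = t2 - t1
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (s2 - s1) / dt
    }'
}

# Example on a live instance (uncomment to run):
# a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
# echo "steal: $(steal_pct "$a" "$b")%"
```

A steal percentage well above zero during the NDB runs would support the idea that the guest's own utilization figures understate how CPU-starved the instance is.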

So, why not use m1.xlarge instances? This sounds like a better plan!

As in the original post, our dataset is 15 tables of 2M rows each, created with:

./sysbench --test=tests/db/oltp.lua --oltp-tables-count=15 --oltp-table-size=2000000 --mysql-table-engine=ndbcluster --mysql-user=user --mysql-host=host1 prepare

The benchmark against NDB was executed with:

for i in 8 16 32 64 128 256
do
  ./sysbench --report-interval=30 --test=tests/db/oltp.lua --oltp-tables-count=15 --oltp-table-size=2000000 --rand-init=on --oltp-read-only=off --rand-type=uniform --max-requests=0 --mysql-user=user --mysql-port=3306 --mysql-host=host1,host2 --mysql-table-engine=ndbcluster --max-time=600 --num-threads=$i run > ndb_2_nodes_$i.txt
done

After shutting down NDB, we started Galera and recreated the tables, but running sysbench failed. Following a suggestion from Hingo, we used --oltp-auto-inc=off, which worked.

Our benchmark against Galera was executed with:

for i in 8 16 32 64 128 256
do
  ./sysbench --report-interval=30 --test=tests/db/oltp.lua --oltp-tables-count=15 --oltp-table-size=2000000 --rand-init=on --oltp-read-only=off --rand-type=uniform --max-requests=0 --mysql-user=user --mysql-port=3306 --mysql-host=host1,host2 --mysql-table-engine=ndbcluster --max-time=600 --num-threads=$i --oltp-auto-inc=off run > galera_2_nodes_$i.txt
done

Below are the graphs of average throughput over each 10-minute run and of 95th-percentile response time.
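
The averages and 95th-percentile times come from the summary sysbench prints at the end of each run. A sketch of how they could be pulled out of the ndb_2_nodes_$i.txt and galera_2_nodes_$i.txt files, assuming the sysbench 0.5 summary format (a "transactions: N (X per sec.)" line and an "approx.  95 percentile: Yms" line); summarize is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print "threads tps p95_ms" for each sysbench output file.
# Assumes file names end in _<threads>.txt, as produced by the loops above,
# and sysbench 0.5 summary lines such as:
#   transactions:                        12345  (20.57 per sec.)
#   approx.  95 percentile:              71.23ms
summarize() {
  for f in "$@"; do
    threads=${f##*_}          # e.g. "8.txt"
    threads=${threads%.txt}   # e.g. "8"
    awk -v t="$threads" '
      /transactions:/  { tps = $3; gsub(/\(/, "", tps) }
      /95 percentile:/ { p95 = $NF; sub(/ms$/, "", p95) }
      END { print t, tps, p95 }' "$f"
  done
}

# Example: summarize ndb_2_nodes_*.txt | sort -n
```

Sorting numerically on the first column yields one row per thread count, which is the shape needed for a throughput-versus-threads plot.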

Galera clearly performs better than NDB with 2 instances!

But things become very interesting when we graph the reports generated every 30 seconds.
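
Each interval report produced by --report-interval is a single line. Assuming the sysbench 0.5 format ("[  30s] threads: 8, tps: 205.43, reads/s: ..., response time: 65.03ms (95%)"), a sketch like the following turns those lines into a CSV that a plotting tool can read. intervals_to_csv is a hypothetical helper name, and the numbers in the format example are illustrative, not results from this benchmark.

```shell
#!/bin/sh
# Sketch: convert sysbench interval report lines (read on stdin) into "seconds,tps" CSV.
# Lines that do not start with "[ <n>s]" (e.g. the final summary) are ignored.
intervals_to_csv() {
  awk '
    BEGIN { print "seconds,tps" }
    /^\[ *[0-9]+s\]/ {
      secs = $0; sub(/^\[ */, "", secs); sub(/s\].*/, "", secs)
      tps = $0;  sub(/.*tps: /, "", tps); sub(/,.*/, "", tps)
      print secs "," tps
    }'
}

# Example: intervals_to_csv < galera_2_nodes_256.txt > galera_2_nodes_256.csv
```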

Surprised, right? What is that?

Here we see that, even though the workload fits entirely in the buffer pool, the high transaction rate causes aggressive flushing.
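
To watch this flushing on a live node, one could sample the InnoDB status counter Innodb_buffer_pool_pages_flushed twice and turn the difference into a rate. A minimal sketch, assuming the default 16 KB innodb_page_size, so the MB/s figure is approximate. flush_rate is a hypothetical helper, and host1/user in the commented example are the placeholders from the sysbench commands.

```shell
#!/bin/sh
# Sketch: pages flushed per second (and approximate MB/s) from two samples of
# the Innodb_buffer_pool_pages_flushed counter taken $3 seconds apart.
# Assumes the default 16 KB InnoDB page size.
flush_rate() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
    pps = (b - a) / s
    printf "%.1f pages/s (%.1f MB/s)\n", pps, pps * 16384 / 1048576
  }'
}

# Example against a running node (uncomment to run):
# q="SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed'"
# a=$(mysql -h host1 -u user -N -e "$q" | cut -f2); sleep 30
# b=$(mysql -h host1 -u user -N -e "$q" | cut -f2)
# flush_rate "$a" "$b" 30
```

Sampling every 30 seconds matches the report interval, so the flush rate can be lined up against the TPS dips.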

We assume the benchmark in the Galera blog post was CPU-bound, whereas ours is I/O-bound.

We then added two more nodes (m1.xlarge instances), kept the dataset at 15 tables x 2M rows, and re-ran the benchmark with NDB and Galera. Galera's performance stalls because of I/O. In fact, Galera performed worse on 4 nodes than on 2; we assume this is because the whole cluster runs at the speed of the slowest node.
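
The slowest-node assumption can be checked with Galera's status counter wsrep_flow_control_sent, which counts the flow-control pause messages a node has sent. The node whose receive queue keeps filling up sends the most of them. A sketch, where slowest_node is a hypothetical helper and host3 and host4 are assumed names for the two added nodes:

```shell
#!/bin/sh
# Sketch: read "host<TAB>wsrep_flow_control_sent" pairs on stdin and print
# the host with the largest count, i.e. the node that most often asked the
# rest of the cluster to pause.
slowest_node() {
  awk -F '\t' 'NR == 1 || $2 + 0 > max { max = $2 + 0; host = $1 }
               END { print host, max }'
}

# Example (host3 and host4 are assumed names for the added nodes):
# for h in host1 host2 host3 host4; do
#   n=$(mysql -h "$h" -u user -N -e "SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_sent'" | cut -f2)
#   printf '%s\t%s\n' "$h" "$n"
# done | slowest_node
```

The counter is cumulative, so comparing nodes is most meaningful when all of them were started at about the same time as the benchmark.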

NDB's performance keeps growing as nodes are added, so we added two more nodes for NDB only (6 nodes in total).

The graphs show that NDB scales better than Galera, which is not what we expected to find.

Perhaps it is fairer to say not that NDB scales better than Galera, but that NDB checkpoints put less stress on I/O than InnoDB checkpoints; the bottleneck is therefore InnoDB, not Galera itself. More precisely, the bottleneck is slow I/O.
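
A simplified model of per-node write volume, following the sharding argument made in the comments below, suggests why adding nodes helps NDB but not Galera when the disks are the bottleneck. Let W be the cluster-wide write rate, N the number of nodes, and R the number of NDB replicas (NoOfReplicas; R = 2 is an assumption here, since config.ini is not reproduced in the post). The model ignores the checkpointing differences discussed above.

```latex
\begin{aligned}
W_{\text{node}}^{\text{Galera}} &= W \\
W_{\text{node}}^{\text{NDB}}    &= \frac{R\,W}{N} \\
\text{with } R = 2:\quad N = 2 \Rightarrow W, \qquad N = 4 &\Rightarrow \tfrac{W}{2}, \qquad N = 6 \Rightarrow \tfrac{W}{3}
\end{aligned}
```

Under this model every Galera node must absorb the full write rate however many nodes are added, while each NDB data node's share shrinks as N grows; with 2 nodes the two setups write the same amount per node.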

The following graph shows the performance with 512 threads on 4 nodes (NDB and Galera) and on 6 nodes (NDB only), with data collected every 30 seconds.

Comments

Andy,

That won't work.

From http://dev.mysql.com/doc/refman/5.5/en/partitioning-limitations-storage-engines.html:

Partitioning by KEY (including LINEAR KEY) is the only type of partitioning supported for the NDBCLUSTER storage engine. It is not possible in MySQL Cluster NDB 7.2 to create a MySQL Cluster table using any partitioning type other than [LINEAR] KEY, and attempting to do so fails with an error.

Rene Cannao
Sun, 12/16/2012 - 08:15

my.cnf used for Galera: http://pastebin.com/raw.php?i=p2Ax108D

my.cnf used for NDB: http://pastebin.com/raw.php?i=e61muuYG

config.ini (for 2 nodes): http://pastebin.com/raw.php?i=u7pkAdrS

Rene Cannao
Sun, 12/16/2012 - 08:11

Rene,

Great stuff! Thanks for sharing this. I am curious, could you tell us:
- a few more details on the configuration of InnoDB/Galera and NDB
- your opinion on the network latency and how it affects the benchmarks

Thanks again for sharing!
-ivan

Ivan
Sat, 12/15/2012 - 12:28

Did you use PARTITION BY RANGE for NDB? As pointed out here (http://openlife.cc/blogs/2012/march/comments-codership-galera-vs-ndb-clo...), the default NDB partitioning scheme (hashing on the key) isn't a good fit for sysbench.

Andy
Thu, 12/13/2012 - 00:50

Hi Rene,

This is a very good benchmark you have done. For me it was always well known that NDB is very sensitive to latency, including - apparently - CPU scheduling latency. But it is always a bit of a surprise to see how poor this is in the cloud. At the same time it is always a surprise to see how well Galera performs under poor network latency, even for clustering over continents.

Now, when you say that you didn't expect NDB to scale better, this is of course a matter of viewpoint. You use 4-6 NDB nodes to match the performance of 2 Galera nodes. But it is true that NDB scales when you add more nodes whereas Galera didn't, and there is a very natural explanation: NDB does sharding and Galera does not. When you add more NDB nodes, the write load is distributed over more shards, and you get more performance. With Galera all writes still go to all nodes, so situation stays the same (or becomes worse if you get a weaker node).

For Galera scale-out, the following are true:

  1. A read-heavy workload (ie you are bottlenecked by reads) can be scaled-out by adding more nodes. This is true even for disk bound workload, if the disk access is read-heavy. Note that this is not the same as read-only scale out. Your transactions can be read-write, just that the writing is not your bottleneck.
  2. A cpu-bound workload can be scaled out. For example, Codership has done an internal benchmark with a 100% write workload and got roughly 2x more performance by adding more nodes. Coincidentally, this 2x is roughly the same boost you can get from using HandlerSocket on a write-only workload (I've heard). Note that applying RBR events is also kind of NoSQL :-)
  3. If you are bottlenecked by writing to disk, then Galera cannot scale-out, because all writes are replicated to all nodes.

The last graph is quite typical and I've seen similar behavior on disk bound workloads myself. (But your graph is nicer :-) This is the behavior you get from InnoDB when you become heavily disk-bound. However, Galera adds its own "signature" to this graph. When InnoDB becomes stuck, then Galera slave appliers are blocked - you can see this with SHOW PROCESSLIST. Committed transactions fill the Galera slave queue, and flow-control kicks in. At this point you cannot commit anything on any node in the cluster before queues are emptied again. This is by design and is the opposite of slave lag - Galera is a tightly coupled cluster, so when any one slave has an issue, everyone has to wait for it. It is my guess that the regularity in the graph you see comes from Galera flow control - InnoDB itself tends to produce the same behavior but much more irregular.

Finally, it would be nice to know about your my.cnf. Did you try with a larger InnoDB buffer pool? In my tests it helped a lot - but that was bare metal. Also knowing values of wsrep_slave_threads and innodb_flush_log_at_trx_commit are interesting (neither should be 1 :-)

Wed, 12/12/2012 - 17:20

Hi Rene, thanks for giving it a ride. Could you post the my.cnf you used for Galera benchmark?

Wed, 12/12/2012 - 13:52

Very Nice Post !!!

Krishna

Tue, 12/11/2012 - 15:25
