June 2012

Quick script to get list of all jmx beans

Recently I've needed to get information from running Cassandra processes, in order to determine which parameters to monitor. jconsole can be used for this, however if you're using a host on AWS, or in a bandwidth-limited environment, you might find that it takes way too long, especially if all you want to do for the moment is get a list of all beans and their attributes. jmxterm is a good solution for this - a command line interface to your jmx port. So here's a couple of utility scripts that you can use to get all the beans and each of their attributes using jmxterm. You may need to change $jar, $host, and $port in the perl script to fit your environment, and also maybe change the domains in the bash script. Then you should be able to simply run the bash script to get a list of each bean and its attributes. I found this useful when looking for the correct syntax to set up some nagios monitoring for cassandra. For example, here is a snippet of the output:
org.apache.cassandra.internal
.
.
.
Checking bean org.apache.cassandra.internal:type=FlushWriter
  %0   - ActiveCount (int, r)
  %1   - CompletedTasks (long, r)
  %2   - CurrentlyBlockedTasks (int, r)
  %3   - PendingTasks (long, r)
  %4   - TotalBlockedTasks (int, r)
.
.
.
And I was able to figure out the syntax for a nagios check:
check_jmx!$HOSTADDRESS$!7199!-O org.apache.cassandra.internal:type=FlushWriter -A CurrentlyBlockedTask
Hopefully, these scripts might be useful to someone else trying to query cassandra (or any java process that uses jmx). get_all_bean_info.sh:
#!/bin/bash
 
DOMAINS=(
org.apache.cassandra.db
org.apache.cassandra.internal
org.apache.cassandra.net
org.apache.cassandra.request
)
 
for domain in ${DOMAINS[@]}
do
    echo "-------------------"
    echo $domain
    output=$(./query_jmx.pl $domain 2>/dev/null | tr ' ' '+' | grep '=')
    for line in $output
    do
      bean=$(echo $line | tr '+' ' ')
      echo "Checking bean $bean"
      ./query_jmx.pl $domain $bean 2>/dev/null | grep -v "#" | grep -v "Got domain"
    done
done
query_jmx.pl:
#!/usr/bin/env perl
use strict;
my $jar = "/home/ubuntu/jmxterm-1.0-alpha-4-uber.jar";
my $host = "10.176.66.219";
my $port = 7199;
 
my $domain = shift @ARGV;
my @beans = ();
my $bean;
my $size;
for my $arg (@ARGV) {
  if ($arg =~ /^\w/) {
    push (@beans, $arg);
  } else {
    last;
  }   
}     
$size = @beans;
$bean = join(' ',@beans) if ($size > 0);
open JMX, "| java -jar $jar -n";
print JMX "open $host:$port\n";
print JMX "domain $domain \n";
if (defined $bean && length $bean > 0) {
  print JMX "bean $bean \n";
  print JMX "info \n";
} else {
  print JMX "beans \n";
}
 
print JMX "close\n";
close JMX;

Mystery Solved: Replication lag in InnoDB

 

While running a backup with XtraBackup against a slave server we noticed that replication was lagging significantly. The root cause wasn't clear, but we noticed that DML statements from replication were just hanging for a long time. Replication wasn't always hanging, but it happened so frequently that a 24 hour backup caused replication to lag 11 hours.

The first hypothesis was that all the writes generated from replication (relay log, bin log, redo log, etc) were generating too high contention on IO while XtraBackup was reading the files from disk. The redo log wasn't hitting 75%, which meant that InnoDB wasn't doing aggressive flushing - some other contention was causing replication to stall.

After various tests, we found that disabling innodb_auto_lru_dump solved the issue. It wasn’t entirely clear what the relation was between the lru dump and replication lag during backup, but it was very easy to reproduce. Enabling lru dump at runtime was immediately causing replication to lag, and disabling it restored replication back to normal.

Also, when innodb_auto_lru_dump was enabled we noticed that from time to time the simple command "SHOW ENGINE INNODB STATUS" was hanging for 2-3 minutes.

To attempt to reproduce the issue outside this production environment, we tried to run various benchmarks using sysbench, with and without auto lru dump. The sbtest table (~20GB on disk) was created using the following command:

sysbench --test=oltp --mysql-table-engine=innodb --mysql-user=root --oltp-table-size=100000000 prepare

The InnoDB settings were:

innodb_buffer_pool_size = 10G

innodb_flush_log_at_trx_commit = 2

innodb_thread_concurrency = 0

innodb_flush_method=O_DIRECT

innodb_log_file_size=128M

innodb_file_per_table

 

The various benchmarks were ran using:

- read-only workload vs read-write workload;

- small buffer pool vs large buffer pool (from 2G to 30G)

- small number of threads vs large number of threads

 

None of the above benchmarks showed any significant difference with auto lru dump enabled or disabled. Perhaps these workloads were not really reproducing our environment where we were getting issues with auto lru dump. We therefore started a new series of benchmarks with only one thread doing mainly writes - this is the workload we expect in a slave used only for replication and backups.

The workload with sysbench was modified to perform more writes than read, yet the result of the benchmark didn't change a lot - enabling or disabling lru wasn't producing any significant change in performance. The problem with this benchmark was that it was generating too many writes and filling the redo log. InnoDB was then doing aggressive flushing and this was a bottleneck that was hiding any effect caused from the lru dump.

To prevent the redo from filling too quickly, we had to change the workload to read a lot of pages, change the buffer pool from 30G to 4G, and test with always restarting mysqld and with the buffer pool prewarmed with:

select sql_no_cache count(*), sum(length(c)) FROM sbtest where id between 1 and 20000000;

sysbench --num-threads=1 --test=oltp --mysql-user=root --oltp-table-size=100000000 --oltp-index-updates=10 --oltp-non-index-updates=10 --oltp-point-selects=1 --max-requests=1000 run

 

innodb_auto_lru_dump=0:    transactions: (7.26 per sec.)

innodb_auto_lru_dump=1:    transactions: (6.93 per sec.)

 

This was not a huge difference, but we finally saw some effect of the auto_lru_dump.

It became apparent that the number of transactions per second in the above benchmark was really low because the number of random reads from disk was the bottleneck. To remove this bottleneck, we removed innodb_flush_method=O_DIRECT (therefore using the default flush method), and then ran the following to load the whole table into the OS cache (not into the buffer pool).

 

dd if=sbtest/sbtest.ibd of=/dev/null bs=1M

 

To prevent the redo log from filling up, we also changed the innodb_log_file_size from 128M to 1G.

With these changes - always using a buffer pool of 4G, restarting mysqld before each test ,and prewarming the buffer pool with "select sql_no_cache count(*), sum(length(c)) FROM sbtest where id between 1 and 20000000" - we reran the same test changing the number of requests:

10K transactions:

sysbench --num-threads=1 --test=oltp --mysql-user=root --oltp-table-size=100000000 --oltp-index-updates=10 --oltp-non-index-updates=10 --oltp-point-selects=1 --max-requests=10000 run

 

innodb_auto_lru_dump=0:    transactions: (243.22 per sec.)

innodb_auto_lru_dump=1:    transactions: (230.62 per sec.)

 

50K transactions:

sysbench --num-threads=1 --test=oltp --mysql-user=root --oltp-table-size=100000000 --oltp-index-updates=10 --oltp-non-index-updates=10 --oltp-point-selects=1 --max-requests=50000 run

 

innodb_auto_lru_dump=0:    transactions: (194.31 per sec.)

innodb_auto_lru_dump=1:    transactions: (175.69 per sec.)

 

 

With innodb_auto_lru_dump=1 , performance drops by a factor of 5-10% !

 

After this, we wanted to run a completely different test with no writes, only reads.

innodb_auto_lru_dump didn't show any difference when sysbench was executed with read only workload, and we believe the reason is simply the fact that sysbench wasn't changing too many pages in the buffer pool. The easiest way to change pages in the buffer pool is to perform a full scan of a large table with a small buffer pool. We set innodb_flush_method=O_DIRECT, since otherwise the read from the OS cache was too fast and we couldn't detect any effect of innodb_auto_lru_dump. With innodb_buffer_pool_size=4G, and restarting mysqld after each test, this was the the result of a full table scan:

 

With innodb_auto_lru_dump=0 :

mysql> select sql_no_cache count(*), sum(length(c)) FROM sbtest;

+-----------+----------------+

| count(*)  | sum(length(c)) |

+-----------+----------------+

| 100000000 |      145342938 |

+-----------+----------------+

1 row in set (3 min 27.22 sec)

 

 

With innodb_auto_lru_dump=1 :

mysql> select sql_no_cache count(*), sum(length(c)) FROM sbtest;

+-----------+----------------+

| count(*)  | sum(length(c)) |

+-----------+----------------+

| 100000000 |      145342938 |

+-----------+----------------+

1 row in set (3 min 38.43 sec)

 

Again, innodb_auto_lru_dump=1 affects performance increasing the execution time by ~5% .

It is also important to note that innodb_auto_lru_dump seems to affect performance only for some specific workload scenarios. In fact, the majority of the benchmarks we ran weren't showing any performance effect caused by innodb_auto_lru_dump.

 

Optimizing your MySQL Tuning Methodology

 

Optimizing your queries

There are two general methods for creating a query plan for a query. A rule-based optimizer goes through your query, sees if you're doing X or Y, and then optimizes for method A or B depending. Rule-based optimizers grow rather large as there are many possible cases to account for. Sometime in the past couple of decades, rule-based optimizers fell out of favour and cost-based optimizers took over instead. At the moment the query is submitted, the optimizer does some simple calculations of costs of various ways of doing the query and picks the probably-best. It turns out the simplicity of the optimizer combined with the "it generally works" nature of the beast means it totally wins the popularity contest.

 

My Tuning Methodology, the Overview

With this in mind, I wanted to evaluate my own tuning methodology. At first blush, I thought my method rather coarse and unrefined. I generally look at every variable in my.cnf, do a quick search if I don't know what it is, set it to something that “seems about right,” and then go to the next one.

 

My tuning experience is almost exclusively in high-volume shops, which isn’t representative of the many different MySQL installations of different sizes in the world. In high-volume environments, there are no good rules of thumb for tunings, because if you followed any such rules, your MySQLs would crash into the dust. So this rather naïve approach is actually a good start. Because there's that word again. Rule.

 

You can tune the my.cnf by having several rules of thumb handy. A very long ruleset of "if this, then that" either documented or as a mental checklist. This works well on MySQL installations that are typical, but it isn't my methodology. Mine is iterative and experimentally-based.

 

My Tuning Methodology, the Details

So now I've got a my.cnf that I’ve touched line-by-line and picked values that seemed reasonable. Anyone who's ever administered a MySQL knows this my.cnf is going to be a disaster if left unchecked. But here's where my magic begins and why my methodology works in high-volume environments where no rules of thumb apply.

 

I place the my.cnf onto a machine that is taking representative load. That is, it receives queries much like it would if it were performing its production role. There are a lot of ways of doing this, but my favorite is to put the slave into a production pool at a small fraction (perhaps 1%) of the workload of a currently-production slave. 

 

Next, I point trending software at the slave. What do I monitor? That's easy. EVERYTHING. Every single statistic I can think of, I graph, from the OS and DBMS. Everything sar would ever tell you. Everything from "show global status". Everything in log files I can turn into numbers.

 

Then I measure the "black-box expected throughput" of the machine. That is, it's getting a fraction of the query traffic, so is it performing obviously better than the other hosts performing full work? Say this machine is supposed to be able to answer its typical queries in 250ms. Is it doing it in at most 250ms, and perhaps better? If so, I increase the load on the machine. I keep increasing the load until it starts to struggle. Say I get it up to 150% normal load, and it's now taking 260ms to answer its queries. Now I know it's "stressed."

 

At this point, I start to look at every statistic I've graphed and look for things at their limits. Is CPU maxed out? Is disk I/O at a plateau? Are read_rnd_next handler stats off-the-charts? I find it useful to have the same metrics coming from an existing reference slave, but sometimes they are obviously in a bad state (like if read_rnd_next is pushing 100,000,000/s for example, we probably need some indexes).

 

From here, I begin to get an idea of tunings or, more precisely, "experiments I can run." Maybe the tables opened is going crazy. So I tune table_open_cache, restart the server, and see if the graph corrects (secondary concern) and if my server is responding better, say 10ms faster per query (primary concern, which is most-correctly measured as "does the application seem happy with this server in its current state?"). Maybe disk I/O is crazy, but InnoDB free pages is nothing. That suggests an experiment to increase innodb_buffer_pool_size.

 

These are perhaps obvious examples, but there can be some subtler and surprising tuning choices to be made when you start viewing graphs of statistics you don't even know the meaning of. For example, suppose you didn't know what handler_read_rnd_next meant. You look at it and see it's 10,000x larger than handler_read_next which is 50x larger than handler_read_first. You start looking up what those stats mean, et voila, you've made a useful discovery.

 

At this point, I've formulated a hypothesis, which sounds a bit like this: "The graphs indicate that my system is doing suboptimal X. There is a tuning A that affects X. Thus I will change its current value from M to N, and I expect it will perform more optimally in X." I make the change, and verify the evidence on the graphs. Most importantly, the primary concern is that the application, the thing using my DBMS, should perform better. If the graphs indicate everything is better, but the application is performing worse, we must accept the failed experiment!

 

This leads us to an anecdote about InnoDB tuning.

 

Why Experimental Evidence Matters

At a previous company, I was running several 100% InnoDB MySQL servers in a particular slave query pool, and was getting a particular performance characteristic. I was looking at my.cnf and noted innodb_buffer_pool_size was rather small (2GB out of 24GB total available RAM) and that I had an extremly low cache hit rate. I formulated the obvious hypothesis: that I should increase the buffer pool to decrease my deplorable cache hit rate. Upon increasing the innodb_buffer_pool_size to 6GB (I believe in changing things in small increments), however, I discovered that although my cache hit rate much nicer, my query latency was also up (and the application was clearly suffering).

 

The first thing to do was tune it back to 2GB to ensure things returned to status quo. Then I formulated a new experiment which made no sense, but I had to try: I tuned innodb_buffer_pool_size to 1GB... Performance INCREASED over the original 2GB setting!

 

Further experimentation settled on about a 750MB innodb_buffer_pool_size being optimal for the machines in this particular query pool (that is, smaller values began leading to worse performance again).

 

This was EXTREMELY counter-intuitive, as it should be to you if you're a MySQL DBA. It took me a long time to accept the truth of this situation and move on. I formulated many other hypotheses in the process of trying to understand this phenomenon. MyISAM? Bad trending data? Matter/antimatter collision? Practical joke by my co-workers who were in the datacenter siphoning off the blue smoke from my machines?

 

To add to the mystery, I observed two additional phenomena: First, in other slave pools, increased innodb_buffer_pool_size indeed had the expected result: better cache hit rate and better performance. Second, several months later these machines were replaced with a new set of hardware that was configured similarly in RAM/disk, but were AMD processors instead of Intel. Upon loading them up with MySQL and putting them into the pool, they performed extremely badly compared to their peers. It took almost as long to discover the obvious, that increasing the innodb_buffer_pool_size decreased latency on this configuration, as it took to originally ignore this intuition. But the FACTS, that is, the graphs coming out of the trending software didn't lie. On this new configuration, we could see the performance increase as innodb_buffer_pool_size was tuned up. I believe we settled on about 18GB of total 24GB RAM as the optimal tuning.

 

Wrapping up the Experiment, Standardize on your Configuration

At some point, after staring at graphs, modifying parameters, altering tables, changing sysctl.conf settings, remounting filesystems, it starts to be more difficult to suggest tunings that will have a great effect. Keep in mind that the business has its own goals. After a while, getting another 5% performance out of your servers isn't worth the same to the business as, say, helping the developers push out the next release. So recognize that although you could spend the next several months staring at graphs and performing experiments (which are fun), at this point you should be satisfied with your increased performance and get back to the daily grind. Roll your tunings out to the pool of DBMSes, monitor the cluster to be sure that, on the whole it's performing as well as the individual test machine was (This is important! Sometimes a subset of the machines will behave in a deviant fashion differently than you expect! Be prepared for it!), and if so, move on to something else more important. Don't worry. Changes in workload or hardware configuration or application code will necessitate revisiting this procedure in the future.

 

Syndicate content
Website by Digital Loom