October 2010

How I Find MySQL Fragmentation

Sarah Novotny recently posted InnoDB Tablespace Fragmentation - Find it and Fix it, so I thought I would share how I determine MySQL fragmentation.  This works for MyISAM and InnoDB tables, and will work whether or not innodb_file_per_table is in use.

The basic idea is to compare the size of the file(s) on disk with the size in the database metadata.  The DATA_LENGTH and INDEX_LENGTH fields of the TABLES table in the INFORMATION_SCHEMA database has size information calculated by the storage engine.  There are a few points to note:

  • Sometimes querying INFORMATION_SCHEMA can take a long time, and in rare cases has been reported to crash MySQL.  I have written about how you can tell how risky an INFORMATION_SCHEMA query might be.
  • MyISAM tables - In MyISAM, the metadata is exact, and the files on disk are split into data (.MYD) and index (.MYI).  Compare the size of the .MYD to DATA_LENGTH and compare the size of the .MYI file to INDEX_LENGTH.  You can calculate exactly how fragmented each table is, and decide which tables to defragment.  Sample query:

SELECT (DATA_LENGTH+INDEX_LENGTH)/1024/1024 AS sizeMb FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE='MyISAM';

  • InnoDB tables when using a centralized file for data and indexes (aka "not using innodb_file_per_table") - Because all of the data and indexes are in one file, you have to compare the sum of the size of all InnoDB tables with the size of the ibdata file(s).  However, because InnoDB has estimated metadata, you cannot use a direct comparison.  What I do is get a percentage difference between the size on disk and the size reported by the metadata, and if it's more than about 20% difference than the table is likely fragmented enough to justify defragmentation.  Sample query:

SELECT SUM(DATA_LENGTH+INDEX_LENGTH)/1024/1024 AS sizeMb FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE='InnoDB';

  • InnoDB tables that use separate files for metadata versus data and indexes (aka "using innodb_file_per_table") - As explained in the previous example, InnoDB uses estimated size for metadata, not exact numbers like MyISAM.  So again I figure out the percentage difference and if it's more than 20% then I will recommend defragmentation.  Sample query:

 

SELECT (DATA_LENGTH+INDEX_LENGTH)/1024/1024 AS sizeMb FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE='InnoDB';

 

Note that every database is different.  I use the 20% as a guideline, not a hard-and-fast rule.  It really depends on how "wrong" Innodb's estimate of the size is.  

Fragmentation occurs when variable-length data is updated, and when variable-length rows are deleted.  You can probably guess which tables have the most fragmentation if you know what queries are being run.  If you want to get a better sense of how often updates and deletes are being run, I highly recommend using mk-query-digest against a sample set of binary logs.

Some Videos from 2010 OpenSQL Camp Boston

 

 

OpenSQLCamp Boston has only been over for a week, but I already have about 2/3 of the videos uploaded to YouTube.  I have updated the schedule page with all the videos and slides I knew about.  I welcome comments with more information (e.g. links to slides, or tag or description suggestions for the YouTube videos).

Here's the list of videos and slides so far (also linked at http://opensqlcamp.org/Events/Boston2010/Schedule):

Adventures in Alternative Energy "Data Monitoring" with MySQL -- architecture and design case study - Matt Yonkovit, Percona - video

Cassandra and Lucene - Jake Luciani, Riptano - video - slides(slideshare)

Common MySQL Performance Blunders - Matt Yonkovit, Percona - video

Databases of the Future (Discussion) - Josh Berkus, PostgreSQL Experts - video

Keeping MySQL slaves in sync using Maatkit tools (mk-table-checksum, mk-table-sync) - Sheeri Cabral, PalominoDB, with input from Matt Yonkovit, Percona - video - slides (PDF)

 

The MariaDB Server -- What do you want from it? (intro + discussion) - Colin Charles, Monty Program AB - video

MySQL replication and the quest for a global transaction ID - Giuseppe Maxia, MySQL - video

MySQL Tuner 2.0 - Sheeri Cabral, PalominoDB - video - slides (PDF)

 

MVCC Unmasked: Implementation and Issues in Postgres, Cassandra, MySQL and CouchDB - Bruce Momjian,EnterpriseDB; Jake Luciani, Riptano; Rob Wultsch, GoDaddy; Josh Berkus, PostgreSQL Experts - video

Serialization in distributed DBS - Josh Berkus, PostgreSQL Experts -video

 

 

SQL Meets NoSQL: Mapping relational semantics onto a true multi-master, eventually consistent database (SlackDB) - Eric Day,Rackspace - video - note that there were no slides, only a discussion, so really it's audio, but it's up on YouTube, so there is video.

Teaching Developers SQL (Discussion) - Led by Josh Berkus, PostgreSQL Experts - video

 

Managing mistakes, apologies and other Grown Up Business.

Here at PalominoDB, we recently had the opportunity to work through a few mistakes and their aftermath.  Why is this pertinent to a blog by a database management team?  Regardless of your use of MySQL, Oracle, Cassandra or whatever technology you've deemed appropriate, it is the management of operations that becomes the true differentiator in your service providing.  Dealing with these recent mistakes got me thinking on what a crucial component of growth these moments are.  In the world of operations, mistakes can be quite visible, and are not uncommon to any of us.  One might argue that one of the biggest parts of our jobs is managing risk and mitigating impact of mistakes that will inevitably occur.

In the moment, it is easy to get very upset with someone who makes a mistake, or two or even three in a week.  I've learned from the best clients on how to deal with mistakes, by how they deal with ours. Mistakes are typically insights into a broken process, a bad behavior pattern or a lack of information and tools.  If you do manage risk and have planned appropriately, you should be able to survive and handle the immediate effects of the mistake.  Then, you have an opportunity to identify the issues, and improve them.  Anyone who sees a mistake (and I mean a true mistake of ignorance, versus a mistake of gross negligence) as a chance only to yell, blame and generally act discourteously is not interested in growth.  

I've found in life that growth is inevitably fostered by mistakes, occasional hardship, and even pain.  With an attitude of growth, these become quite manageable, if not welcome.  Recognizing this should help you reduce your stress around your own mistakes, and focus on the growth opportunities present.

So how do you deal with that mistake?  I learned a long time ago that the ability to truly, genuinely apologize is another one of those crucial life skills, and here is the perfect time to use that skill.  An immediate acknowledgment of your mistake, recognition of the impact your mistake has made, and a sincere request to offer amends are the parts of a good apology.  Don't focus on avoiding responsibility, trying to pass the buck, share blame, etc...  You are responsible for your part in this, and that is what you are apologizing for.

Next, you dig into what happened, and why it happened.  Perform a post-mortem, talk in ways that do not make anyone feel attacked, and find out where the root issue is.  Rather than focusing on blame and punishment, focus on the core issue, and put together a concise plan of attack to address the issue once it is exposed.  With this approach, continued improvement, and an environment that fosters openness and transparency becomes the norm.

OpenSQLCamp Boston in Detail

In short:

Register / see who's coming

Schedule (will be filled in with presentations before Saturday noon)

Session ideas (45-minute sessions)

Friday, October 15th - 6-10 pm, WorkBar Boston, 711 Atlantic Ave, Boston, in the basement.  Socializing, swag, raffles, dinner, beer and soft drinks.  Take public transit (South Station on the Red Line subway or Silver Line bus if coming from the airport) or a cab; parking can be quite expensive in that area.

Saturday, October 16th - 8:30 am - 5 pm, MIT Stata Center 1st floor, 32 Vassar Street, Cambridge.  Breakfast, lunch, tech presentations.  A short walk from the Kendall Square subway stop on the Red Line, or drive an park in any MIT lot -- even if it says parking permit only, that does not apply on the weekends.   

Sunday, October 17th - same as Saturday

-----

The longer form:

As many of you know, OpenSQLCamp Boston kicks off tomorrow night with a social event at WorkBar Boston from 6-10 pm, and will include a buffet dinner from the Pulse Cafe.  Even though WorkBar Boston is more "work" than "bar" - it is a coworking space - there will be beer as well (special thanks to IOUG for sponsoring this event in particular).  Whether or not you are drinking, I strongly recommend taking public transit or a taxi -- the location is across the street from South Station, a major bus and train hub.  South Station is on the Red Line of the subway, and there is also a Silver Line bus directly from the airport terminals. The subway and Silver Line fare is $2.00.

Make sure to get sleep because Saturday starts at 8:30 am at the MIT Stata center, 32 Vassar Street.  We start with breakfast, and then after a few introductory remarks we start making the schedule at 9:30 am.  Then there are 3 45-minute sessions, with lunch at 1 pm, a panel on indexing from 2-3 pm, and from 3-5 is open time to ask questions, work on projects that were discussed during the day, and otherwise hack during the hackathon.

OpenSQLCamp does not provide dinner, but usually at the end of the day people figure out where they want to go next, and we all go over to a bar or restaurant (or go to a few different ones depending on people's preferences and tastes). 

Sunday is the same schedule as Saturday, except there is an extra session slot because we do not need opening remarks and the planning session. 

Here's the detail of food, for those who are wondering:

Friday night catered by Pulse Cafe

Vegan and vegetarian appetizers, wraps (incl. vegan), salad, vegetarian lasagna.  Beer, soda, water, iced tea.

Saturday and Sunday breakfast catered by Panera bread

Fruit, bagels, pastries, hot egg & cheese and ham, egg & cheese sandwiches, coffee, tea.

Saturday lunch catered by Greek Corner

Hummus, grape leaves, Pastitso, Falafel, Gyros, Greek Salad (feta on the side)

Sunday lunch catered by Pita Pit

Assorted pitas including meat, vegetarian and vegan options.

I am very excited, and can't wait to see you there!

Slow Query Log in MySQL 5.1 Cheat Sheet

I try to eliminate warnings and errors in the error log as much as possible - not just fixing actual issues, but also to eliminate the noise in the error log.  As I have been doing 5.1 upgrades lately, I have been seeing a lot of the following errors in the error log when upgrading:

[Warning] '--log_slow_queries' is deprecated and will be removed in a future release. Please use ''--slow_query_log'/'--slow_query_log_file'' instead.

And every time that happens I have to remember the proper syntax for slow query logging.  Finally I just wrote it in my ongoing cheat sheet, it's just a short text document that I keep on my desktop, and items rotate in and out of it frequently.

Without further ado, here is my cheat sheet for upgrading your slow query log from MySQL 5.0 to  MySQL 5.1:

 

change 

log-slow-queries=/var/log/mysqld/log-slow-queries.log 

to 

log_output=FILE # also can be FILE,TABLE or TABLE or NONE 

slow-query-log=1 

slow_query_log_file=/var/log/mysqld/log-slow-queries.log 

Note that setting log_output will also set the general_log output.  It does not change anything about error logging.

 

Building Cassandra-Cacti-M6 on Centos 5.5

 

Turns out this is hard to do.    I’m writing this here for my benefit, and everyone else’s, too.  I’ve got a client using Cassandra, and of course you monitor that stuff.  So I figure that using the cassandra-cacti-m6 stuff is a good plan.  That’s cool, it works with Cacti.  Cacti’s pretty snazzy.   My customer is using Centos 5.5 on the monitoring box.  Turns out there are a lot of hoops to jump through for that to work.  So here’s what I did.

  1. Install jpackage-utils
  2. Install the jpackage.repo into /etc/yum.repos.d, and enable the rhel5 targets
  3. Install JDK6 Update 3 (because of the following step)
  4. Install java-1.6.0-sun-compat from ftp://jpackage.hmdc.harvard.edu/JPackage/5.0/generic/RPMS.non-free/

Then I go to install ANT, but I discover something — trying to install ant.noarch complains about “Missing Dependency: /usr/bin/rebuild-security-providers is needed by package”.  This sucks.  So I do some googling, and figure out that someone solved this issue on Centos 5.x.  I do what they did (documented at http://plone.lucidsolutions.co.nz/linux/centos/jpackage-jpackage-utils-compatibility-for-centos-5.x ).  This works great.  The package is built, I install it out of the local build dir.

Then I install ant.noarch and ant-nodeps.noarch.  These do the trick.

The I build cassandra-cacti-m6 as documented in its source tree.  Woo.

It took me a while to dig up and get all this working, but for you, I hope it’s fast and easy.  Enjoy!

-Gabriel

 

Syndicate content
Website by Digital Loom