October 2011

Getting a List of Users From the MySQL General Log

From time to time, organizations want to know if there are any users that are not used.  For clients using MySQL 5.1 and up, that can handle a 1% overhead for general logging, we will enable the general log for a period of time and analyze the users that connect.

Note: we have some extremely busy clients, and we very rarely have a problem turning the general log on, other than making sure we're rotating and compressing logs so we do not run out of disk space.

Once we have the logs, I run this little perl tool I made -- I call it genlog_users.pl:

#!/usr/bin/perl

 

my $infile=$ARGV[0];

my %seen=();

my @uniq=();

 

open (INPUT, "<$infile");

while (<INPUT>) {

  my $line=$_;

  if ($line=~/Connect/) {

    if ($line=~/(\S*@\S*)/) { push(@uniq, $1) unless $seen{$1}++; }

 } # end if line matches Connect

}

close INPUT;

open (OUTPUT, ">>..users.txt");

$,="\n";

print OUTPUT (keys %seen);

print OUTPUT ("\n");

close OUTPUT;

 

----------

I hope it is useful for whoever stumbles on this; I know it has been useful for me in the past -- it's just doing some string matching, and I bet if I used Python it would be done in half the lines, but it's already fewer than 20 lines, so it's pretty small to begin with.

PalominoDB Percona Live: London Slides are up!

 

Percona Live: London was a rousing success for PalominoDB.  I was sad that I could not attend, but I got a few people who sent "hellos" to me via my coworkers.  But on to the most important stuff -- slides from our presentations are online!

René Cannao spoke about MySQL Backup and Recovery Tools and Techniques (description) slides (PDF)

 

Jonathan delivered a 3-hour tutorial about Advanced MySQL Scaling Strategies for Developers (description) slides (PDF)

Enjoy!

Automatic Downtimes in Nagios Without Using Time Periods

Monitoring systems are a great thing, and we rely heavily on Nagios here at PalominoDB.  We also rely heavily on xtrabackup for (mostly) non-locking, "warm" backups of MySQL.  In order to get a consistent backup with a proper binlog position on a slave, xtrabackup stops replication for a short period of time.  If the monitoring system catches this, the pager will go off, and usually in the middle of the night.

Nagios can have specific windows when it is on or off for a particular host or service, but you have to remember that when you change the time the backup runs.  I prefer to call a script from cron, just before I call the backup script, so that I can easily see that they are related.

Note that this script works even if you have Nagios password protected with .htaccess, because wget allows for a username and password.  This script is not perfect - there is not a lot of error checking, and log files are hard-coded.  But it does what it needs to.

The script is called like this:

./scheduleDowntimeByService service length host

Sample cron entries (note that the service name depends on what you are checking and the service names themselves):

00 5 * * 0 /home/palominodb/bin/scheduleDowntimeByService.pl MySQL+Replication+Lag 3600 hostname

 

00 5 * * 0 /home/palominodb/bin/scheduleDowntimeByService.pl MySQL+Slave+IO+Thread 3600 hostname

 

00 5 * * 0 /home/palominodb/bin/scheduleDowntimeByService.pl MySQL+Slave+SQL+Thread 3600 hostname

Here is the script itself:

 

cat scheduleDowntimeByService.pl 

#!/usr/bin/perl

use strict;

use POSIX;

 

my $service = shift @ARGV;

my $length = shift @ARGV;

my $host = shift @ARGV;

 

unless ($length ) {

        $length = 3600;

}

 

 

my $startTime = time();

my $endTime = $startTime + $length;

 

my $nagios_start = POSIX::strftime("%m-%d-%Y %H:%M:00", localtime($startTime));

my $nagios_end = POSIX::strftime("%m-%d-%Y %H:%M:00", localtime($endTime));

 

$nagios_start =~ s@:@%3A@g;

$nagios_start =~ s@-@%2D@g;

$nagios_start =~ s@ @%20@g;

$nagios_end =~ s@:@%3A@g;

$nagios_end =~ s@-@%2D@g;

$nagios_end =~ s@ @%20@g;

 

my $URL = 'https://monitoring.company.com/nagios//cgi-bin/cmd.cgi?fixed=1&start_time=' .  $nagios_start . '&end_time=' . $nagios_end . '&cmd_typ=56&cmd_mod=2&host=' . $host . '&service=' . $service . '&com_data=Backups&btnSubmit=Commit';

 

my $cmd = "/usr/bin/wget --quiet --user=user --password=PASS -O /tmp/nagios_downtime.html '$URL'"

;

open ( L, ">>/tmp/nagios_downtime.log" );

print L print $cmd . "\n";

print L `$cmd`;

close L;

Exporting the mysql.slow_log table into slow query log format

Using pt-query-digest is an excellent way to perform a SQL review. However, sometimes you don't have access to the slow_query_log_file. For example, when MySQL runs on Amazon RDS, the slow_query_log_file is unavailable (see the RDS FAQ). To get around this, export the mysql.slow_log table.

To export the data, run the following SQL command from an instance that has database access. The data is exported into a MySQL slow-log format, which pt-query-digest can analyze:

mysql -u user -p -h host.rds.amazonaws.com -D mysql -s -r -e "SELECT CONCAT( '# Time: ', DATE_FORMAT(start_time, '%y%m%d %H%i%s'), '\n', '# User@Host: ', user_host, '\n', '# Query_time: ', TIME_TO_SEC(query_time),  '  Lock_time: ', TIME_TO_SEC(lock_time), '  Rows_sent: ', rows_sent, '  Rows_examined: ', rows_examined, '\n', sql_text, ';' ) FROM mysql.slow_log" > /tmp/mysql.slow_log.log

The -s (--silent) option is used to suppress echoing of the query.

The -r (--raw) option is used to disable character escaping making \n into an actual newline, otherwise it's echoed as '\n'.

Once the export is complete, run pt-query-digest to do the analysis. A simple review command is:

pt-query-digest --limit 100% /tmp/mysql.slow_log.log > /tmp/query-digest.txt

query-digest.txt is now ready for review.

I won USD$100 and you can, too

Last week, I won USD$100*.  Gazzang is giving away $100 every Friday in October, in honor of National Cyber Security Awareness Month.  There are only 2 more weeks left -- all you have to do is tweet @Gazzang_Inc one way to improve your own cyber security.

My winning tweet:

 one way is to secure MySQL! With Gazzang or the tips in  or .

The tip can be any kind of cyber security.  I believe you have to tweet on Friday, though.....so wait until tomorrow!

* I donated my winnings to http://www.technocation.org

Percona Live:London, RightScale and Eucalyptus training discounts!

(Tune in tomorrow where I tell you how you can win USD$100, like I did last week!)

Not signed up for Percona Live:London and need some motivation to do so?  How about a £40.00 discount?  That means the Expo Hall is free, regular registration is £260.00 and the tutorials + conference price is £500.00.  Use discount code "PDBUK".

In just 3 weeks, RightScale is having a cloud conference in Santa Clara, CA Nov 8-9.  There is a Zero-to-cloud training on Tuesday, November 8th, and you can get a 25% discount on the training by using code "RSCONF25".

RightScale and Eucalyptus are teaming up to provide a training, too -- on Thursday, November 10th.  If the name Eucalyptus sounds familiar, it is because former MySQL employees Mårten Mickos and Mark Atwood work there.  There is a 25% discount off that training by using code "RSEUCT25".

Important Caveats When Using pt-online-schema-change

 

Using pt-online-schema-change is a great way to change a table without downtime or switching over to a secondary database first.  While using it on production systems, we found some interesting issues to consider.

 

1) There were foreign_keys referencing the table we were altering.  There are two options to handle this.  They are specified with "--update-foreign-keys-method rebuild_constraints|drop_old_table".  One is to rebuild the tables with foreign keys.  This does lock the tables, and will block reads and writes while it is happening.  I chose to use drop_old_table.  This method disables foreign key checks, then drops the original table and renames the new table before reenabling foreign key checks.  Normally, pt-online-schema-change just renames the tables, then drops the old table, so locking is minimal.  We did find that there was locking while dropping the original table and renaming the new table.  This is something to be cautious with, especially with large tables.  In theory, you could run an online change to drop the foreign key constraints on the child tables prior to working on the parent table.

 

2) When testing, this ran in 15 minutes against an unladen database and took over 24 minutes against production.  You could add --sleep n, where n is a number of seconds, if you need it to add less load to production.  

 

3) One limit we also saw is that pt-online-schema-change will only work on tables with at least one unique single column index, such as a single column primary key.  Often partitioned tables don't have a unique single column index.

 

 

Here is some of the output of a sample pt-online-schema-change...

 
 
## Install pt-online-schema-change and pre-requisites...
wget percona.com/get/pt-online-schema-change
chmod 755 pt-online-schema-change
yum install perl-DBI
yum install perl-DBD-MySQL
 
time ./pt-online-schema-change h=127.0.0.1,D=my_schema,t=my_table -P3306 --alter "add column signature varbinary(255) NULL DEFAULT NULL; add column signature_version varchar(4) NULL DEFAULT NULL" --child-tables fk_table1,fk_table2 --update-foreign-keys-method drop_old_table -pXXXX --bin-log
# 2011-10-14T03:20:28 ./pt-online-schema-change started
# 2011-10-14T03:20:28 USE `my_schema`
# 2011-10-14T03:20:28 Alter table my_table using temporary table __tmp_my_table
# 2011-10-14T03:20:28 Checking if table my_table can be altered
# 2011-10-14T03:20:28 SHOW TRIGGERS FROM `my_schema` LIKE 'my_table'
# 2011-10-14T03:20:28 Table my_table can be altered
# 2011-10-14T03:20:28 Chunk column id, index PRIMARY
# 2011-10-14T03:20:28 Chunked table my_table into 25685 chunks
# 2011-10-14T03:20:28 User-specified child tables: fk_table1, fk_table2
# 2011-10-14T03:20:28 Starting online schema change
# 2011-10-14T03:20:28 CREATE TABLE `my_schema`.`__tmp_my_table` LIKE `my_schema`.`my_table`
# 2011-10-14T03:20:28 ALTER TABLE `my_schema`.`__tmp_my_table` add column signature varbinary(255) NULL DEFAULT NULL
# 2011-10-14T03:20:29 ALTER TABLE `my_schema`.`__tmp_my_table`  add column signature_version varchar(4) NULL DEFAULT NULL
# 2011-10-14T03:20:30 Shared columns: id, col1, col2, col3, col4, col5
# 2011-10-14T03:20:30 Calling OSCCaptureSync::capture()
# 2011-10-14T03:20:30 CREATE TRIGGER mk_osc_del AFTER DELETE ON `my_schema`.`my_table` FOR EACH ROW DELETE IGNORE FROM `my_schema`.`__tmp_my_table` WHERE `my_schema`.`__tmp_my_table`.id = OLD.id
# 2011-10-14T03:20:30 CREATE TRIGGER mk_osc_upd AFTER UPDATE ON `my_schema`.`my_table` FOR EACH ROW REPLACE INTO `my_schema`.`__tmp_my_table` (id, col1, col2, col3, col4, col5) VALUES (NEW.id, NEW.col1, NEW.col2, NEW.col3, NEW.col4, NEW.col5)
# 2011-10-14T03:20:30 CREATE TRIGGER mk_osc_ins AFTER INSERT ON `my_schema`.`my_table` FOR EACH ROW REPLACE INTO `my_schema`.`__tmp_my_table` (id, col1, col2, col3, col4, col5) VALUES(NEW.id, NEW.col1, NEW.col2, NEW.col3, NEW.col4, NEW.col5)
# 2011-10-14T03:20:30 Calling CopyRowsInsertSelect::copy()
Copying rows:   1% 24:52 remain
Copying rows:   3% 25:01 remain
...
Copying rows:  96% 00:43 remain
Copying rows:  98% 00:15 remain
# 2011-10-14T03:44:14 Calling OSCCaptureSync::sync()
# 2011-10-14T03:44:14 Renaming foreign key constraints in child table
# 2011-10-14T03:44:14 SET foreign_key_checks=0
# 2011-10-14T03:44:14 DROP TABLE `my_schema`.`my_table`
# 2011-10-14T03:44:49 RENAME TABLE `my_schema`.`__tmp_my_table` TO `my_schema`.`my_table`
# 2011-10-14T03:44:49 Calling CopyRowsInsertSelect::cleanup()
# 2011-10-14T03:44:49 Calling OSCCaptureSync::cleanup()
# 2011-10-14T03:44:49 DROP TRIGGER IF EXISTS `my_schema`.`mk_osc_del`
# 2011-10-14T03:44:49 DROP TRIGGER IF EXISTS `my_schema`.`mk_osc_ins`
# 2011-10-14T03:44:49 DROP TRIGGER IF EXISTS `my_schema`.`mk_osc_upd`
# 2011-10-14T03:44:49 ./pt-online-schema-change ended, exit status 0
 
real    24m20.777s
user    0m3.936s
sys     0m1.216s

The Magic of kSar for one-time system graphs

I forget from whom I first learned about kSar, but I am in debt to that person once again.  I first learned about it about a year ago, and it has been extremely useful whenever I am trying to debug system where I do not have access to trending graphs.  kSar is an open source graphing tool for Linux, Mac OS X and Solaris sar output.  Graphs can be generated and exported to many different formats (JPG, PNG, and even a PDF with all the graphs in it).

By simply copying a few files from /var/log/sa (at least that's the default on Linux, and of course it does depend on sar being installed and running, but 95% of the time it is), running kSar and choosing the data file, you get beautiful graphs.  Try it right now, it will take you less than 10 minutes. (Note, I have a Mac, and I know the client graphical program works on Linux/Unix/Solaris, but I have no clue if it works on Windows -- there is a run.sh script, but it just calls a jar file, so I am pretty sure it will work on Windows.)

I copy the /var/log/sa/sar* files to my laptop, then cat them together (usually cd /path/to/sa, cat sar* > alldays), then run kSar and click "Data -> Load from text file".  It is just that simple.

Call for Papers for Collaborate ends tomorrow!

The call for papers for Collaborate ends tomorrow, Friday October 14th.  Collaborate is Sunday, April 22nd through Thursday April 26th at the Mandalay Bay Convention Center in Las Vegas.  This year the folks scheduling the MySQL Conference and Expo have smartly chosen not to conflict, and in fact they are 2 weeks apart.

Collaborate is the largest independent (not put on by Oracle) conference for users of Oracle software.  This year we are having a one-day MySQL track at Collaborate, so that speakers and attendees can focus and optimize their time spent at the conference.

This is a great chance to give a talk to operations folks about running MySQL.  The audience is mixed, with some beginners and some intermediate folks.  We expect over 100 attendees to attend the MySQL sessions, as have done in the past 2 years.  

 

The track will have 1 or 2 MySQL talks per session time, so that the attendance is bigger and there is less that each attendee can miss -- last year the feedback from attendees was "I want to clone myself!" and the feedback fom speakers was "I wish there were more people in the room".  This year we are rectifying that.

 

So, don't forget to submit your paper today!

Indexing text columns with GIST or GIN to optimize LIKE, ILIKE using pg_trgm in Postgres 9.1 (Part 1)

 
“The pg_trgm module provides functions and operators for determining the similarity of ASCII alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.”
 
This is the introduction to the official documentation of the extension at [1]. Note, that I used the EXTENSION terminology instead of CONTRIB, as in PostgreSQL 9.1.  Now we’ll use the CREATE EXTENSION command to include this module in the database. This new methodology allows us to manage modules installations/uninstallations with only a few commands.

The idea of this post is to show you how KNN GIST and pg_trgm could be used together to obtain interesting results. First, let’s start with some basic elements of pg_trgm.

Installation

Installing the module is easy. If you are installing through the use of source code, you must compile the module and once you get access to the database execute:

CREATE EXTENSION pg_trgm;

That’s it! Installation complete!

What is a Trigram and how use them?


A trigram is a group of three consecutive characters in a string that can be used to detect the similarity of two words (for example) or the ‘distance’ between them.

When we talk about ‘distance’, 0 means in the same place and 1 is very far. When we talk about similarity, 1 is equal and 0 is totally different. In other words, distance is 1 minus the similarity value. These concepts about distance and similarity, are necessary to start without confusion.

What’s is a trigram? A trigram is a group of three consecutive characters from a string, used to know the similarity between two strings by counting the trigrams they share.

A trigram looks like this:

palominodb=# select show_trgm('PalominoDB CO');
                              show_trgm
-----------------------------------------------------------------------
{"  c","  p"," co"," pa",alo,"co ","db ",ino,lom,min,nod,odb,omi,pal}
(1 row)

In the pg_trgm extension we have functions and operators. show_limit and set_limit are functions used to set up and show the similarity threshold for the % operator. This operator takes the form “string1 % string2” and returns a boolean type (“t” if the similarity is greater than the similarity threshold, otherwise it returns “f”).

palominodb=# select show_limit();
show_limit
       0.4

palominodb=# select set_limit(0.3), show_limit();
set_limit | show_limit
         0.3 |    0.3

In the following example we’ll see the use of each one. Operator % will return true if the similarity of the strings is greater than similarity threshold returned by show_limit function. In this example, both string are equal, in consequence, the operation will return true:

palominodb=# select similarity('Palomino','Palomino')  AS Similarity,
                                  'Palomino'<->'Palomino'              AS distance,
                                  'Palomino' % 'Palomino'              AS SimilarOrNot;
-[ RECORD 1 ]+--
similarity          | 1
distance     | 0
similarornot     | t

Index Support and usage


Now let’s discuss combining GIST or GIN and pg_trgm. pg_trgm includes an operator class to support searches using similarity, like, or ilike operators. GIN and GIST have several differences. If you don’t know which to choose, just remember a few rules: GIN searches quicker than GIST but is slower to update; if you have a write-intensive table use GIST. GIN is better for static data. Please be aware, however, that they don’t support exact matching with the equals operator!  You can do an exact match using like/ilike with no wildcards.  If you want to use the equals operator(=), you must create a standard BTREE index on the pertinent column.

In the following examples, we’ll show a table only with a GIST index. As you can see, if you want to match the exact value with equal operator, it will scan the whole table:

palominodb=# EXPLAIN ANALYZE  SELECT id, texto FROM texto_busqueda WHERE texto = 'Palomino';
                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on texto_busqueda  (cost=0.00..90.15 rows=1 width=136) (actual time=16.835..16.846 rows=1 loops=1)
  Filter: (texto = 'Palomino'::text)
Total runtime: 17.094 ms
(3 rows)

But, if we use LIKE operator, index scan will be activated:

palominodb=# EXPLAIN ANALYZE  SELECT id, texto FROM texto_busqueda WHERE texto like 'Palomino';
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using texto_busqueda_texto_idx on texto_busqueda  (cost=0.00..8.27 rows=1 width=136) (actual time=0.374..1.780 rows=1 loops=1)
  Index Cond: (texto ~~ 'Palomino'::text)
Total runtime: 1.979 ms
(3 rows)


palominodb=# EXPLAIN ANALYZE  SELECT id, texto FROM texto_busqueda WHERE texto like '%Palomino%';
                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Scan using texto_busqueda_texto_idx on texto_busqueda  (cost=0.00..8.27 rows=1 width=136) (actual time=0.171..1.732 rows=1 loops=1)
  Index Cond: (texto ~~ '%Palomino%'::text)
Total runtime: 1.882 ms
(3 rows)



To use an index for  match equal strings, we need to create a BTREE index. But in case of BTREE there is a limitation of 8191 bytes per index row. So, if you have very large text columns you will not allowed to create a BTREE index without using functional indexes.

We get this result because, unlike BTREE indexes, the search string is not left-anchored.

The creation of indexes with the pg_trgm operator class is simple:

CREATE INDEX ON texto_busqueda USING GIST(texto gist_trgm_ops);
or
CREATE INDEX ON texto_busqueda USING GIN(texto gin_trgm_ops);

If you want a more comprehensive understanding of GIST or GIN implementation on Postgres, you can download the the source code and read  src/backend/access/gist/README and src/backend/access/gin/README.


Another useful technique

Combining % operator to get the strings that have a similarity greater than the established threshold and similarity function -that returns the similarity-, we can get ordered from the most similar to the less one discarding all the strings that aren’t similar enough:

palominodb=# SELECT ctid, similarity(texto, 'Palominodb') AS simil
palominodb-#  FROM texto_busqueda
palominodb-#  WHERE texto % 'Palominodb'
palominodb-#  ORDER BY simil DESC;
 ctid  |  simil
--------+----------
(55,3) | 0.666667
(1 row)

The same query, but with its EXPLAIN plan shows that the generated index is used in the condition:

palominodb=# EXPLAIN ANALYZE  SELECT ctid, similarity(texto, 'Palominodb') AS sml
palominodb-#  FROM texto_busqueda
palominodb-#  WHERE texto % 'Palominodb'
palominodb-#  ORDER BY sml DESC;
                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Sort  (cost=14.26..14.27 rows=3 width=138) (actual time=3.428..3.437 rows=1 loops=1)
  Sort Key: (similarity(texto, 'Palominodb'::text))
  Sort Method: quicksort  Memory: 17kB
  ->  Bitmap Heap Scan on texto_busqueda  (cost=4.28..14.24 rows=3 width=138) (actual time=3.336..3.383 rows=1 loops=1)
        Recheck Cond: (texto % 'Palominodb'::text)
        ->  Bitmap Index Scan on texto_busqueda_texto_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=3.278..3.278 rows=1 loops=1)
              Index Cond: (texto % 'Palominodb'::text)
Total runtime: 3.578 ms
(8 rows)


Well, this is the first part. Hope you enjoyed the reading and I’ll wait your comments and feedback!

 


[1] http://www.postgresql.org/docs/9.1/static/pgtrgm.html

Syndicate content
Website by Digital Loom