2008-03-04T17:54:01Z Copyright 2008 WordPress Administrator <![CDATA[MySQL Partitioning]]> http://palominodb.com/blog/2008/03/04/mysql-partitioning/ 2008-03-04T17:54:01Z 2008-03-04T17:54:01Z Uncategorized I’ve been setting up partitioning for various customers lately. The goals primarily have been easy purging of large growth tables and keeping indexes small enough to stay in memory and manageable. These have all been range partitions on dates, which is a rather common requirement. As you’ve probably noticed in previous posts, I absolutely hate environments where people let their tables grow like blackberry bushes. While doing research, I found the following links to be very helpful:

http://dev.mysql.com/tech-resources/articles/testing-partitions-large-db.html

http://datacharmer.blogspot.com/2006/03/mysql-51-improving-archive-performance.html

http://blog.plasticfish.info/categories/tech/mysql/mysql-partitioning/

http://mysqlguy.net/2008/02/20/using-events-manage-table-partitioning-date-wrap

As far as partition management goes, I like the idea of DB events, primarily because you don’t have to worry about crontabs continually slamming a database that’s down, and because the events travel with the database, which makes failovers, migrations and copies that much easier. So, what events do we need to manage partitions? There are three cases that I have been using. Remember, these are for date-based partitions (logs, events, etc.).
1) Create the partition for the next day (or week, month, etc.).

2) Purge any partitions older than n days.

3) Check that the necessary partition for the current time period exists, and if not, create it on the fly. After all, what happens if the DB is down when the event is supposed to fire? We need to remember the edge cases.
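
For context, the events that follow assume a table partitioned by range on the day number of a date column. Here is a minimal sketch of such a table. The column names and the two starting partitions are illustrative assumptions, not the schema from any customer site.

```sql
-- Hypothetical schema: column names are assumptions for illustration only.
CREATE TABLE log (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_at DATETIME NOT NULL,
  message    VARCHAR(255) NOT NULL,
  -- Every unique key on a partitioned table must include the partitioning column.
  PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE ( TO_DAYS(created_at) ) (
  -- Each partition is named pYYMMDD for the day it holds;
  -- its exclusive upper bound is the start of the following day.
  PARTITION p080218 VALUES LESS THAN ( TO_DAYS('2008-02-19') ),
  PARTITION p080219 VALUES LESS THAN ( TO_DAYS('2008-02-20') )
);
```

Note that the sketch deliberately has no catch-all MAXVALUE partition: ADD PARTITION can only append after the highest existing bound, so a MAXVALUE partition would have to be split with REORGANIZE PARTITION instead.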

So, in the interest of sharing, here are some events I created. I consider these rudimentary at best. Remember that it isn’t just about functionality. These need to be robust. I’m still planning on adding more error handling and email notifications on failure but I wanted to share. (I’m a giver)
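DELIMITER $

CREATE EVENT log_add_partition
ON SCHEDULE
  EVERY 1 DAY STARTS '2008-02-19 23:59:00'
DO
BEGIN
  -- Build the DDL as a string, since a partition name cannot be a bound parameter.
  -- Tomorrow's partition holds rows where TO_DAYS(date) = TO_DAYS(CURDATE()) + 1,
  -- so its exclusive upper bound is TO_DAYS(CURDATE()) + 2.
  SET @stmt := CONCAT(
      'ALTER TABLE log ADD PARTITION ('
    , 'PARTITION p'
    , DATE_FORMAT( DATE_ADD( CURDATE(), INTERVAL 1 DAY ), '%y%m%d' )
    , ' VALUES LESS THAN ('
    , TO_DAYS( CURDATE() ) + 2
    , '))'
  );
  PREPARE stmt FROM @stmt;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
END
$

DELIMITER ;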

DELIMITER $

CREATE EVENT log_purge_partition
ON SCHEDULE
  EVERY 1 DAY STARTS '2008-02-20 00:00:01'
DO
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE part_name VARCHAR(25);
  -- Partitions are named pYYMMDD; select those dated more than 31 days ago.
  -- The schema filter keeps same-named tables in other databases out of the list.
  DECLARE cur1 CURSOR FOR
    SELECT partition_name
    FROM information_schema.partitions
    WHERE table_schema = DATABASE()
      AND table_name = 'log'
      AND str_to_date(substr(partition_name from 2), '%y%m%d') < date_sub(now(), INTERVAL 31 day);
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  OPEN cur1;

  REPEAT
    FETCH cur1 INTO part_name;
    IF NOT done THEN
      -- Drop one expired partition per iteration.
      SET @stmt := CONCAT(
          'ALTER TABLE log DROP PARTITION '
        , part_name
      );
      PREPARE stmt FROM @stmt;
      EXECUTE stmt;
      DEALLOCATE PREPARE stmt;
    END IF;
  UNTIL done END REPEAT;

  CLOSE cur1;
END
$

DELIMITER ;
DELIMITER $

CREATE EVENT log_check_partition
ON SCHEDULE
  EVERY 15 MINUTE STARTS '2008-02-20 00:00:01'
DO
BEGIN
  DECLARE no_rows INT DEFAULT 0;
  DECLARE part_name VARCHAR(25);
  -- Look for today's partition (pYYMMDD); compare as dates, not as strings.
  DECLARE cur1 CURSOR FOR
    SELECT partition_name
    FROM information_schema.partitions
    WHERE table_schema = DATABASE()
      AND table_name = 'log'
      AND str_to_date(substr(partition_name from 2), '%y%m%d') = CURDATE();
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_rows = 1;

  OPEN cur1;

  FETCH cur1 INTO part_name;

  IF no_rows = 1 THEN

    -- Today's partition is missing (e.g. the DB was down at 23:59), so add it now.
    SET @stmt := CONCAT(
        'ALTER TABLE log ADD PARTITION ('
      , 'PARTITION p'
      , DATE_FORMAT( CURDATE(), '%y%m%d' )
      , ' VALUES LESS THAN ('
      , TO_DAYS( CURDATE() ) + 1
      , '))'
    );
    PREPARE stmt FROM @stmt;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

  END IF;

  CLOSE cur1;
END
$

DELIMITER ;
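
Events only fire when the event scheduler is running, so it is worth checking that as well as the resulting partitions. The queries below are a small sketch of how one might verify the setup; they assume you are connected to the schema that owns the `log` table.

```sql
-- Is the scheduler running? Events never fire while this is OFF.
SHOW VARIABLES LIKE 'event_scheduler';

-- Turn it on for the running server (needs the SUPER privilege).
SET GLOBAL event_scheduler = ON;

-- Confirm the three events exist and see when each last ran.
SELECT event_name, status, last_executed
FROM information_schema.events
WHERE event_schema = DATABASE();

-- List the partitions in order, with their upper bounds and approximate row counts.
SELECT partition_name, partition_description, table_rows
FROM information_schema.partitions
WHERE table_schema = DATABASE()
  AND table_name = 'log'
ORDER BY partition_ordinal_position;
```

To keep the scheduler on across restarts, the same setting would also need to go into the server configuration file.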

]]>
Administrator <![CDATA[MySQL Checksum]]> http://palominodb.com/blog/2007/11/12/mysql-checksum/ 2007-11-12T17:12:07Z 2007-11-12T17:12:07Z MySQL Tools This is another tool in the same toolkit as archiver. I just saw a great blog post on it at http://blog.arabx.com.au/?p=883. Documentation can be found at http://mysqltoolkit.sourceforge.net/doc/mysql-checksum-filter.html. It is an invaluable tool for verifying that your replicated tables stay in sync, something MySQL replication does not check for you. Tables will drift, and if you rely on replicated copies of critical data for read support, reports or backups, you need to know when that happens.
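
As a rough illustration of the idea (not of the toolkit itself, which is far more thorough), MySQL's built-in CHECKSUM TABLE statement can be used for a manual spot check. The schema and table names here are placeholders.

```sql
-- Run the same statement on the master and on each slave, then compare the results.
-- mydb.log is a placeholder name, not a table from the post.
-- The comparison only means something if the slave has caught up and
-- no writes land on the table between the two runs.
CHECKSUM TABLE mydb.log EXTENDED;
```

EXTENDED forces a full row-by-row read, so on a large table this is slow. Results may also differ between servers for reasons other than drift, such as a different server version or row format, which is one more argument for using a dedicated tool.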

]]>
Administrator <![CDATA[Version Consistency]]> http://palominodb.com/blog/2007/11/11/version-consistency/ 2007-11-11T22:03:45Z 2007-11-11T22:03:45Z MySQL Oracle Process Everyone pays lip service to the concept of keeping versions consistent between servers, but it is consistently one of the most broken best practices I see amongst my clients. The problems with such inconsistency are legion, and I’ll point out a few here.

Mismatch between production and development: Development environments are often neglected, particularly when it comes to OS and DBMS patches. When this happens I consistently find myself in a catch-22 when it is time to do patching in production. With nowhere to test a patch, you have to either go in blind and pray (never the best choice) or skip the patch, potentially retaining security vulnerabilities or bugs and giving up performance or feature improvements. This is especially concerning in complex environments like an Oracle RAC cluster, where patches can be tricky at best. Of course, when working with any features outside of pure DML, you can expect occasional glitches as you introduce change, particularly when working with triggers, procedures, replication, partitioning, or any other complex feature.

Mismatch between MySQL masters and slaves: In MySQL, great pains are taken to allow a slave’s version to be higher than the master’s. This can be a godsend in an upgrade scenario, but I wouldn’t rely on it without significant testing. There are numerous features that may function incorrectly in such a scenario, particularly if you are using technologies that are relatively new and thus might be changing significantly between release versions, such as triggers in a 5.0.xx release. My recommendations are simple. In day-to-day operations, you have to maintain consistency in your database versions. If you want to use replication as a way to do a rolling upgrade across a replicated cluster, test it thoroughly in development and make sure you test any procedures, views, triggers and functions against every possible storage engine and against partitioned vs. non-partitioned tables. If your application does DDL, then test that as well. I can’t stress this one enough. I’ve had clients who were plagued with replication problems that “magically” went away once we brought their RDBMS and OS versions into sync.
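
A quick way to spot this kind of drift is to ask each server what it is running. This is only a sketch of a manual check; run it on the master and on every slave, then compare the output.

```sql
-- Server version string, e.g. to compare master against slaves.
SELECT VERSION() AS server_version;

-- Version plus build details (version_comment, version_compile_machine, version_compile_os).
SHOW VARIABLES LIKE 'version%';
```

Any difference in the output between servers in the same replicated group is worth explaining before it explains itself in production.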

Consistency between functional clusters: When I say functional cluster, I’m talking about a group of databases, usually replicated or physically clustered, that support the same application component. In high-activity environments, this kind of functional partitioning can be crucial in a scaling strategy. However, lack of discipline can cause these clusters to be built at, or migrated to, different versions over time. While this may not lead to disaster if your team is disciplined, at its best you will find your resources taxed. You’ll need to maintain multiple server builds, you’ll need to do a lot more QA at the data access tier, and you’ll need more steps whenever you have to trade hardware between functional clusters. If you don’t have rock-solid documentation and processes, new servers will be built at different versions, and potentially even introduced into production with the wrong version. The resulting rework and troubleshooting will plague you to varying degrees.

To recap, when it comes to the same database in various environments (dev, test, stage etc…) there is no reason to allow for version drift. You must incorporate proper upgrade and deployment processes that, combined with discipline, will maintain that consistency. A proper configuration database can help with this as well. When it comes to a functional cluster, there absolutely must be consistency as well. There is a place for version mismatch in a rolling upgrade strategy, but only here, and only with extensive testing where possible. And finally, if there are valid reasons to have different versions across different functional clusters, be prepared for rigorous discipline and overhead in your administrative processes.

]]>
Administrator <![CDATA[MySQL Toolkit - Archiver]]> http://palominodb.com/blog/2007/10/23/mysql-toolkit-archiver/ 2007-10-23T03:48:19Z 2007-10-23T03:48:19Z MySQL Tools The MySQL Toolkit can be found at http://mysqltoolkit.sourceforge.net/. It is coded and maintained by Baron Schwartz (www.xaprb.com). I’ve been using the archiver tool he wrote lately, and wanted to share this tool. In every web environment I’ve worked in, there is data that is collected for analysis and that grows quite rapidly. User activity logs in particular can quickly grow out of control, and generally have no place in a front-end database after a certain amount of time. I’ll go into approaches to deciding what and how to archive in another post, but often this tool can prove very useful in moving data between databases or simply removing it.

I actually use the archiver in two different client sites. In one situation I have a significant influx of logs representing user activity. As with most data, only a limited window of time is required for analysis in the front-end. So we use archiver to move the older data to a separate datawarehouse. In another environment, we’ve wrapped it into a larger program for redistributing data between shards in a large cluster.
It is an excellent piece of code. I have not run into a bug as of yet, and it is highly configurable with regard to commit frequencies, transaction sizes, retries, etc. Performance has also been considered: it scans indexes forward-only so that it stays efficient when working with large datasets. As the documentation states:

“The strategy is to find the first row(s), then scan some index forward-only to find more rows efficiently. Each subsequent query should not scan the entire table; it should seek into the index, then scan until it finds more archivable rows.”
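
To make the forward-only idea concrete, here is a hand-rolled sketch of the same pattern in plain SQL. It is not the archiver's actual implementation; the table names, the 90-day cutoff and the batch size of 1000 are all assumptions for illustration.

```sql
-- Hypothetical tables: user_activity (id PRIMARY KEY, created_at, ...) and
-- user_activity_archive with identical columns.
SET @last_id := 0;
SET @cutoff  := NOW() - INTERVAL 90 DAY;

-- Repeat everything below from application code until @batch_end comes back NULL.

-- Seek into the primary key just past the last archived row and scan forward
-- for the next batch of archivable rows; remember where the batch ends.
SELECT MAX(id) INTO @batch_end
FROM (
  SELECT id
  FROM user_activity
  WHERE id > @last_id
    AND created_at < @cutoff
  ORDER BY id
  LIMIT 1000
) AS batch;

-- Copy the batch to the archive table, then remove it from the source.
INSERT INTO user_activity_archive
SELECT * FROM user_activity
WHERE id > @last_id AND id <= @batch_end AND created_at < @cutoff;

DELETE FROM user_activity
WHERE id > @last_id AND id <= @batch_end AND created_at < @cutoff;

-- Advance the bookmark so the next pass never rescans rows already handled.
SET @last_id := @batch_end;
```

Because each pass starts at `@last_id` rather than at the beginning of the table, the cost of a pass depends on the batch size and not on how much has already been archived. The real tool adds the commit, retry and throttling controls mentioned above.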

I have been an Oracle DBA for longer than I’d care to think about, and the world of open source is relatively new to me. I am consistently impressed with the open-source community that creates amazingly useful tools such as this. It is a refreshing change from the world of Oracle, where everything costs thousands of dollars or more. If you can support this kind of work through contributions of time or money, I’d strongly suggest you do so.

]]>
Administrator <![CDATA[The Prototype]]> http://palominodb.com/blog/2007/10/15/the-prototype/ 2007-10-15T02:19:59Z 2007-10-15T02:19:59Z Business Process The first start-up stage I’ve worked within is the prototype phase. Within this phase traffic is not an issue for performance or scale, it’s about functionality. Low traffic and small datasets can hide atrocious code quite easily. The nice thing about this stage is that you should not have to invest a lot of time or money into your database and instead can focus on functionality and business development. Over-engineering at this point can be a devastating waste of very precious resources.

Here are the three criteria I would recommend focusing on:

1. Standards - I’m not talking about long lists of rules or apocryphal abbreviations here; rather focus on simplicity, consistency and usability. Don’t make decisions without asking yourself if the choice you’ve made is simple to practice, can be done so consistently, and would be easy for a new employee to use. Without standards, you will find that your database quickly becomes difficult to navigate and utilize - and a headache to maintain. Important standards include database object names, file system layouts, documentation design and locations.
2. Backups - There is nothing more frustrating than losing days or weeks of work. One of the key jobs of a DBA is making sure that your data can be recovered in case of a crisis. From the beginning, you need a solid backup and recovery process. Again, at this stage, nothing elaborate is required. It may be as simple as a nightly dump, or you may need more frequent dumps and perhaps even point-in-time recovery, depending on the amount of change and the number of people making the changes. Another key factor here is actually documenting how to perform the restores and practicing them regularly. Backups do fail, and sometimes that failure is subtle. Regular practice will help to ensure a successful recovery.
3. Documentation - As you make decisions about standards, set up processes, start building scripts, and implement tools and management utilities you need to document them for repeatability. Wikis are great for this, but again you must maintain a culture of discipline in regards to documentation. Tasks should not be completed without documentation and those responsible for doing so must be held accountable for it. This is an excellent way to start building discipline into the culture. As the prototype becomes operational more rigorous processes can build on this, such as change and problem management.
]]>
Administrator <![CDATA[The Start-up and the Database - Pt 1]]> http://palominodb.com/blog/2007/10/06/the-start-up-and-the-database-pt-1/ 2007-10-06T17:10:21Z 2007-10-06T17:10:21Z Business My first senior production database role was at an established start-up, Preview Travel, that had just been purchased by a similarly established, but better positioned start-up, Travelocity. Since then, I’ve worked with start-ups in all stages of growth and I’ve seen definite patterns in how database infrastructures are designed, implemented and maintained (or not). I’ve seen these phases of growth presented elsewhere, often at a level of granularity that didn’t work for me. I am a believer in simplicity wherever possible. So, these are the levels of growth as I see them, recognizing that the only generalization you can make is that every situation will be unique:

  1. Prototype
  2. Initial Production
  3. Pain Point 1: Availability
  4. Pain Point 2: Performance
  5. Pain Point 3: Scalability
  6. Maintenance Mode

Each of these phases must be taken with a viewpoint of the entire business. While all of us OCD architects and administrators would absolutely love to design the perfect solution from scratch, it is not realistic for a company that is on a shoestring budget or that has no clue about the traffic patterns it will be driving. Bottlenecks and pain points need to be looked at as indicators of needs for a) growth or b) improved processes. Taken in this manner, one can create a roadmap for growth based on the individual site’s needs rather than busting the budget immediately or optimizing for scenarios that will never occur.

Of course, there is an exception, and that is pain points that compromise a site’s revenue or reputation. Luckily, in 80 - 90% of the scenarios (potentially more!) these are problems that have already been solved to some degree. In the immortal words of Tyler Durden - You are not a beautiful or unique snowflake. That’s where my job comes in, to help utilize those existing lessons learned before investing in new solutions from scratch. Each phase has them, and that’s where I will be focusing my initial posts.

]]>
Administrator <![CDATA[Preamble]]> http://palominodb.com/blog/2007/10/06/preamble/ 2007-10-06T16:31:00Z 2007-10-06T16:31:00Z Business Palomino I’ve been considering starting a blog of my own for quite some time, but I must admit to some hesitancy, primarily due to the quality of technical content already posted online. Finally, I’ve decided to take the plunge, and to focus on the quality I truly bring to my own customers. Yes, I possess a very solid technical acumen around Oracle and MySQL environments. But, it isn’t technical knowledge alone that can really bring a company to that desired nirvana of availability, performance and scalability, not to mention doing so on a reasonable budget.

So, I’ve decided to focus on the strategy of database design and growth as the theme of my blogging. There will also be some tactical posts, particularly in regards to tools, scenarios and solutions that I’ve come across. My goal with this? To create something that my customers and other web-businesses can use to encourage their creative juices and to be more proactive with their own database infrastructures.

]]>