MySQLのスロークエリログから一部を切り出して分析するパッチ

MySQLのパフォーマンスチューニングに欠かせないスロークエリログですが、このログは特別にlogrotateなどの設定をしない限り一つのファイルに延々と追記される形になっています。先日仕事で3年分ものスロークエリログを見ることがあって大変疲れたので、次回から楽をするためにスロークエリログから一部を切り出して分析するmysqldumpslowのパッチを書いてみました。
オリジナル版のmysqldumpslowはスロークエリログ全体を読み込みます。

$ mysqldumpslow -s t sandbox03-slow.log

Reading mysql slow query log from sandbox03-slow.log

Count: 2  Time=9223372036854.76s (-1s)  Lock=0.00s (0s)  Rows=0.0 (0), tpcc[tpcc]@localhost
  INSERT INTO stock (s_i_id, s_w_id, s_quantity, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, 
s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10, s_ytd, s_order_cnt, s_remote_cnt, s_data) 
VALUES (N, N, N, 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S', N, N, N, 'S'),
(N, N, N, 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S', N, N, N, 'S'),
(N, N, N, 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'S', N, N, N, 'S'),

…

Count: 20  Time=12.38s (247s)  Lock=0.00s (0s)  Rows=1.0 (20), tpcc[tpcc]@vostro
  SELECT /* S-N */ COUNT(DISTINCT s.s_i_id) FROM district d 
INNER JOIN order_line ol ON ol.ol_w_id = d.d_w_id AND ol.ol_d_id = d.d_id 
AND ol.ol_o_id BETWEEN d.d_next_o_id - N AND d.d_next_o_id - N 
INNER JOIN stock s ON s.s_w_id = ol.ol_w_id AND s.s_i_id = ol.ol_i_id 
WHERE d.d_w_id = N AND d.d_id = N AND s.s_quantity < N

Count: 718  Time=0.29s (209s)  Lock=0.00s (0s)  Rows=0.0 (0), tpcc[tpcc]@vostro
  INSERT /* N-N */ INTO order_line (ol_o_id, ol_d_id, ol_w_id, ol_number, ol_i_id, ol_supply_w_id, 
ol_delivery_d, ol_quantity, ol_amount, ol_dist_info) VALUES (N, N, N, N, N, N, NULL, N, N.N, 'S')

…

パッチ版は-b、-eオプションで開始、終了位置を指定することができます。

$ mysqldumpslow --help
Usage: mysqldumpslow [ OPTS... ] [ LOGS... ]

Parse and summarize the MySQL slow query log. Options are

  --verbose    verbose
  --debug      debug
  --help       write this text to standard output

  -v           verbose
  -d           debug
  -s ORDER     what to sort by (al, at, ar, c, l, r, t), 'at' is default
                al: average lock time
                ar: average rows sent
                at: average query time
                 c: count
                 l: lock time
                 r: rows sent
                 t: query time
  -r           reverse the sort order (largest last instead of first)
  -t NUM       just show the top n queries
  -a           don't abstract all numbers to N and strings to 'S'
  -n NUM       abstract numbers with at least n digits within names
  -g PATTERN   grep: only consider stmts that include this string
  -h HOSTNAME  hostname of db server for *-slow.log filename (can be wildcard),
               default is '*', i.e. match all
  -i NAME      name of server instance (if using mysql.server startup script)
  -l           don't subtract lock time from total time
  -b DATETIME  read slow query log after or equal to 'yymmdd hh:mm:ss'
  -e DATETIME  read slow query log before 'yymmdd hh:mm:ss'
$ mysqldumpslow -s t -b '100403 14:42:37' -e '100403 14:46:00' sandbox03-slow.log

Reading mysql slow query log from sandbox03-slow.log

Begin: 100403 14:42:37
End  : 100403 14:46:00
Count: 20  Time=12.38s (247s)  Lock=0.00s (0s)  Rows=1.0 (20), tpcc[tpcc]@vostro
  SELECT /* S-N */ COUNT(DISTINCT s.s_i_id) FROM district d 
INNER JOIN order_line ol ON ol.ol_w_id = d.d_w_id AND ol.ol_d_id = d.d_id 
AND ol.ol_o_id BETWEEN d.d_next_o_id - N AND d.d_next_o_id - N 
INNER JOIN stock s ON s.s_w_id = ol.ol_w_id AND s.s_i_id = ol.ol_i_id 
WHERE d.d_w_id = N AND d.d_id = N AND s.s_quantity < N

Count: 718  Time=0.29s (209s)  Lock=0.00s (0s)  Rows=0.0 (0), tpcc[tpcc]@vostro
  INSERT /* N-N */ INTO order_line (ol_o_id, ol_d_id, ol_w_id, ol_number, ol_i_id, ol_supply_w_id, 
ol_delivery_d, ol_quantity, ol_amount, ol_dist_info) VALUES (N, N, N, N, N, N, NULL, N, N.N, 'S')

Count: 1127  Time=0.11s (123s)  Lock=0.00s (0s)  Rows=1.0 (1127), tpcc[tpcc]@vostro
  SELECT /* N-N */ s_quantity, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, 
s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10, s_data FROM stock 
WHERE s_i_id = N AND s_w_id = N FOR UPDATE

…

以下はMySQL 5.1.45のmysqldumpslowに対するパッチファイルです。mysqldumpslowは単独のPerlスクリプトなので、改造するのはそれほど難しくありません。

*** mysqldumpslow_org	2010-04-03 14:12:41.000000000 +0900
--- mysqldumpslow	2010-04-03 14:11:48.000000000 +0900
***************
*** 14,19 ****
--- 14,21 ----
  my %opt = (
      s => 'at',
      h => '*',
+     b => '000000 00:00:00',
+     e => '999999 23:59:59',
  );
  
  GetOptions(\%opt,
***************
*** 29,38 ****
--- 31,45 ----
      'h=s',	# hostname of db server for *-slow.log filename (can be wildcard)
      'i=s',	# name of server instance (if using mysql.server startup script)
      'l!',	# don't subtract lock time from total time
+     'b=s',	# read slow query log after or equal to 'yymmdd hh:mm:ss'
+     'e=s',	# read slow query log before 'yymmdd hh:mm:ss'
  ) or usage("bad option");
  
  $opt{'help'} and usage();
  
+ die "Invalid datetime format for -b: $opt{b}\n" unless $opt{b} =~ /^\d{6} \d{2}:\d{2}:\d{2}$/;
+ die "Invalid datetime format for -e: $opt{e}\n" unless $opt{e} =~ /^\d{6} \d{2}:\d{2}:\d{2}$/;
+ 
  unless (@ARGV) {
      my $defaults   = `my_print_defaults mysqld`;
      my $basedir = ($defaults =~ m/--basedir=(.*)/)[0]
***************
*** 69,74 ****
--- 76,82 ----
  
  my @pending;
  my %stmt;
+ my $skip;
  $/ = ";\n#";		# read entire statements using paragraph mode
  while ( defined($_ = shift @pending) or defined($_ = <>) ) {
      warn "[[$_]]\n" if $opt{d};	# show raw paragraph being read
***************
*** 80,86 ****
  	next;
      }
  
!     s/^#? Time: \d{6}\s+\d+:\d+:\d+.*\n//;
      my ($user,$host) = s/^#? User\@Host:\s+(\S+)\s+\@\s+(\S+).*\n// ? ($1,$2) : ('','');
  
      s/^# Query_time: ([0-9.]+)\s+Lock_time: ([0-9.]+)\s+Rows_sent: ([0-9.]+).*\n//;
--- 88,97 ----
  	next;
      }
  
!     my ($time) = s/^#? Time: (\d{6}\s+\d+:\d+:\d+).*\n// ? ($1) : ('');
!     $skip = ($time lt $opt{b}) || ($time ge $opt{e}) if $time;
!     next if $skip;
!     
      my ($user,$host) = s/^#? User\@Host:\s+(\S+)\s+\@\s+(\S+).*\n// ? ($1,$2) : ('','');
  
      s/^# Query_time: ([0-9.]+)\s+Lock_time: ([0-9.]+)\s+Rows_sent: ([0-9.]+).*\n//;
***************
*** 127,132 ****
--- 138,146 ----
      warn "{{$_}}\n\n" if $opt{d};	# show processed statement string
  }
  
+ print "Begin: $opt{b}\n" if $opt{b} ne '000000 00:00:00';
+ print "End  : $opt{e}\n" if $opt{e} ne '999999 23:59:59';
+ 
  foreach (keys %stmt) {
      my $v = $stmt{$_} || die;
      my ($c, $t, $l, $r) = @{ $v }{qw(c t l r)};
***************
*** 180,185 ****
--- 194,201 ----
                 default is '*', i.e. match all
    -i NAME      name of server instance (if using mysql.server startup script)
    -l           don't subtract lock time from total time
+   -b DATETIME  read slow query log after or equal to 'yymmdd hh:mm:ss'
+   -e DATETIME  read slow query log before 'yymmdd hh:mm:ss'
  
  HERE
      if ($str) {

せっかくなのでbugs.mysql.comに機能追加リクエストを挙げておきました。機会があったら、ぜひ使ってみてください。