Saturday, January 21, 2017

Fun with Bugs #47 - On Some Bugs Found Using oprofile

Users had to identify the reasons for MySQL performance problems somehow well before famous Performance Schema appeared in MySQL 5.5.3, and even before Domas invented his Poor Man's Profiler. Poor users had to use some OS-level tools, and among these tools the most important one was, probably, oprofile.

oprofile is a system-wide statistical profiling tool for Linux available since 2001 and 2.4.x kernels. It was applied to MySQL for many years and allowed to find and explain many performance problems. We even have the --collect-oprofile option for pt-stalk tool, and it's a common knowledge that Percona Toolkit includes tools and options that are proved to be useful for real life problems more than once... So, we can safely assume that oprofile is (or, at least, was) an essential tool for MySQL performance problem solving.

For this post I decided to check and pick up some of MySQL bugs identified or verified with a little help from oprofile. Here is the list of most important ones (from my point of view), starting from the oldest:
  • Bug #29921 - "MYSQLparse uses 3X more CPU in MySQL 5.0 than yyparse in MySQL 4.0". It was reported by Mark Callaghan and eventually fixed (to large extent) in versions 5.0.50 and 5.1.23. While gprof was used by developers for further analysis, the reason for performance repression was demonstrated and proved with oprofile.
  • Bug #31551 - "MyISAM poor performance with multiple queries running concurrently". Reported by Vadim Tkachenko back in 2007, this bug (that is essentially about key cache mutex contention problem, that was demonstrated using oprofile) is still "Verified". Nobody seems to care to fix it, and as a workaround one may have to use multiple key caches in concurrent environment.
  • Bug #33948 - "performance issue when using sysbench benchmark on a multiple-core system". It was reported by Terry Tao, who found it using oprofile:
    "Used oprofile to trace it then found it spended more time on searching a free table entry from open_cache. "
    and suggested a patch, but the real solution based on the patch was made in frames of work on several worklogs in 5.4 (and 5.5.x) related to LOCK_open and Metadata Locks.
  • Bug #43161 - "buf_calc_page_new_checksum and ut_2_power_up use a lot of CPU on busy IO loads". It was reported by Mark Callaghan and later claimed a duplicate of his Bug #54814. Sveta Smirnova verified it using oprofile. The fix was contributed by Laurynas Biveinis for 5.6 and appeared only in MySQL 5.7.4.
  • Bug #44723 - "Larger read_buffer_size values can cause performance decrease for INSERTs", was reported by Harrison Fisk, who used oprofile on Linux to demonstrate the problem. The bug is fixed since MySQL 5.0.86, 5.1.39, and 5.4.4.
  • Bug #49047 - "InnoDB deadlock detection is CPU intensive with many locks on a single row". It was reported by Harrison Fisk and was partially fixed based on the patch he suggested in the InnoDB Plugin for MySQL 5.1 (available after 5.1.50 or so), and in MySQL 5.5. But note that innodb_deadlock_detect server variable was added recently (in MySQL 5.7.15!) to disable deadlock detection entirely. oprofile (along with DTrace) was used to identify the bug.
  • Bug #49169 - "read_view_open_now is inefficient with many concurrent sessions", was reported by Mark Callaghan. oprofile helped to find out that a lot of time is spent in this function. The bug is closed as "It was fixed as part of the kernel mutex split in 5.6.", but check last comments for some discussion of what was and was not fixed.
  • Bug #49177 - "MyISAM share list scalability problems". This bug was reported by Sergei Golubchik. Alexey Stroganov later posted a detailed analysis based on oprofile outputs, among other things. The bug was completely fixed only recently, in MySQL 5.7.9+ and MySQL 8.0.
  • Bug #53825 - "Removing locks from queue is very CPU intensive with many locks", was reported by Harrison Fisk. It is still "Verified" and thus, probably is not fixed. Note that it may become serious bottleneck in 5.7.15+ with deadlock detection disabled. I plan to double check this assumption with profilers soon.
  • Bug #56332 - "Performance regression in DROP TABLE performance post 5.0.77", was reported by Justin Swanhart. The problem is definitely fixed in recent 5.5.x and 5.6.x versions (but not in the older ones probably, there were more regressions later in the progress). This bug report contains some useful examples of opreport commands.
  • Bug #58037 - "LOCK_open contention from the table cache". This bug was reported by Mark Callaghan. The problem was well known in the past. Alexey Stroganov proved (with oprofile and scalability testing) that the problem was resolved with introduction of metadata locks in 5.5.x.
  • Bug #61401 - "UPDATE performance degrades gradually if a trigger exists", was reported by Aurimas Mikalauskas. He used oprofile to prove that initial implementation of MDL led to clear performance regression comparing to 5.1.x.
  • Bug #64304 - "assemble to implement procedure of transforming between big-end and little-end", was reported by Xiaobin Lin, who suggested a patch and proved with oprofile that it helps. The bug is still "Verified". Probably I have to talk to Marko Mäkelä (who is now my colleague in MariaDB!) about it soon...
  • Bug #68818 - "Large Value List in WHERE Clause Spends Too Much Time in 'statistics' State", was reported by Jervin Real. He used oprofile to prove his point and show where exactly in the code (as opposed to just "statistics" stage) the time was spent. Eventually this was declared a duplicate of Bug #68046 that is fixed since 5.6.11.
  • Bug #72115 - "Very slow create/alter table with partitions and STATS_PERSISTENT=1", was reported by my colleague Jan Lindstrom in 2014. It seems that even with other tools already around, including SHOW PROFILE, Performance Schema and perf profiler, oprofile is still sometimes used to show where the time is spent. After several problems and regressions identified, it seems the problem is mostly resolved in 5.6.17+. I plan to double check this some day.
We clearly see that famous MySQL developers, users, DBAs and support engineers had used oprofile successfully while working on performance related problems. In one of the upcoming posts (while preparing for my FOSDEM 2017 talk) I plan to explain how to use it, both older versions and 1.x.y ones, that rely on the same kernel features as perf profiler now and changed approaches/commands available a lot.

No comments:

Post a Comment