Wednesday, June 21, 2017

Linux perf and the CPU regression in MySQL 5.7

I used Linux perf to get more details on system performance while running the point-query test with modern sysbench. This is for an in-memory workload and part of my series on low-concurrency CPU regressions for bug 86215.

tl;dr
  • I have more explaining to do
  • The increase in instructions/query explains the decrease in queries/second from MySQL 5.6 to 5.7 to 8.0.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. My servers are described here and are named i3 NUC and i5 NUC.

My usage of sysbench, including the tests run, and the my.cnf used for each database engine is described here for MySQL 5.6, 5.7 and 8. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. The my.cnf files are here for 5.05.1 and 5.5. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table.

After loading the tables via the sysbench prepare option I ran the update-index test for a few minutes and then ran point-query with 1 client. After letting point-query warm up for 60 seconds I used this script to collect data from Linux perf. Output from Linux perf is here:

Results

The charts below show the queries/second and instructions/query for each server/release with the point-query sysbench test and 1 client. For the i3 NUC the regression is from MySQL 5.7.17 to 8.0.1. For the i5 NUC it is from 5.6.35 to 5.7.17. Hopefully I will explain why the results are different between the i3 and i5 NUC servers. I am still collecting results from the CPUs I use at work and they don't match what I report here. So I have some explaining to do.

But it is clear to me that the increase in instructions/query explains the decrease in queries/second. Note that I run the sysbench client on the same host as mysqld so the instructions/query overhead includes the sysbench client and the real regression from MySQL 5.6 to 8.0 would be larger were that excluded.

Bar

Monday, June 19, 2017

Impact of perf schema on sysbench at low concurrency

As I document the low-concurrency CPU regressions in MySQL 5.7 and 8 one of the questions is whether the performance schema is part of the problem. I don't think it is as the worst-case overhead I measure is about 7% and the typical overhead is less than 5% for a single-threaded & in-memory workload. 

Eventually I will repeat this test for workloads with more concurrency because there are overheads that won't be apparent at low-concurrency. The overhead is larger for simple queries, like sysbench point-query, and larger for complex queries. The overhead is also larger for in-memory workloads.

tl;dr - it isn't the perf schema

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. My servers are described here but I only share results for the i5 NUC.

I enabled support for the performance schema at compile time. When tests were run I set performance_schema to 1 to enable it and to 0 to disable it. Nothing else was set in my.cnf for the perf schema.

My usage of sysbench, including the tests run, and the my.cnf used for each database engine is described here for MySQL 5.6, 5.7 and 8. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table for 1, 2 and 4 concurrent clients. The database fits in the InnoDB buffer pool.

Results

The QPS for all tests is here. The next table is QPS relative to MySQL 5.6.35 with the perf schema enabled for each engine/configuration. For each release there isn't much difference between enabling (*-ps) and disabling (*-nops) the perf schema.

legend:
* 56-ps, 56-nops - 5.6.35 with and without perf schema
* 57-ps, 57-nops - 5.7.17 with and without perf schema
* 80-ps, 80-nops - 8.0.1 with and without perf schema

56-ps   56-nops 57-ps   57-nops 80-ps   80-nops release/test
1.00    1.02    0.66    0.68    0.73    0.76    update-index
1.00    1.02    0.57    0.59    0.61    0.64    update-nonindex
1.00    1.03    0.59    0.61    0.58    0.60    read-write.range100
1.00    1.03    0.59    0.59    0.58    0.58    read-write.range10000
1.00    1.05    0.61    0.61    0.57    0.60    read-only.range10
1.00    1.03    0.60    0.60    0.60    0.60    read-only.range10000
1.00    1.05    0.60    0.59    0.56    0.60    point-query
1.00    1.02    0.69    0.73    0.67    0.68    insert

The next table is the relative QPS for MySQL version X without the perf schema relative to the QPS for version X with the perf schema. It shows the relative the gain in QPS from disabling the perf schema for a given release. The maximum gain is about 7% but in most cases it is less than 5%. So the perf schema overhead doesn't explain the CPU regression from MySQL 5.6 to 8. It isn't a surprise that the largest overhead occurs for the point-query test because that test has the least complex queries.

5.6     5.7     8.0     release/test
1.02    1.03    1.04    update-index
1.02    1.03    1.05    update-nonindex
1.03    1.02    1.03    read-write.range100
1.03    1.00    1.00    read-write.range10000
1.05    1.00    1.05    read-only.range10
1.03    1.01    1.00    read-only.range10000
1.05    0.99    1.07    point-query
1.02    1.06    1.02    insert

Charts

Finally I have charts of the QPS for all tests. I don't have commentary for each chart. For the results below I use -ps for configurations when the perf schema was enabled and -nops when it was disabled.

Friday, June 16, 2017

Sysbench for MySQL 5.0, 5.1, 5.5, 5.6, 5.7 and 8

After sharing results for in-memory sysbench with MySQL 5.6, 5.7 and 8 I was curious about older releases and here I have results for MySQL 5.0, 5.1 and 5.5 in addition to 5.6, 5.7 and 8. This is one more result in my series on low-concurrency performance regressions.

tl;dr
  • MySQL 4.1 and 5.5 weren't great releases for performance. I skipped both in production.
  • The biggest drop in QPS between releases is from 5.6 to 5.7 and that drop often exceeds the drop from 5.0 to 5.6. What happened? Bug 86215 is open for this.

Configuration

I tested MySQL using upstream 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. My servers are described here but I only share results for the i5 NUC.

I was able to compile and run MySQL 4.1.22 on the same server but have yet to share the results. The results weren't good and that matches my memory of 4.1 not being a great release. MySQL 4.0 was an awesome release but I have yet to get it running on Ubuntu 16.04 with gcc 4.7 or 4.8. There are segfaults soon after startup.

My usage of sysbench, including the tests run, and the my.cnf used for each database engine is described here for MySQL 5.6, 5.7 and 8. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. The my.cnf files are here for 5.0, 5.1 and 5.5. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table for 1, 2 and 4 concurrent clients. The database fits in the InnoDB buffer pool.

Results

The QPS for all tests is here. Charts for some of the tests are below.

The table below lists the QPS relative to MySQL 5.0 for each test. When the value is 0.53 (see update-index for MySQL 8) then MySQL 8 gets 53% of the QPS compared to MySQL 5.0 and 5.0 is almost 2X faster. As reported in previous posts, the regression from 5.6 to 5.7 is large. Fortunately that performance loss hasn't been repeated from 5.7 to 8.

This gives me hope. At first I thought the problem was a steady loss of performance in each major release as features are added and code paths get longer. But now it looks like most of the problem arrived with MySQL 5.7. Maybe we can fix this.

5.0     5.1     5.5     5.6     5.7     8       release/test
----    ----    ----    ----    ----    ----
1.00    0.89    0.73    0.73    0.49    0.53    update-index
1.00    0.89    0.59    0.72    0.41    0.44    update-nonindex
1.00    0.92    0.99    1.04    0.62    0.60    read-write.range100
1.00    0.95    0.92    0.85    0.50    0.49    read-write.range10000
1.00    0.92    0.89    0.92    0.56    0.52    read-only.range10
1.00    0.95    0.92    0.84    0.50    0.50    read-only.range10000
1.00    0.85    0.77    0.76    0.45    0.42    point-query
1.00    0.95    0.93    1.22    0.84    0.82    insert

Charts

For update-index the biggest drop in QPS is from 5.6 to 5.7.
For update-nonindex the biggest drop in QPS is from 5.6 to 5.7. There is also a big drop from 5.1 to 5.5, but that is fixed in 5.6. Looks like 5.5 was a lousy release for performance.
For read-write.range100 the biggest drop in QPS is from 5.6 to 5.7.
For read-write.range10000 the biggest drop in QPS was from 5.6 to 5.7.
For read-only.range10 the biggest drop in QPS is from 5.6 to 5.7.
For read-only.range10000 the biggest drop in QPS is from 5.6 to 5.7.
For point-query the biggest drop in QPS is from 5.6 to 5.7.
For insert the QPS regression is small.

Monday, June 12, 2017

Linkbench, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8

Next up in my series on low-concurrency performance is Linkbench on Intel NUC servers with an IO-bound workload. The working set is larger than RAM and there many reads from and writes to storage. This is part of my work on bug 86215 and will be my last blog post on the topic for a few weeks. For more on Linkbench read the Facebook post and paper and my latest post.

tl;dr
  • For the i5 NUC at least 2/3 of the regression is from MySQL 5.6 to 5.7 and the remainder from 5.7 to 8. Most of this is explained by more CPU overhead in 5.7 and 8.
  • For the i3 NUC there is more variance, but that has a slower and older CPU that is less interesting to me. At this point I think the i3 NUC is useful to test builds and run MTR but not to test performance.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I repeated the test: first with the default charset/collation and then with latain1/latin1_swedish_ci.

My servers are described here. The charts below refer to the servers as i3 NUC and i5 NUC. Compared to the i3 NUC, the i5 NUC is newer, has a CPU that is 2X faster, 2X more RAM, 2X more storage and storage that is more than 2X faster.

The my.cnf used for each database engine is described here. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled.

For this test I used maxid1=40M in the Linkbench configuration for the i3 NUC and maxid1=80M for the i5 NUC. I use Linkbench from here and helper scripts from here. My usage of Linkbench is described here. A sample command line to run the tests is:
bash all.sh rx ~/bin/mysql /data/m/data 40000001 sdb 1 3600 mysql lb.sql.inno 24 127.0.0.1 1
Results

The first pair of charts shows the absolute and relative insert rates during the load test. For the i3 NUC there is a regression from MySQL 5.6 to 5.7 but not to 8. Results for many of my tests have been less predictable on the i3 NUC perhaps because it has a much slower and older CPU. For the i5 NUC most of the regression is from MySQL 5.6 to 5.7.

The next pair of charts show the absolute value for CPU per insert followed by the inverse of the relative rates. The absolute value comes from the Mcpu/t column in my performance data. The i3 NUC results continue to be odd for MySQL 8. For the i5 NUC most of the CPU regression is from MySQL 5.6 to 5.7. For the i5 NUC the inverse of CPU per insert values are similar to the relative insert rates above and the CPU regression explains the drop in insert rates from MySQL 5.6 to 5.7 and 8.


The next pair of charts show the absolute transactions/second rates and the the rates relative to the value for MySQL 5.6. These are from the 24th hour of the transaction test. For the i3 NUC about 1/3 of the regression is from MySQL 5.6 to 5.7. For the i5 NUC about 2/3 of the regression is from MySQL 5.6 to 5.7.


The final pair of charts shows the absolute value for CPU per transaction followed by the inverse of the value relative to MySQL 5.6. These are from the 24th hour of the transaction test. The absolute value is from the Mcpu/t column in the performance data. For the i3 NUC about 1/3 of the CPU regression is from MySQL 5.6 to 5.7 and for the i5 NUC most of the regression is from MySQL 5.6 to 5.7. The second chart shows that the inverse of the relative CPU predicts the transaction rate and the CPU regression explains the transaction rate regression.


Metrics

All of the performance data is here and explained by a previous post.

Saturday, June 10, 2017

Summary of recent performance tests for MySQL 5.6, 5.7 and 8

I have been reporting on low-concurrency performance regressions in MySQL for a few years and recently published many reports to compare MySQL 5.6.35, 5.7.17 and 8.0.1 using Intel NUC servers. This is tracked by bug 86215. My summary of the recent tests is:
  • The problem is not that there is a CPU regression from MySQL 5.6 to 5.7 to 8 as that is expected. The problem is that the regression is too large. In the worst case, I have MySQL 5.6 gets up to 2X more QPS than 5.7 and 8 using sysbench. In a typical case MySQL 5.6 gets 1.2X to 1.3X more QPS than 5.7 and 8.
  • Most of the slowdown is from MySQL 5.6 to 5.7 and less of the problem is from 5.7 to 8. I think this is good news. This is based on my results from the i5 NUC.
  • My team is committed to making this better. I hope that upstream is too. One day the big Percona Live presentation from upstream will include benchmark results for MySQL at low concurrency in addition to the results we always get for extremely high concurrency.
  • If you publish results from tests for N configurations, you will always get a request for testing one more configuration. So keep N small and save energy for the followup. But my hope is that we learn something from the tests that I did, rather than ask for more tests. A more clever person would run tests for N configurations, initially share results for N-2 and then when asked for a few more configs wait a day and share the final two. Alas, I am not clever.
Background reading

My recent performance reports are here:
  • May 2017 - Sysbench, in-memory & Intel NUC
  • May 2017 - Sysbench, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8
  • June 2017 - Insert benchmark, in-memory & Intel NUC for MySQL 5.6, 5.7 and 8
  • June 2017 - Insert benchmark, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8
  • June 2017 - Linkbench, in-memory & Intel NUC for MySQL 5.6, 5.7 & 8
  • June 2017 - Linkbench, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8
  • June 2017 - Sysbench for MySQL 5.0, 5.1, 5.5, 5.6, 5.7 and 8
  • June 2017 - Impact of perf schema on sysbench at low concurrency
  • June 2017 - Linux perf and the CPU regression in MySQL 5.7

Friday, June 9, 2017

Linkbench, in-memory & Intel NUC for MySQL 5.6, 5.7 & 8

Next up in my series on low-concurrency performance is Linkbench on Intel NUC servers with an in-memory workload. The database fits in RAM and there are no reads from but many writes to storage. This is part of my work on bug 86215. One day I will stop running tests and start working on code improvements but not yet. For more on Linkbench read the Facebook post and paper and my latest post.

tl;dr
  • MySQL 5.7 and 8 get about 80% of insert rate and 60% of the transaction rate compared to MySQL 5.6 in the worst case at low concurrency. This is explained by an increase in the CPU cost per insert and per transaction. I assume more code & more features is the cause.
  • For the i3 NUC some of the regression is from MySQL 5.6 to 5.7 and some is from 5.7 to 8.
  • For the i5 NUC most of the regression is from MySQL 5.7 to 8.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I repeated the test: first with the default charset/collation and then with latain1/latin1_bin for the i3 NUC and latin1/latin1_swedish_ci for the i5 NUC.

My servers are described here. The charts below refer to the servers as i3 NUC and i5 NUC. Compared to the i3 NUC, the i5 NUC is newer, has a CPU that is 2X faster, 2X more RAM and storage that is more than 2X faster.

The my.cnf used for each database engine is described here. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled.

For this test I used maxid1=2M in the Linkbench configuration, about 2M nodes will be loaded and the InnoDB database is about 3gb. I use Linkbench from here and helper scripts from here. My usage of Linkbench is described here. A sample command line to run the tests is:
bash all.sh rx ~/bin/mysql /data/m/data 2000001 sdb 1 3600 mysql lb.sql.inno 24 127.0.0.1 1
Results

The first pair of charts shows the absolute and relative insert rates during the load test. Insert rates drop from MySQL 5.6 to 5.7 and 8. For the i3 NUC about half of the regression is from MySQL 5.6 to 5.7. For the i5 NUC about 2/3 of the regression is from MySQL 5.6 to 5.7.

The next pair of charts show the absolute value for CPU per insert followed by the inverse of the relative rates. The absolute value comes from the Mcpu/t column in my performance data. More CPU is used in MySQL 5.7 and 8 compared to 5.6. For the i3 NUC about 1/3 of the CPU regression is from MySQL 5.6 to 5.7. For the i5 NUC about 2/3 of the CPU regression is from MySQL 5.6 to 5.7. The inverse of CPU per insert values are similar to the relative insert rates above and the CPU regression explains the drop in insert rates from MySQL 5.6 to 5.7 and 8.

The next pair of charts show the absolute transactions/second rates and the the rates relative to the value for MySQL 5.6. These are from the 24th hour of the transaction test. For the i3 NUC about 1/2 of the regression is from MySQL 5.6 to 5.7. For the i5 NUC most of the regression is from MySQL 5.6 to 5.7.


The final pair of charts shows the absolute value for CPU per transaction followed by the inverse of the value relative to MySQL 5.6. These are from the 24th hour of the transaction test. The absolute value is from the Mcpu/t column in the performance data. The first chart shows that for the i3 NUC about 1/2 of the CPU regression is from MySQL 5.6 to 5.7 and for the i5 NUC most of the regression is from MySQL 5.6 to 5.7. The second chart shows that the inverse of the relative CPU predicts the transaction rate and the CPU regression explains the transaction rate regression.

Metrics

All of the performance data is here and explained by a previous post.

All about Linkbench

Linkbench is a benchmark for social graph transaction processing. It lead to a paper, a post, benchmark client in Java and many blog posts from me. As a bonus, I also got to know the person who did the work for it - thanks Tim. The benchmark client has moved to my Github account because the upstream project has been archived (no comment). The repo is not active but someone recently added support for Postgres, someone else added support for multiple schemas and I fixed it to work with MySQL 8. Percona forked it to add support for MongoDB but that really needs proper support for multi-document transactions in MongoDB.

I run Linkbench via helper scripts. These include files to create tables for MyRocks and InnoDB. There are two phases Linkbench: load and transaction. My helper scripts run the load phase and then transactions. From past experience results become stable after a few hours of transactions so I run it for 24 hours in 1-hour loops and compute metrics per hour to understand whether performance and efficiency change over time. My helper script all.sh uses load.sh and run.sh to do that while also computing hardware efficiency metrics. A typical command line for a small database is below. The last argument is the number of clients to concurrent load link and count tables while the load of the node table is always done by 1 thread So when the last argument is 1 there will be 2 concurrent clients for most of the load (1 for link/count and 1 for node). Note that the load of the link and count tables can finish before the load of the node table.
bash all.sh rx ~/bin/mysql /data/m/data 2000001 sdb 1 3600 mysql lb.sql.inno 24 127.0.0.1 1
Load phase

The load.sh script creates a result summary file with the name l.r.$something and an example is here. I extract the most interesting numbers from that file into a one-line summary that I archive. An example summary line is listed below. The legend for the results is:
  • ips - average insert rate for the link table. This is less than the real insert rate during the test because it doesn't count inserts to the count and node tables. Don't forget this when looking at the results below that are divided by ips.
  • r/i, rkb/i, wkb/i - iostat r/s, iostat rKB/s and iostat wKB/s divided by ips.
  • Mcpu/i - vmstat us + sy columns divided by ips and then multiplied by a constant. Note that this value can only be compared between servers using the same CPU.
  • size - database size in GB at the end of the load.
  • rss - mysqld RSS size in GB at the end of the load.
  • wMB/s - average of iostat wMB/s during the load. I really need to update this script to compute r/s and rMB/s.
  • cpu - average of vmstat us + sy columns during the test
  • engine - notes on the storage engine and configuration

ips    r/i    rkb/i  wkb/i  Mcpu/i  size   rss    wMB/s  cpu    engine
29810  0      0      1.33   1311    2.8g   0.17   39.6   39.1   MyRocks

Transaction phase

The run.sh script runs transaction phases in a loop for a fixed amount of time per loop and creates results summary files with the name r.r.$something.L$n.P$x where $n is the loop number. When it runs with 16 clients for 24 1-hour loops then there will be files like r.r.$something.L1.P16, r.r.$something.L2.P16, etc. An example is here.

I extract interesting results from that file to create a one line summary as shown below. The legend for the results is:
  • tps - average transaction rate.
  • r/t, rkb/t, wkb/t - iostat r/s, iostat rKB/s and iotat wKB/s divided by tps.
  • Mcpu/t - vmstat us and sy columns divided by tps and then multipled by a constant. Note that this value can only be compared between servers using the same CPU.
  • size - database size in GB at test end.
  • rss - mysqld RSS size in GB at test end.
  • un, gn, ul, gl - 99th percentile response time in milliseconds for the most frequent transactions: un is UPDATE_NODE, gn is GET_NODE, ul is UPDATE_LINK, gl is GET_LINKS_LIST. The types of transactions are explained in the paper and implemented for MySQL in LinkStoreMysql.java.
  • r/s, rMB/s, wMB/s - average for iostat r/s, rMB/s and wMB/s during the test.
  • cpu - average of vmstat us + sy columns during the test
  • engine - notes on the storage engine and configuration

tps   r/t   rkb/t wkb/t Mcpu/t  size  rss   un    gn   ul   gl   r/s   rMB/s wMB/s cpu   engine
3868  0      0    1.41  8154    3.0g  4.04  0.2   0.1  0.6  0.5  10.4  0     5.4   31.5  MyRocks

Thursday, June 8, 2017

Insert benchmark, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8

Next up in my series on low-concurrency performance is the insert benchmark on Intel NUC servers with an IO-bound workload. The database and working set are much larger than RAM and there are many reads from and writes to storage. This is part of my work on bug 86215. One day I will stop running tests and start working on code improvements but not yet. Read this for more on the insert benchmark.

tl;dr
  • New MySQL releases do better on the i3 NUC but worse on the i5 NUC. This is similar to results for the in-memory test, but the differences are larger for IO-bound. One theory for why I don't see a regression for the i3 NUC is that the SSD has much higher read response times which mask CPU overheads. Per StorageReview the read latency on the i3 NUC might be 1.5X slower for a read-only workload. I will guess the difference is larger for a mixed workload.
  • I hope to repeat the tests for the i3 NUC because the speedup on the q100 test is odd. I have repeated some of the results for the in-memory insert benchmark. It is possible that the device wasn't in a steady state for all of the IO-bound tests. Perhaps I need to run tests for more time.
  • In case you missed the previous two points, my results for the i3 NUC are confusing and I hope to repeat tests to explain that. But the i5 NUC results match previous tests and show the impact from the CPU regression in new MySQL releases, even for IO-bound workloads.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I repeated the test: first with the default charset/collation and then with latin1/latin1_swedish_ci.

My servers are described here. The charts below refer to the servers as i3 NUC and i5 NUC. Compared to the i3 NUC, the i5 NUC is newer, has a CPU that is 2X faster, 2X more RAM and storage that is more than 2X faster.

The my.cnf used for each database engine is described here. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled.

My usage of the insert benchmark is described here. For this test I used 1 client for the load test and then 1 writer + 1 query client for the tests that do inserts & queries concurrently. The load test loads 250M rows for the i3 NUC, 500M rows for the i5 NUC and the insert+query tests run for about 2 hours per test.

Results

The first pair of graphs shows the absolute and relative insert rates during the load test. For the i3 NUC there is no regression from MySQL 5.6 to 5.7 and 8. For the i5 NUC there is a regression and about half is from 5.6 to 5.7.

The next pair of graphs is the absolute and relative query rate for the q1000 test where another client does 1000 inserts/second. For the i3 NUC there is no regression. For the i5 NUC there is a large regression and about half is from MySQL 5.6 to 5.7.

The next pair of graphs is the absolute and relative query rate for the q100 test where another client does 100 inserts/second. For the i3 NUC there a speedup from MySQL 5.6 to recent releases. For the i5 NUC there is a small regression after MySQL 5.6.

CPU regression

I run vmstat and iostat when the tests run to determine how much hardware is used per insert and per query. For each of the tests (load, q1000, q100) the graphs show the inverse of the relative CPU overhead per insert and per query. For the i5 NUC the results here are similar to the throughput results above and more CPU overhead in recent MySQL releases is likely to explain the throughput regression. For the i3 NUC the CPU overhead appears to be reduced on recent releases. I don't understand that yet. Note that my CPU measurement includes everything on the host, not just mysqld.
Metrics

All of the performance data is here. This includes metrics from the benchmark client and from vmstat and iostat.

Friday, June 2, 2017

Insert benchmark, in-memory & Intel NUC for MySQL 5.6, 5.7 and 8

Next up in my series on low-concurrency performance is results for the insert benchmark on Intel NUC servers with an in-memory workload. The database is cached so there are no reads from but many writes to storage. This is part of my work on bug 86215. One day I will stop running tests and start working on code improvements but not yet. Read this for more on the insert benchmark.

tl;dr
  • MySQL 5.7 and 8 get about 85% of the insert throughput and 72% of the query throughput compared to MySQL 5.6.
  • For the i3 NUC about half of the regression is from MySQL 5.6 to 5.7 and the other half from 5.7 to 8. For the i5 NUC most of the regression is from MySQL 5.6 to 5.7. This is true for all of the tests (load, q1000 and q100).
  • MySQL 5.7 and 8 use more CPU per insert and per query and that explains the regression. Note that I run the benchmark client on the same host as mysqld and the client uses a lot of CPU so the real CPU regression is larger than I report here.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I repeated the test: first with the default charset/collation and then with latin1/latin1_swedish_ci.

My servers are described here. The charts below refer to the servers as i3 NUC and i5 NUC. Compared to the i3 NUC, the i5 NUC is newer, has a CPU that is 2X faster, 2X more RAM and storage that is more than 2X faster.

The my.cnf used for each database engine is described here. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled.

My usage of the insert benchmark is described here. For this test I used 1 client for the load test and then 1 writer + 1 query client for the tests that do inserts & queries concurrently. The load test loads 10M rows and the insert+query tests run for about 2 hours per test.

Results

The first pair of graphs shows the absolute and relative insert rates during the load test. On the i3 NUC about half of the regression is from MySQL 5.6 to 5.7 and the other half from 5.7 to 8 On the i5 NUC about 2/3 of the regression is from MySQL 5.6 to 5.7 and 1/3 from 5.7 to 8.
The next pair of graphs is the absolute and relative query rate for the q1000 test where another client does 1000 inserts/second. For the i3 NUC about half of the regression is from MySQL 5.6 to 5.7 and the other half from 5.7 to 8. For the i5 NUC most of the regression is from MySQL 5.6 to 5.6. This is similar to the results for the load test above.

The next pair of graphs is the absolute and relative query rate for the q100 test where another client does 100 inserts/second. For the i3 NUC about half of the regression is from MySQL 5.6 to 5.7 and the other half from 5.7 to 8. For the i5 NUC most of the regression is from MySQL 5.6 to 5.6. This is similar to the results for the load and q1000 tests above.


CPU regression

I run vmstat and iostat when the tests run to determine how much hardware is used per insert and per query. Here I show that almost all of the throughput differences are due to a CPU regression from MySQL 5.6 to 5.7 and 8. For each of the tests (load, q1000, q100) the graphs show the inverse of the relative CPU overhead per insert and per query. In the charts you can see that value is similar to the relative insert and query rates above.
Metrics

All of the performance data is here. This includes metrics from the benchmark client and from vmstat and iostat.