Wednesday, January 17, 2018

Meltdown vs storage

tl;dr - sysbench fileio throughput for ext4 drops by more than 20% from Linux 4.8 to 4.13

I shared results from sysbench with a cached database to show a small impact from the Meltdown patch in Ubuntu 16.04. Then I repeated the test for an IO-bound configuration with a 200mb InnoDB buffer pool and a database that is ~1.5gb.

The results for read-only tests looked similar to what I saw previously so I won't share them. The results for write-heavy tests were odd: QPS for the kernel without the patch (4.8.0-36) was much better than for the kernel with the patch (4.13.0-26).

The next step was to use sysbench fileio to determine whether storage performance was OK. It was similar for 4.8 and 4.13 on the read-only and write-only tests, but throughput with 4.8 was better than with 4.13 on a mixed test that does reads and writes.

Configuration


I used a NUC7i5bnh server with a Samsung 960 EVO SSD that uses NVMe. The OS is Ubuntu 16.04 with the HWE kernels -- either 4.13.0-26 that has the Meltdown fix or 4.8.0-36 that does not. For the 4.13 kernel I repeat the test with PTI enabled and disabled. The test uses sysbench with one 2gb file, O_DIRECT and 4 client threads. The server has 2 cores and 4 HW threads. The filesystem is ext4.

I used these command lines for sysbench:
sysbench fileio --file-num=1 --file-test-mode=rndrw --file-extra-flags=direct \
    --max-requests=0 --num-threads=4 --max-time=60 prepare
sysbench fileio --file-num=1 --file-test-mode=rndrw --file-extra-flags=direct \
    --max-requests=0 --num-threads=4 --max-time=60 run

And I see this:
cat /sys/block/nvme0n1/queue/write_cache
write back
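
A quick way to confirm which configuration a given boot is in -- this is a sketch, not my exact notes, and the dmesg message text varies by kernel version:

grep -cw pcid /proc/cpuinfo       # count of HW threads that report the pcid flag
dmesg | grep -i 'page table'      # on 4.13 this reports whether page table isolation is on
cat /proc/cmdline                 # shows pti=off when the fix is disabled at boot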

Results

The next step was to understand the impact of the filesystem mount options. I used ext4 for these tests and don't have much experience with it. The table below has the throughput in MB/s from the sysbench fileio test that does reads and writes. I noticed a few things:
  1. Throughput is much worse with the nobarrier mount option. I don't know whether this is expected.
  2. There is a small difference in performance from enabling the Meltdown fix, about 3% for most of the mount option combinations.
  3. There is a big difference in performance between the 4.8 and 4.13 kernels, whether or not PTI is enabled for the 4.13 kernel. I get about 25% more throughput with the 4.8 kernel.

4.13    4.13    4.8    mount options
pti=on  pti=off no-pti
100     104     137     nobarrier,data=ordered,discard,noauto,dioread_nolock
 93     119     128     nobarrier,data=ordered,discard,noauto
226     235     275     data=ordered,discard,noauto
233     239     299     data=ordered,discard,noauto,dioread_nolock
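
For reference, a sketch of how one of the option sets above can be applied via /etc/fstab. The device and mount point are placeholders, not from my setup:

/dev/nvme0n1p1  /data  ext4  data=ordered,discard,noauto,dioread_nolock  0  0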

Is it the kernel?

I am curious about what happened between 4.8 and 4.13 to explain the 25% loss of IO throughput.

I have another set of Intel NUC servers that use Ubuntu 16.04 without the HWE kernels -- 4.4.0-109 with the Meltdown fix and 4.4.0-38 without the Meltdown fix. These servers still use XFS. I get ~2% more throughput with the 4.4.0-38 kernel than the 4.4.0-109 kernel (whether or not PTI is enabled).

The loss in sysbench fileio throughput does not reproduce when I repeat the test on the same server with XFS. The filesystem mount options are "noatime,nodiratime,discard,noauto" and tests were run with /sys/block/nvme0n1/queue/write_cache set to write back and write through. The table below has MB/s of IO throughput.
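
Toggling the write cache mode between runs is a one-liner per mode. A sketch, run as root, with the same device name as above:

echo "write through" > /sys/block/nvme0n1/queue/write_cache
cat /sys/block/nvme0n1/queue/write_cache
echo "write back" > /sys/block/nvme0n1/queue/write_cache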

4.13    4.13    4.8
pti=on  pti=off no-pti
225     229     232     write_cache="write back"
125     168     138     write_cache="write through"

More debugging

This is vmstat output from the sysbench test. The id (CPU idle) values are over 40 for the 4.13 kernel but closer to 10 for the 4.8 kernel, so the 4.13 kernel gets less IO done while leaving the CPU more idle. The ratio of cs per IO operation is similar for 4.13 and 4.8.

# vmstat from 4.13 with pti=off

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  4      0 15065620 299600 830564    0    0 64768 43940 7071 21629  1  6 42 51  0
 0  4      0 15065000 300168 830512    0    0 67728 45972 7312 22816  1  3 44 52  0
 2  2      0 15064380 300752 830564    0    0 69856 47516 7584 23657  1  5 43 51  0
 0  2      0 15063884 301288 830524    0    0 64688 43924 7003 21745  0  4 43 52  0

# vmstat from 4.8 (no pti)

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  4      0 14998364 384536 818532    0    0 142080 96484 15538 38791  1  6  9 84  0
 0  4      0 14997868 385132 818248    0    0 144096 97788 15828 39576  1  7 10 83  0
 1  4      0 14997248 385704 818488    0    0 151360 102796 16533 41417  2  9  9 81  0
 0  4      0 14997124 385704 818660    0    0 140240 95140 15301 38219  1  7 11 82  0
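
To sanity check the cs per IO claim I compare cs to bi+bo. Since vmstat reports bi and bo in KB/s this is really context switches per KB transferred, but it works as a rough proxy. A sketch that computes it from live vmstat samples:

vmstat 1 10 | awk 'NR > 2 && ($9 + $10) > 0 { printf "%.3f\n", $12 / ($9 + $10) }'

From the samples above the value is ~0.20 for 4.13 and ~0.16 for 4.8, which is in the same ballpark.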

Friday, January 12, 2018

Meltdown vs MySQL part 2: in-memory sysbench and a core i5 NUC

This is my second performance report for the Meltdown patch using in-memory sysbench and a small server. In this test I used a core i5 NUC with the 4.13 and 4.8 kernels. In the previous test I used a core i3 NUC with the 4.4 kernel.
  • results for 4.13 are mixed -- sometimes there is more QPS with the fix enabled, sometimes there is more with the fix disabled. The typical difference is small, about 2%.
  • QPS for 4.8, which doesn't have the Meltdown fix, is usually better than for 4.13. The largest difference is ~10% and the differences tend to be larger at 1 client than at 2 or 8.

Configuration

My usage of sysbench is described here. The servers are described here. For this test I used the core i5 NUC (NUC7i5bnh) with Ubuntu 16.04. I have 3 such servers and ran tests with the fix enabled (kernel 4.13.0-26), the fix disabled via pti=off (kernel 4.13.0-26) and the old kernel (4.8.0-36) that doesn't have the fix. From cat /proc/cpuinfo I see pcid. This server uses the HWE kernels to make wireless work. I repeated tests after learning that 4.13 doesn't support the nobarrier mount option for XFS. My workaround was to switch to ext4 and the results here are from ext4.

The servers have 2 cores and 4 HW threads. I normally use them for low-concurrency benchmarks with 1 or 2 concurrent database clients. For this test I used 1, 2 and 8 concurrent clients to determine whether more concurrency and more mutex contention would cause more of a performance loss.

The sysbench test was configured to use 1 table with 4M rows and InnoDB. The InnoDB buffer pool was large enough to cache the table. The sysbench client runs on the same host as mysqld.
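
My sysbench usage (described above) has its own test scripts, but a rough equivalent with stock sysbench 1.0 for one of the update tests looks like this. The connection options and run time are illustrative, not the ones I used:

sysbench oltp_update_index --db-driver=mysql --mysql-user=root --mysql-db=test \
    --tables=1 --table-size=4000000 --threads=1 prepare
sysbench oltp_update_index --db-driver=mysql --mysql-user=root --mysql-db=test \
    --tables=1 --table-size=4000000 --threads=1 --time=180 run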

I just noticed that all servers had the doublewrite buffer and binlog disabled. This was leftover from debugging the XFS nobarrier change.

Results

My usage of sysbench is described here, which explains the tests that I list below. Each test has QPS for 1, 2 and 8 concurrent clients. Results are provided for
  • pti enabled - kernel 4.13.0-26 with the Meltdown fix enabled
  • pti disabled - kernel 4.13.0-26 with the Meltdown fix disabled via pti=off
  • old kernel, no pti - kernel 4.8.0-36 which doesn't have the Meltdown fix
After each of the QPS sections, there are two lines for QPS ratios. The first line compares the QPS for the kernel with the Meltdown fix enabled vs disabled. The second line compares the QPS for the kernel with the Meltdown fix vs the old kernel. A value less than one means that MySQL gets less QPS with the Meltdown fix.

update-inlist
1       2       8       concurrency
5603    7546    8212    pti enabled
5618    7483    8076    pti disabled
5847    7613    8149    old kernel, no pti
-----   -----   -----
0.997   1.008   1.016   qps ratio: pti on/off
0.958   0.991   1.007   qps ratio: pti on / old kernel

update-one
1       2       8       concurrency
11764   18880   16699   pti enabled
12074   19475   17132   pti disabled
12931   19573   16559   old kernel, no pti
-----   -----   -----
0.974   0.969   0.974   qps ratio: pti on/off
0.909   0.964   1.008   qps ratio: pti on / old kernel

update-index
1       2       8       concurrency
7202    12688   16738   pti enabled
7197    12581   17466   pti disabled
7443    12926   17720   old kernel, no pti
-----   -----   -----
1.000   1.000   0.958   qps ratio: pti on/off
0.967   0.981   0.944   qps ratio: pti on / old kernel

update-nonindex
1       2       8       concurrency
11103   18062   22964   pti enabled
11414   18208   23076   pti disabled
12395   18529   22168   old kernel, no pti
-----   -----   -----
0.972   0.991   0.995   qps ratio: pti on/off
0.895   0.974   1.035   qps ratio: pti on / old kernel

delete
1       2       8       concurrency
19197   30830   43605   pti enabled
19720   31437   44935   pti disabled
21584   32109   43660   old kernel, no pti
-----   -----   -----
0.973   0.980   0.970   qps ratio: pti on/off
0.889   0.960   0.998   qps ratio: pti on / old kernel

read-write range=100
1       2       8       concurrency
11956   20047   29336   pti enabled
12475   20021   29726   pti disabled
13098   19627   30030   old kernel, no pti
-----   -----   -----
0.958   1.001   0.986   qps ratio: pti on/off
0.912   1.021   0.976   qps ratio: pti on / old kernel

read-write range=10000
1       2       8       concurrency
488     815     1080    pti enabled
480     768     1073    pti disabled
504     848     1083    old kernel, no pti
-----   -----   -----
1.016   1.061   1.006   qps ratio: pti on/off
0.968   0.961   0.997   qps ratio: pti on / old kernel

read-only range=100
1       2       8       concurrency
12089   21529   33487   pti enabled
12170   21595   33604   pti disabled
11948   22479   33876   old kernel, no pti
-----   -----   -----
0.993   0.996   0.996   qps ratio: pti on/off
1.011   0.957   0.988   qps ratio: pti on / old kernel

read-only.pre range=10000
1       2       8       concurrency
392     709     876     pti enabled
397     707     872     pti disabled
403     726     877     old kernel, no pti
-----   -----   -----
0.987   1.002   1.004   qps ratio: pti on/off
0.972   0.976   0.998   qps ratio: pti on / old kernel

read-only range=10000
1       2       8       concurrency
394     701     874     pti enabled
389     698     871     pti disabled
402     725     877     old kernel, no pti
-----   -----   -----
1.012   1.004   1.003   qps ratio: pti on/off
0.980   0.966   0.996   qps ratio: pti on / old kernel

point-query.pre
1       2       8       concurrency
18490   31914   56337   pti enabled
19107   32201   58331   pti disabled
18095   32978   55590   old kernel, no pti
-----   -----   -----
0.967   0.991   0.965   qps ratio: pti on/off
1.021   0.967   1.013   qps ratio: pti on / old kernel

point-query
1       2       8       concurrency
18212   31855   56116   pti enabled
18913   32123   58320   pti disabled
17907   32941   55430   old kernel, no pti
-----   -----   -----
0.962   0.991   0.962   qps ratio: pti on/off
1.017   0.967   1.012   qps ratio: pti on / old kernel

random-points.pre
1       2       8       concurrency
3043    5940    8131    pti enabled
2944    5681    7984    pti disabled
3030    6015    8098    old kernel, no pti
-----   -----   -----
1.033   1.045   1.018   qps ratio: pti on/off
1.004   0.987   1.004   qps ratio: pti on / old kernel

random-points
1       2       8       concurrency
3053    5930    8128    pti enabled
2949    5756    7981    pti disabled
3058    6011    8116    old kernel, no pti
-----   -----   -----
1.035   1.030   1.018   qps ratio: pti on/off
0.998   0.986   1.001   qps ratio: pti on / old kernel

hot-points
1       2       8       concurrency
3931    7522    9500    pti enabled
3894    7535    9214    pti disabled
3914    7692    9448    old kernel, no pti
-----   -----   -----
1.009   0.998   1.031   qps ratio: pti on/off
1.004   0.977   1.005   qps ratio: pti on / old kernel

insert
1       2       8       concurrency
12469   21418   25158   pti enabled
12561   21327   25094   pti disabled
13045   21768   21258   old kernel, no pti
-----   -----   -----
0.992   1.004   1.002   qps ratio: pti on/off
0.955   0.983   1.183   qps ratio: pti on / old kernel

XFS, nobarrier and the 4.13 Linux kernel

tl;dr

My day:
  • nobarrier isn't supported as a mount option for XFS in kernel 4.13.0-26 with Ubuntu 16.04. I assume this isn't limited to Ubuntu. Read this for more detail on the change.
  • write throughput is much worse on my SSD without nobarrier
  • there is no error on the command line when mounting a device that uses the nobarrier option
  • there is an error message in dmesg output for this

There might be two workarounds:
  • switch from XFS to ext4
  • echo "write through" > /sys/block/$device/queue/write_cache
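
A quick way to see whether nobarrier was honored, since the mount command itself doesn't complain: check the kernel log right after mounting. The exact message text depends on the kernel, and the device and mount point below are placeholders:

mount -o noatime,nodiratime,discard,nobarrier /dev/nvme0n1p1 /data
dmesg | tail -n 20 | grep -i xfs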

The Story

I have a NUC cluster at home for performance tests with 3 NUC5i3ryh and 3 NUC7i5bnh. I recently replaced the SSD devices in all of them because previous testing wore them out. I use Ubuntu 16.04 LTS and recently upgraded the kernel on some of them to get the fix for Meltdown.

The NUC7i5bnh server has a Samsung 960 EVO SSD that uses NVMe. I use the HWE kernel to make wireless work. The old kernel without the Meltdown fix is 4.8.0-36 and the kernel with the Meltdown fix is 4.13.0-26. Note that with the old kernel I used XFS with the nobarrier option. With the new kernel I assumed I was still getting nobarrier, but I was not. I have since switched from XFS to ext4.

The NUC5i3ryh server has a Samsung 850 EVO SSD that uses SATA. The old kernel without the Meltdown fix is 4.4.0-38 and the kernel with the Meltdown fix is 4.4.0-109. I continue to use XFS on these.

Sysbench results for the NUC5i3ryh show not much regression from the Meltdown fix. Results for the NUC7i5bnh show a lot of regression for the write-heavy tests and not much for the read-heavy tests.
  • I started to debug the odd 7i5bnh results and noticed that write IO throughput was much lower for servers with the Meltdown fix using 4.13.0-26. 
  • Then I used sysbench fileio to run IO tests without MySQL and noticed that read IO was fine, but write IO throughput was much worse with the 4.13.0-26 kernel.
  • Then I consulted my local experts, Domas Mituzas and Jens Axboe.
  • Then I noticed the error message in dmesg output

Meltdown vs MySQL part 1: in-memory sysbench and a core i3 NUC

This is my first performance report for the Meltdown patch using in-memory sysbench and a small server.
  • the worst case overhead was ~5.5%
  • a typical overhead was ~2%
  • QPS was similar between the kernel with the Meltdown fix disabled and the old kernel
  • the overhead with too much concurrency (8 clients) wasn't worse than the overhead without it (1 or 2 clients)

Configuration

My usage of sysbench is described here. The servers are described here. For this test I used the core i3 NUC (NUC5i3ryh) with Ubuntu 16.04. I have 3 such servers and ran tests with the fix enabled (kernel 4.4.0-109), the fix disabled via pti=off (kernel 4.4.0-109) and the old kernel (4.4.0-38) that doesn't have the fix. From cat /proc/cpuinfo I see pcid.

The servers have 2 cores and 4 HW threads. I normally use them for low-concurrency benchmarks with 1 or 2 concurrent database clients. For this test I used 1, 2 and 8 concurrent clients to determine whether more concurrency and more mutex contention would cause more of a performance loss.

The sysbench test was configured to use 1 table with 4M rows and InnoDB. The InnoDB buffer pool was large enough to cache the table. The sysbench client runs on the same host as mysqld.

Results

My usage of sysbench is described here, which explains the tests that I list below. Each test has QPS for 1, 2 and 8 concurrent clients. Results are provided for
  • pti enabled - kernel 4.4.0-109 with the Meltdown fix enabled
  • pti disabled - kernel 4.4.0-109 with the Meltdown fix disabled via pti=off
  • old kernel, no pti - kernel 4.4.0-38 which doesn't have the Meltdown fix
After each of the QPS sections, there are two lines for QPS ratios. The first line compares the QPS for the kernel with the Meltdown fix enabled vs disabled. The second line compares the QPS for the kernel with the Meltdown fix vs the old kernel. A value less than one means that MySQL gets less QPS with the Meltdown fix.

update-inlist
1       2       8       concurrency
2039    2238    2388    pti enabled
2049    2449    2369    pti disabled
2059    2199    2397    old kernel, no pti
-----   -----   -----
0.995   0.913   1.008   qps ratio: pti on/off
0.990   1.017   0.996   qps ratio: pti on / old kernel

update-one
1       2       8       concurrency
8086    11407   9498    pti enabled
8234    11683   9748    pti disabled
8215    11708   9755    old kernel, no pti
-----   -----   -----
0.982   0.976   0.974   qps ratio: pti on/off
0.984   0.974   0.973   qps ratio: pti on / old kernel

update-index
1       2       8       concurrency
2944    4528    7330    pti enabled
3022    4664    7504    pti disabled
3020    4784    7555    old kernel, no pti
-----   -----   -----
0.974   0.970   0.976   qps ratio: pti on/off
0.974   0.946   0.970   qps ratio: pti on / old kernel

update-nonindex
1       2       8       concurrency
6310    8688    12600   pti enabled
6103    8482    11900   pti disabled
6374    8723    12142   old kernel, no pti
-----   -----   -----
1.033   1.024   1.058   qps ratio: pti on/off
0.989   0.995   1.037   qps ratio: pti on / old kernel

delete
1       2       8       concurrency
12348   17087   23670   pti enabled
12568   17342   24448   pti disabled
12665   17749   24499   old kernel, no pti
-----   -----   -----
0.982   0.985   0.968   qps ratio: pti on/off
0.974   0.962   0.966   qps ratio: pti on / old kernel

read-write range=100
1       2       8       concurrency
 9999   14973   21618   pti enabled
10177   15239   22088   pti disabled
10209   15249   22153   old kernel, no pti
-----   -----   -----
0.982   0.982   0.978   qps ratio: pti on/off
0.979   0.981   0.975   qps ratio: pti on / old kernel

read-write range=10000
1       2       8       concurrency
430     762     865     pti enabled
438     777     881     pti disabled
439     777     882     old kernel, no pti
-----   -----   -----
0.981   0.980   0.981   qps ratio: pti on/off
0.979   0.980   0.980   qps ratio: pti on / old kernel

read-only range=100
1       2       8       concurrency
10472   19016   26631   pti enabled
10588   20124   27587   pti disabled
11290   20153   27796   old kernel, no pti
-----   -----   -----
0.989   0.944   0.965   qps ratio: pti on/off
0.927   0.943   0.958   qps ratio: pti on / old kernel

read-only.pre range=10000
1       2       8       concurrency
346     622     704     pti enabled
359     640     714     pti disabled
356     631     715     old kernel, no pti
-----   -----   -----
0.963   0.971   0.985   qps ratio: pti on/off
0.971   0.985   0.984   qps ratio: pti on / old kernel

read-only range=10000
1       2       8       concurrency
347     621     703     pti enabled
354     633     716     pti disabled
354     638     716     old kernel, no pti
-----   -----   -----
0.980   0.981   0.988   qps ratio: pti on/off
0.980   0.973   0.981   qps ratio: pti on / old kernel

point-query.pre
1       2       8       concurrency
16104   29540   46863   pti enabled
16716   30052   49404   pti disabled
16605   30392   49872   old kernel, no pti
-----   -----   -----
0.963   0.982   0.948   qps ratio: pti on/off
0.969   0.971   0.939   qps ratio: pti on / old kernel

point-query
1       2       8       concurrency
16240   29359   47141   pti enabled
16640   29785   49015   pti disabled
16369   30226   49530   old kernel, no pti
-----   -----   -----
0.975   0.985   0.961   qps ratio: pti on/off
0.992   0.971   0.951   qps ratio: pti on / old kernel

random-points.pre
1       2       8       concurrency
2756    5202    6211    pti enabled
2764    5216    6245    pti disabled
2679    5130    6188    old kernel, no pti
-----   -----   -----
0.997   0.997   0.994   qps ratio: pti on/off
1.028   1.014   1.003   qps ratio: pti on / old kernel

random-points
1       2       8       concurrency
2763    5177    6191    pti enabled
2768    5188    6238    pti disabled
2701    5076    6182    old kernel, no pti
-----   -----   -----
0.998   0.997   0.992   qps ratio: pti on/off
1.022   1.019   1.001   qps ratio: pti on / old kernel

hot-points
1       2       8       concurrency
3414    6533    7285    pti enabled
3466    6623    7287    pti disabled
3288    6312    6998    old kernel, no pti
-----   -----   -----
0.984   0.986   0.999   qps ratio: pti on/off
1.038   1.035   1.041   qps ratio: pti on / old kernel

insert
1       2       8       concurrency
7612    10051   11943   pti enabled
7713    10150   12322   pti disabled
7834    10243   12514   old kernel, no pti
-----   -----   -----
0.986   0.990   0.969   qps ratio: pti on/off
0.971   0.981   0.954   qps ratio: pti on / old kernel

Monday, December 18, 2017

MyRocks, InnoDB and TokuDB: a summary

This has links to all performance reports from my recent series. I wanted to publish all of this by year end because I am taking a break from running and documenting MyRocks and MongoRocks performance.

Small server

Results from Intel NUC servers I have at home.

Insert benchmark:

Sysbench:

Linkbench:


Sysbench: IO-bound and a fast server, part 2

This post has results for sysbench with an IO-bound workload and a fast server. It uses 1 table with 800M rows. The previous post had results for 8 tables with 100M rows/table. The goal is to understand the impact from more contention. Before that I published a report comparing sysbench with 8 tables vs 1 table for an in-memory workload.

All of the data is on github. With the exception of the scan section, the graphs below have the absolute QPS for 48 concurrent clients using 8 tables and 1 table.

tl;dr

  • MyRocks scans are up to 2X faster with readahead
  • update-one - all engines do worse. This is expected.
  • update-index - InnoDB 5.6 & 5.7 do worse. This doesn't repeat on the in-memory benchmark.
  • read-write - InnoDB 5.6 does better on the 1 table test. This is odd.
  • hot-points - all engines do worse. This is expected. MyRocks can do 3X better by switching from LRU to Clock once that feature is GA.
  • insert - InnoDB gets ~50% of the QPS with 1 table vs 8 tables.

scan

This scan test uses one connection to scan the 800M row table. For the previous post there were 8 connections each scanning a different table with 100M rows/table. Fortunately the results here are similar to the results from the previous test.
  • InnoDB 5.6 is faster than MyRocks without readahead.
  • MyRocks with readahead is faster than InnoDB 5.6 but slower than InnoDB 5.7/8.0


update-one

All engines lose QPS with 1 table. The result is expected given that all updates go to one row in this test. The result here is similar to the in-memory benchmark.

update-index

MyRocks and InnoDB 8.0 do great. InnoDB 5.6/5.7 lose QPS with 1 table but I did not debug that.

read-write range=100

Most engines do the same for 1 vs 8 tables. InnoDB 5.6 does much better with 1 table but I did not debug that.

hot-points

All engines lose QPS with 1 table and this is similar to the in-memory benchmark. That is expected because the test fetches the same 100 rows per query and this stays in-memory. The QPS for MyRocks can be ~3X better by switching from using LRU to Clock for the block cache but that feature might not be GA today.
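
I believe the switch is exposed by MyRocks as the rocksdb_use_clock_cache option, so enabling it should be a one-line my.cnf change like the sketch below. The option name is my assumption and this was not enabled for these tests:

rocksdb_use_clock_cache=ON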

insert

InnoDB gets ~2X more QPS with 8 tables than with 1. That result is similar to the in-memory benchmark. I didn't debug the source of contention.

Sysbench: IO-bound and a fast server

In this post I share results for IO-bound sysbench on a fast server using MyRocks, InnoDB and TokuDB.

tl;dr
  • MyRocks is more space efficient than InnoDB. InnoDB uses ~2.2X more space than compressed MyRocks and ~1.1X more space than uncompressed MyRocks.
  • MyRocks is more write efficient than InnoDB. InnoDB writes ~7.9X more to storage per update than MyRocks on the update-index test.
  • For full index scans InnoDB 5.6 is ~2X faster than MyRocks. But with readahead enabled, uncompressed MyRocks is ~2X faster than InnoDB 5.6 and comparable to InnoDB 5.7/8.0.
  • MyRocks is >= InnoDB 5.7/8.0 for 3 of the 4 update-only tests. update-one is the only test on which it isn't similar or better and that test has a cached working set.
  • MyRocks is similar to InnoDB 5.6 on the insert only test.
  • MyRocks matches InnoDB 5.7/8.0 for read-write with range-size=100 (the default). It does worse with range-size=10000.
  • MyRocks is similar to InnoDB 5.6 for read-only with range-size=100 (the default). It does worse with range-size=10000. InnoDB 5.7/8.0 do better than InnoDB 5.6.
  • Results for point-query are mixed. MyRocks does worse than InnoDB 5.6 while InnoDB 5.7/8.0 do better and worse than InnoDB 5.6.
  • Results for random-points are also mixed and similar to point-query.

Configuration

My usage of sysbench is described here. The test server has 48 HW threads, fast SSD and 50gb of RAM. The database block cache (buffer pool) was 10gb for MyRocks and TokuDB and 35gb for InnoDB. MyRocks and TokuDB used buffered IO while InnoDB used O_DIRECT. Sysbench was run with 8 tables and 100M rows/table. Tests were repeated for 1, 2, 4, 8, 16, 24, 32, 40, 48 and 64 concurrent clients. At each concurrency level the read-only tests run for 180 seconds, the write-heavy tests for 300 seconds and the insert test for 180 seconds.
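
My sysbench usage (described above) has its own test scripts. A rough stock sysbench 1.0 equivalent for one write-heavy step at 48 clients, with table count, table size, duration and range size matching the description above (connection options are placeholders):

sysbench oltp_read_write --db-driver=mysql --mysql-user=root --mysql-db=test \
    --tables=8 --table-size=100000000 --range-size=100 \
    --threads=48 --time=300 run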

Tests were run for MyRocks, InnoDB from upstream MySQL, InnoDB from FB MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here.
  • MyRocks was compiled on October 16 with git hash 1d0132. Tests were repeated with and without compression. The configuration without compression is called MySQL.none in the rest of this post. The configuration with compression is called MySQL.zstd and used zstandard for the max level, no compression for L0/L1/L2 and lz4 for the other levels. A sketch of one way to express that policy in my.cnf is shown below.
  • Upstream 5.6.35, 5.7.17, 8.0.1, 8.0.2 and 8.0.3 were used with InnoDB. SSL was disabled and 8.x used the same charset/collation as previous releases.
  • InnoDB from FB MySQL 5.6.35 was compiled on June 16 with git hash 52e058.
  • TokuDB was from Percona Server 5.7.17. Tests were done without compression and then with zlib compression.
The performance schema was enabled for upstream InnoDB and TokuDB. It was disabled at compile time for MyRocks and InnoDB from FB MySQL because FB MySQL 5.6 has user & table statistics for monitoring.
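
This is a sketch of one way to express the MySQL.zstd compression policy via the RocksDB options string in my.cnf. It is not the my.cnf used for these tests (those are linked above) and it assumes the default 7 LSM levels:

rocksdb_default_cf_options=compression_per_level=kNoCompression:kNoCompression:kNoCompression:kLZ4Compression:kLZ4Compression:kLZ4Compression:kLZ4Compression;bottommost_compression=kZSTD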

Results

All of the data for the tests is on github. Graphs for each test are below. The graphs show the QPS for a test relative to the QPS for InnoDB 5.6.35 and a value > 1 means the engine gets more QPS than InnoDB 5.6.35. The graphs have data for tests with 1, 8 and 48 concurrent clients and I refer to these as low, mid and high concurrency. The tests are explained here and the results are in the order in which the tests are run except where noted below. The graphs exclude results for InnoDB from FB MySQL to improve readability.

space and write efficiency

MyRocks is more space and write efficient than InnoDB.
  • InnoDB uses 2.27X more space than compressed MyRocks and 1.12X more space than uncompressed MyRocks.
  • InnoDB writes ~7.9X more to storage per update than MyRocks on the update index test.


scan

This has data on the time to do a full scan of the PK index before and after the write-heavy tests:
  • InnoDB 5.6 is ~2X faster than MyRocks without readahead. I don't think the InnoDB PK suffers from fragmentation with sysbench. Had that been a problem then the gap would have been smaller.
  • MyRocks with readahead is almost 2X faster than InnoDB 5.6.
  • InnoDB 5.7/8.0 is faster than InnoDB 5.6
  • For MyRocks and InnoDB 5.6 the scans before and after write-heavy tests have similar performance, but for InnoDB 5.7/8.0 the scan after the write-heavy tests was faster. The scan after was faster because the storage read rate was much better, as shown in the second graph. But the CPU overhead/row was larger for the scan after write-heavy tests. This is a mystery.



update-inlist

The workload updates 100 rows per statement via an in-list and doesn't need index maintenance. Interesting results:
  • MyRocks is similar to InnoDB at low concurrency
  • MyRocks is the best at mid and high concurrency. It benefits from read-free secondary index maintenance.
  • InnoDB 5.7 & 8.0 are better than InnoDB 5.6 at mid and high concurrency.


update-one

While the configuration is IO-bound, the test updates only one row so the working set is cached. Interesting results:
  • MyRocks suffers at high concurrency. I did not debug this.
  • InnoDB 5.7/8.0 are worse than InnoDB 5.6 at low concurrency and better at high. New non-InnoDB code overhead hurts at low but InnoDB code improvements help at high.

update-index

The workload here needs secondary index maintenance. Interesting results:
  • MyRocks is better here than on other write-heavy tests relative to InnoDB because non-unique secondary index maintenance is read-free.
  • InnoDB 5.7/8.0 do much better than InnoDB 5.6 at mid and high concurrency
  • Relative to InnoDB 5.6, the other engines do better at mid than at high concurrency. I did not debug this.

update-nonindex

The workload here doesn't require secondary index maintenance. Interesting results:
  • MyRocks is similar to InnoDB 5.7/8.0. All are better than InnoDB 5.6.

read-write range=100

Interesting results:
  • MyRocks is similar to InnoDB 5.7/8.0. Both are slightly better than InnoDB 5.6 at mid concurrency and much better at high. I didn't debug the difference at high concurrency. Possible problems for InnoDB 5.6 include: write-back stalls, mutex contention.

read-write range=10000

Interesting results:
  • MyRocks is better than InnoDB 5.6
  • InnoDB 5.7/8.0 are better than MyRocks. Long range scans are more efficient for InnoDB starting in 5.7 and this test scans 10,000 rows per statement versus 100 rows in the previous section.

read-only range=100

This test scans 100 rows/query. Interesting results:
  • MyRocks is similar or better than InnoDB 5.6 except at high concurrency. For the read-write tests above MyRocks benefits from faster writes, but this test is read-only.
  • InnoDB 5.7/8.0 are similar or better than InnoDB 5.6. Range scan improvements offset the cost of new code.

read-only.pre range=10000

This test scans 10,000 rows/query and is run before the write heavy tests. Interesting results:
  • MyRocks is similar to or slightly worse than InnoDB 5.6
  • InnoDB 5.7/8.0 are better than InnoDB 5.6

read-only range=10000

This test scans 10,000 rows/query and is run after the write heavy tests. Interesting results:
  • MyRocks is similar to or slightly worse than InnoDB 5.6
  • InnoDB 5.7/8.0 are better than InnoDB 5.6 at mid and high concurrency
  • The differences with InnoDB 5.6 here are smaller than in the previous test above that is run before the write-heavy tests.

point-query.pre

This test is run before the write heavy tests. Interesting results:
  • MyRocks is slightly worse than InnoDB 5.6
  • InnoDB 5.7/8.0 are slightly better than InnoDB 5.6 

point-query

This test is run after the write heavy tests. Interesting results:

  • MyRocks and InnoDB 5.7/8.0 are slightly worse than InnoDB 5.6


random-points.pre

This test is run before the write heavy tests. Interesting results:

  • MyRocks and InnoDB 5.7/8.0 do better than InnoDB 5.6 at low concurrency but their advantage decreases at mid and high concurrency.


random-points

This test is run after the write heavy tests. Interesting results:

  • MyRocks and InnoDB 5.7 are slightly worse than InnoDB 5.6
  • InnoDB 8.0 is slightly better than InnoDB 5.6



hot-points

The working set for this test is cached and the results are similar to the in-memory benchmark.


insert

Interesting results:
  • MyRocks is worse than InnoDB 5.6 at low/mid concurrency and slightly better at high.
  • InnoDB 5.7/8.0 are similar or worse than InnoDB 5.6 at low/mid concurrency and better at high concurrency.