Saturday, October 21, 2017

Wearing out an SSD

I use Intel NUC servers at home to test open-source databases for performance and efficiency. Each server has an SSD for the database, either a Samsung 960 EVO or a Samsung 850 EVO. It is time to replace the 960 EVO after about 5 months of heavy usage. I test MySQL (MyRocks, InnoDB, MyISAM) and MongoDB (MongoRocks, WiredTiger and mmapv1). If I limited myself to MyRocks and MongoRocks then the storage devices would last much longer thanks to the better write efficiency of an LSM versus a B-Tree.

I have 3 servers with the 960 EVO and I will replace the SSD in all of them at the same time. I assume that device performance changes as it ages, but I have never tried to quantify that. For the 850 EVO I will buy extra spares and will upgrade from the 120gb device to a 250gb device because they cost the same and the 120gb device is hard to find. I just hope the devices I use right now will last long enough to finish my current round of testing. One day I will switch to EC2 and GCE and wear out their devices, but I like the predictability I get from my home servers.

I use Ubuntu 16.04 and its version of smartctl doesn't yet support NVMe devices, so I used the nvme utility. Percona has a useful blog post on this. The percentage_used value is 250%, which means the estimated device endurance has been greatly exceeded. The critical_warning value is 0x4, which per the NVMe spec means NVM subsystem reliability has been degraded due to significant media-related errors or an internal error that degrades NVM subsystem reliability. The data_units_written value counts 512-byte units and is reported in thousands, so the value 1,400,550,163 means 652TB has been written to the device. The device is 250GB, so that is about 2700 full device writes. If I wave my hands, expect 2000 full device writes from 3D NAND and ignore overprovisioning (OP), then it seems reasonable that the device is done. I assume that OP is 10% based on available_spare_threshold. The warranty on the 250gb 960 EVO is 3 years or 100 TBW and I wrote 652TB, far beyond the rated endurance, so I am happy about that.

This is from the 960 EVO.

$ sudo nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning                    : 0x4
temperature                         : 32 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 250%
data_units_read                     : 159,094,604
data_units_written                  : 1,400,550,163
host_read_commands                  : 4,698,541,096
host_write_commands                 : 19,986,018,997
controller_busy_time                : 32,775
power_cycles                        : 30
power_on_hours                      : 3,039
unsafe_shutdowns                    : 7
media_errors                        : 0
num_err_log_entries                 : 0
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1                : 32 C
Temperature Sensor 2                : 45 C
Temperature Sensor 3                : 0 C
Temperature Sensor 4                : 0 C
Temperature Sensor 5                : 0 C
Temperature Sensor 6                : 0 C
Temperature Sensor 7                : 0 C
Temperature Sensor 8                : 0 C
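
The arithmetic, as a small Python sketch. It assumes data_units_written counts 512-byte units reported in thousands, as described above, and it reports TB as 2^40 bytes:

# data_units_written counts 512-byte units and is reported in thousands
data_units_written = 1400550163
bytes_written = data_units_written * 1000 * 512
print(bytes_written / 2**40)           # ~652 TB written (TB = 2**40 bytes)
# full device writes for a 250GB device: ~2900 with decimal GB, ~2600 as 652/0.25,
# so the "about 2700" above is in that range
print(bytes_written / (250 * 10**9))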

This is from the 850 EVO on the server with the largest value for Total_LBAs_Written. The device uses 512-byte sectors, so a Total_LBAs_Written of 739,353,756,925 means 344TB has been written to the 120gb device. That is about 2900 full device writes assuming no OP. Once again, I should be happy that the device lasted this long. The warranty on the 120gb 850 EVO is 5 years or 75TBW and I wrote a lot more than 75TB. The Wear_Leveling_Count value is 3335, which is the average number of P/E cycles, and that is close to my estimate of 2900 full device writes. I assume I will get about 2000 P/E cycles from 3D NAND and I exceeded that.

$ sudo smartctl --all /dev/sdb
...

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4430
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       11
177 Wear_Leveling_Count     0x0013   001   001   000    Pre-fail  Always       -       3335
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   047   038   000    Old_age   Always       -       53
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       2
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       739353756925

SMART Error Log Version: 1
No Errors Logged
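
The same arithmetic for the 850 EVO, assuming 512-byte sectors:

# Total_LBAs_Written counts 512-byte sectors
total_lbas_written = 739353756925
bytes_written = total_lbas_written * 512
print(bytes_written / 2**40)            # ~344 TB written (TB = 2**40 bytes)
# ~2900 full device writes for the 120gb device (344 / 0.120), in the same
# ballpark as Wear_Leveling_Count = 3335 average P/E cycles
print(bytes_written / 2**40 / 0.120)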

Tuesday, September 19, 2017

Linkbench: IO-bound, large server

This has results for IO-bound Linkbench on a large server. I previously shared results for in-memory & low-concurrency Linkbench.

tl;dr
  • Compression doesn't hurt MyRocks performance
  • MyRocks matches or beats InnoDB performance with much better efficiency

Configuration

I used my Linkbench repo and helper scripts to run Linkbench with maxid1=1B, loaders=4 and requestors=16, so there are 4+1 concurrent connections doing the load (4 for the link/count tables, 1 for the node table) and 16 connections running transactions after the load finishes. My linkbench repo has a recent commit that changes the Linkbench workload: results with that change are labeled new config and results without it are labeled old config. I share both. The test pattern is 1) load and 2) transactions. The transaction phase was run as 12 1-hour loops and I share results from the last hour. The test server has 48 HW threads, fast SSD and 50gb of RAM available to the OS page cache and database.

Tests were run for MyRocks, InnoDB from upstream MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here.
  • MyRocks was compiled on August 15 with git hash 0d76ae. The block cache was set to 10gb. Tests were done without compression and then with compression: none for L0/L1/L2, LZ4 from L3 through the next-to-last level, and Zstandard for the max level.
  • InnoDB was from upstream 5.6.35, 5.7.17 and 8.0.2.  The buffer pool was set to 35gb. The performance schema was enabled. Compression was not used.
  • TokuDB was from Percona Server 5.7.17. The db cache was set to 10gb. Tests were done without compression and then zlib.

Load Results

All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config.
  • MyRocks has the best insert rate and compression doesn't hurt the rate
  • There is a regression for InnoDB from 5.7 to 8.0.2
  • Write efficiency (wKB/i) is best for TokuDB and similar between MyRocks and InnoDB
  • CPU efficiency (Mcpu/i) is similar for MyRocks and InnoDB
  • Space efficiency (size) is best for MyRocks and TokuDB. Uncompressed MyRocks uses much less space than uncompressed InnoDB

ips     wKB/i   Mcpu/i  size    rss     rMB/s   wMB/s   cpu     engine
142812  2.03     79      948     7.5    0.3     290.3   11.3    myrocks.none
144232  2.55     87      439     5.0    1.9     367.7   12.6    myrocks.zstd
132297  1.04     75     15xx    42.7    0       136.6    9.9    inno5635
138218  1.04     81     15xx    39.4    0.1     144.0   11.2    inno5717
122495  1.05     96     15xx    39.5    0.1     128.8   11.8    inno802
 36737  1.68    246     12xx    11.1    0.1      61.6    9.0    toku5717.none
 37032  1.18    257      442    11.6    0.1      43.8    9.5    toku5717.zlib

legend:
* ips - inserts/second
* wKB/i - iostat KB written per insert
* Mcpu/i - normalized CPU time per insert
* wMB/s, rMB/s - iostat write MB/s & read MB/s
* size - database size in GB at test end
* cpu - average value of vmstat us + sy columns
* rss - mysqld RSS in GB
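
For anyone who wants to reproduce the derived metrics, the sketch below assumes wKB/i is the iostat write rate divided by the insert rate and Mcpu/i is the vmstat us+sy value scaled by 1M and divided by the insert rate. The same formulas with tps in place of ips give the per-transaction metrics in the next section. The results are close to the table above.

def wkb_per_op(w_mb_per_sec, ops_per_sec):
    # iostat write MB/s converted to KB/s, divided by the operation rate
    return w_mb_per_sec * 1024.0 / ops_per_sec

def mcpu_per_op(vmstat_us_sy, ops_per_sec):
    # vmstat us+sy scaled by one million, divided by the operation rate
    return vmstat_us_sy * 1000000.0 / ops_per_sec

# the myrocks.none row from the load table: ips=142812, wMB/s=290.3, cpu=11.3
print(wkb_per_op(290.3, 142812))    # ~2.1 vs 2.03 in the table
print(mcpu_per_op(11.3, 142812))    # ~79, matches Mcpu/i in the table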

Transaction Results

These are results from the 12th 1-hour loop of the transaction phase. All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config. I will explain them. 
  • MyRocks and InnoDB have the best transaction rates
  • Write efficiency (wKB/t) is much better for MyRocks than for InnoDB
  • CPU efficiency (Mcpu/t) is similar for MyRocks and InnoDB
  • Space efficiency (size) is best for MyRocks and TokuDB (see here)
  • MyRocks has the best response times and compression doesn't hurt them

tps     r/t     rKB/t   wKB/t   Mcpu/t  ul      gl      rMB/s   wMB/s   cpu     engine
34701   1.16     22.41   1.27    659      1      0.9     777.7   44.2   22.9    myrocks.none
34622   1.08     13.12   1.27    758      1      0.9     454.4   44.1   26.2    myrocks.zstd
25775   1.16     18.50  12.05    672      6      4       476.8  310.5   17.3    inno5635
34539   1.15     18.39  11.91    609      3      2       635.0  411.4   21.0    inno5717
33578   1.14     18.23  11.97    641      3      2       612.1  401.9   21.5    inno802
 8982   3.07    188.97   5.64   2685     13     12      1697.3   50.7   24.1    toku5717.none
12448   1.20     17.42   2.84   2644      6      3       216.9   35.4   32.9    toku5717.zlib

legend:
* tps - transactions/second
* r/t - iostat reads per transaction
* rKB/t, wKB/t - iostat KB read & written per transaction
* Mcpu/t - normalized CPU time per transaction
* ul, gl - 99th percentile response time in millisecs for UpdateList and
           GetLinkedList transactions
* rMB/s, wMB/s - iostat read MB/s & write MB/s
* cpu - average CPU utilization from vmstat us & sy

Charts

Charts for load and transaction throughput using the ips and tps columns from the tables above.

Wednesday, September 13, 2017

Making it easier to read Linux perf output

This is a short note to myself for the next time I need to remember what I did to enhance the readability of Linux perf output.
  • trim trailing spaces -> perf report --stdio | awk 'sub(/ *$/, "")'
  • don't forget the children when using perf record -g -> perf report --no-children


Friday, September 8, 2017

Linkbench - in-memory, low-concurrency

After a few weeks of sysbench it is now time for more complex workloads and the first one is Linkbench with a cached database and low-concurrency. I prefer to start with cached & low-concurrency configurations before trying IO-bound & high-concurrency.

tl;dr:
  • InnoDB from MySQL 5.6 had the best throughput
  • CPU efficiency is similar for MyRocks and InnoDB
  • There is a CPU regression from MySQL 5.6 to 5.7 to 8.x
  • Write efficiency was similar for all engines on the load test but much better for MyRocks and TokuDB on the transaction test.

Configuration

I used my Linkbench repo and helper scripts to run Linkbench with maxid1=10M, loaders=1 and requestors=1, so there are 2 concurrent connections doing the load and 1 connection running transactions after the load finishes. My linkbench repo has a recent commit that changes the Linkbench workload: results with that change are labeled new config and results without it are labeled old config. I share both. The test pattern is 1) load and 2) transactions. The transaction phase was run as 12 1-hour loops and I share results from the last hour. The test server has 48 HW threads, fast SSD and 256gb of RAM.

Tests were run for MyRocks, InnoDB from upstream MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here but the database cache was made large enough to cache the ~10gb database.
  • MyRocks was compiled on August 15 with git hash 0d76ae. Compression was not used.
  • InnoDB was from upstream 5.6.35, 5.7.17 and 8.0.2.  The performance schema was enabled.
  • TokuDB was from Percona Server 5.7.17. Compression was not used.

Load Results

All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config. The results for the old config are similar.
  • InnoDB 5.6 has the best insert rate but there is a regression from InnoDB in 5.6 to 5.7 to 8.0.2
  • Write efficiency (wkb/i) is similar for all engines
  • CPU efficiency (Mcpu/i) is similar for MyRocks and InnoDB

ips     wkb/i   Mcpu/i  size    wMB/s   cpu     engine
 54283  1.60     83     14       86.7    4.5    myrocks
 64402  1.02     72     16       65.6    4.6    inno5635
 56414  1.03     77     16       58.2    4.3    inno5717
 42954  1.02     97     16       45.1    4.2    inno802
 21611  1.42    179     14       30.7    3.9    toku5717

legend:
* ips - inserts/second
* wkb/i - iostat KB written per insert
* Mcpu/i - normalized CPU time per insert
* wMB/s - iostat write MB/s, average
* size - database size in GB at test end
* cpu - average value of vmstat us + sy columns


Transaction Results

These are results from the 12th 1-hour loop of the transaction phase. All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config. I will explain them. The results for the old config are similar.
  • InnoDB 5.6 has the best transaction rate but there is a regression from 5.6 to 5.7 to 8.0.2
  • Write efficiency (wkb/t) is much better for TokuDB and MyRocks than for InnoDB
  • CPU efficiency (Mcpu/t) is similar for MyRocks and InnoDB
  • Response times are similar between MyRocks and InnoDB

tps     wkb/t   Mcpu/t  size  un    gn    ul    gl    wMB/s  engine
5489    0.78     642    15    0.3   0.1   0.5   0.5    4.3   myrocks
7239    5.15     524    26    0.3   0.1   0.4   0.2   37.3   inno5635
6463    5.17     580    26    0.3   0.1   0.5   0.3   33.4   inno5717
5855    5.25     623    25    0.3   0.1   0.6   0.3   30.7   inno802
3333    0.08    1043    18    0.6   0.2   1.0   0.8   10.0   toku5717

legend:
* tps - transactions/second
* wkb/t - iostat KB written per transaction
* Mcpu/t - normalized CPU time per transaction
* size - database size in GB at test end
* un, gn, ul, gl - 99th percentile response time in millisecs for UpdateNode,
                   GetNode, UpdateList and GetLinkedList transactions
* wMB/s - iostat write MB/s, average


Charts

Charts for load and transaction throughput using the ips and tps columns from the tables above.


Thursday, September 7, 2017

IO-bound table scan performance for MyRocks, InnoDB and TokuDB

I used sysbench to compare IO-bound table scan performance for MyRocks, InnoDB and TokuDB. Tests were run on a large server with fast SSD, Intel NUCs with SSD and an Intel NUC with disk. I call this IO-bound because for all tests the table was larger than RAM.

tl;dr
  • MyRocks can be more than 2X slower than InnoDB.
  • InnoDB in 5.7 does better than in 5.6
  • TokuDB without compression is comparable to InnoDB without compression and does much better than InnoDB when prefetching is enabled.
  • Compression usually has a small impact on scan performance for MyRocks with zstd and a much larger impact for TokuDB with zlib. I wonder how much of this is a measure of zstd vs zlib.
  • Scans were usually slower for all engines after fragmentation but the impact was larger for MyRocks and TokuDB than for InnoDB.

Configuration

I used my sysbench helper scripts with my sysbench branch. For tests with X tables there was 1 connection per table doing a full scan and when X > 1 the scans were concurrent. The scan time was measured twice -- first immediately after the load and index step and then after many updates have been applied. The second measurement was done to show the impact of fragmentation on scan performance.
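
As a sketch of what the scan step does, the fragment below is not my helper scripts -- it just shows one connection per table with the scans run concurrently. The table names, connection settings and the count(*) statement are placeholders, and it uses mysql-connector-python.

import threading
import time
import mysql.connector   # assumes mysql-connector-python is installed

def scan(table):
    # one connection per table; count(*) stands in for the full scan
    conn = mysql.connector.connect(host="127.0.0.1", user="root",
                                   password="pw", database="test")
    cur = conn.cursor()
    t0 = time.time()
    cur.execute("SELECT COUNT(*) FROM " + table)
    (rows,) = cur.fetchone()
    print(table, rows, "rows in", round(time.time() - t0, 1), "seconds")
    cur.close()
    conn.close()

tables = ["sbtest1", "sbtest2"]    # X tables -> X concurrent scans
threads = [threading.Thread(target=scan, args=(t,)) for t in tables]
for t in threads:
    t.start()
for t in threads:
    t.join()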

I repeated tests on different hardware:
  • 48core.ssd - server has 48 HW threads, fast SSD and 50gb of RAM. Tests were done with 8 tables and 100M rows/table and then 1 table with 800M rows.
  • i3.ssd - a core i3 Intel NUC with Samsung 850 SSD, 8gb of RAM and 4 HW threads. The test used 2 tables and 80M rows/table.
  • i3.disk - a core i3 Intel NUC with 1 disk, 8gb of RAM and 4 HW threads. The test used 1 table and 160M rows/table.
I repeated tests for MyRocks, InnoDB and TokuDB:
  • I compiled MyRocks on August 15 with git hash 0d76ae. The MyRocks tests were done without compression and with zstd compression (myrocks.none, myrocks.zstd). I did one test for MyRocks with a binary that did not use special instructions to make crc32 faster (myrocks.none.slowcrc) and learned that fast crc doesn't make a difference on this test. It would be a bigger deal for an IO-bound test doing point queries.
  • I used TokuDB from Percona Server 5.7.17. The TokuDB tests were done without compression and with zlib compression. I tried tokudb_disable_prefetching ON and OFF (toku5717.none, toku5717.none.prefetch), but I have been setting this to ON for my OLTP benchmarks because enabling it ruined some OLTP results.
  • I used InnoDB from upstream 5.6.35 and 5.7.17. The performance_schema was enabled. The InnoDB tests did not use compression. 

Results

The results below list the number of seconds to scan the table(s) and the time relative to InnoDB from MySQL 5.6.35. For the relative time a value greater than 1 means the engine is slower than InnoDB. These values are reported for pre and post where pre is the measurement taken immediately after loading the table and creating the secondary index and post is the measurement taken after applying random updates to the table(s).

See tl;dr above for what I learned from these results.

Large server


These are results from 8 tables with 100M rows/table and then 1 table with 800M rows on the large server.

48core.ssd - 8t x 100m
pre     pre     post    post    engine
secs    ratio   secs    ratio
221     2.302   246     2.256   myrocks.none
201     2.093   211     1.935   myrocks.zstd
 96     1.000   109     1.000   inno5635
 75     0.781    86     0.788   inno5717
 67     0.697    94     0.862   toku5717.none
 39     0.406    69     0.633   toku5717.none.prefetch
190     1.979   224     2.055   toku5717.zlib

48core.ssd - 1t x 800m
pre     pre     post    post    engine
secs    ratio   secs    ratio
 638    1.065   1032    1.627   myrocks.none
 916    1.529   1063    1.676   myrocks.zstd
 599    1.000    634    1.000   inno5635
 434    0.724    449    0.708   inno5717
 513    0.856    735    1.159   toku5717.none
 249    0.415    502    0.791   toku5717.none.prefetch
1525    2.545   1776    2.801   toku5717.zlib

Intel NUC

These are results from the Intel NUC using SSD and then a disk.

i3.ssd - 2t x 80m
pre     pre     post    post    engine
secs    ratio   secs    ratio
181     1.448   192     1.560   myrocks.none
182     1.456   189     1.536   myrocks.none.slowcrc
219     1.752   238     1.934   myrocks.zstd
125     1.000   123     1.000   inno5635
114     0.912   107     0.869   inno5717

i3.disk - 1t x 160m
pre     pre     post    post    engine
secs    ratio   secs    ratio
330     1.304   348     1.343   myrocks.none
432     1.707   451     1.741   myrocks.zstd
253     1.000   259     1.000   inno5635
257     1.015   261     1.007   inno5717

Charts

Below are charts from the large server tests for 8 tables & 100M rows/table and then 1 table with 800M rows.

Wednesday, September 6, 2017

Timing load & index for sysbench tables

This post compares MyRocks, InnoDB and TokuDB on the time required to load and index a table for sysbench.

tl;dr
  • MyRocks, InnoDB and TokuDB have similar load performance although there is a regression for InnoDB from 5.6 to 5.7 to 8.x
  • InnoDB create index is much faster starting in 5.7

Configuration

I used my sysbench helper scripts with my sysbench branch and configured it to create 1 table with 800M rows. The binlog was enabled but sync on commit was disabled for the binlog and database log. The sysbench client shared the host with mysqld. The host has 48 HW threads, 50gb of RAM for the OS and MySQL and fast SSD. The test table is larger than RAM but it will take me a few days to get details on that. The test was repeated for MyRocks, InnoDB and TokuDB. I continue to use the IO-bound setup as described previously.
  • I compiled MyRocks on August 15 with git hash 0d76ae. The MyRocks tests were done without compression and with zstd compression. 
  • I used TokuDB from Percona Server 5.7.17. The TokuDB tests were done without compression and with zlib compression. I tried tokudb_disable_prefetching ON and OFF, but I have been setting this to ON for my OLTP benchmarks. 
  • I used InnoDB from upstream 5.6.35, 5.7.17, 8.0.1 and 8.0.2. For 8.x I used latin1/latin1_swedish_ci charset/collation. The performance_schema was enabled. The InnoDB tests did not use compression. 

A sample command line for sysbench is:
bash all.sh 1 800000000 180 300 180 innodb 1 0 /bin/mysql none /sysbench10 /dbdir

Results

The load is in PK order and there are no secondary indexes. The engines have similar load performance, although there is a steady regression for InnoDB with each new release and a big regression from 8.0.1 to 8.0.2 which I hope will be fixed as 8.x approaches GA. The ratio is the time to load for the engine divided by the time to load for InnoDB from MySQL 5.6.35.

load    load    engine
secs    ratio
 7266   1.000   inno5635
 7833   1.078   inno5717
 8286   1.140   inno801
10516   1.447   inno802
 7640   1.051   myrocks.none
 7810   1.074   myrocks.zstd
 7558   1.040   toku5717.none
 7494   1.031   toku5717.none.prefetch
 7726   1.063   toku5717.zlib 

Create index performance has more diversity. The table is larger than RAM so some of it must be read from storage, and engines with compression (myrocks.zstd, toku5717.zlib) suffer from decompression latency. The prefetch option doesn't help TokuDB in this case. InnoDB create index performance got much faster starting in 5.7. The ratio is the create index time for the engine divided by the time for InnoDB from MySQL 5.6.35.

create  create  engine
secs    ratio
 3565   1.000   inno5635
 1904   0.534   inno5717
 1961   0.550   inno801
 1966   0.551   inno802
 3321   0.931   myrocks.none
 3802   1.066   myrocks.zstd
 9817   2.753   toku5717.none
 9855   2.764   toku5717.none.prefetch
10731   3.010   toku5717.zlib

Charts

These charts have the same data as the tables above.

Write-heavy workloads with MyRocks

MyRocks is based on RocksDB and RocksDB is write-optimized, so why don't write-heavy workloads always run faster on MyRocks than on InnoDB? I will start with a summary of MyRocks versus InnoDB: MyRocks has better space & write efficiency, frequently has better write latency and sometimes has better read latency. Better space & write efficiency means you use less SSD and it will last longer.

Don't forget that better write and space efficiency with MyRocks can lead to better read latency. When a database engine does fewer writes to storage there is more IO capacity available for reads which is a big deal with disk and can be a big deal with slower SSD. When a database engine uses less space in storage then it is likely to cache more data in RAM and have better cache hit rates on reads. Many of the benchmarks that I run use uniform distribution for key generation and won't show the benefit from better cache hit rates.

With RocksDB a write usually means a call to Put, and Put is a blind-write. As shown by many benchmark results, a blind-write can be very fast with RocksDB: insert data into the memtable, optionally write the WAL to the OS page cache, optionally force the WAL to persistent storage. But SQL update and insert statements usually need much more than a blind-write, and the reads done by MyRocks can explain why some write-heavy workloads are faster with InnoDB. Things that get in the way include:
  • Pluggable storage engine APIs are slow to adopt blind-write optimizations
  • The modified row count must be returned from an update statement 
  • Enforcement of the PK and unique constraints
  • Secondary index maintenance
Pluggable storage engine APIs are slow to adopt blind-write optimizations. I don't blame MySQL and MongoDB for this because such optimizations are limited to write-optimized engines. But MyRocks and MongoRocks are here and I expect that WiredTiger/MongoDB will eventually have an LSM in production. M*Rocks engines can use the merge operator for this. I don't expect blind-write optimizations to ever be implemented for an update-in-place b-tree.

The modified row count must be returned for an update statement and that requires evaluation of the where clause. For RocksDB this requires reads -- from the memtable, maybe from the LSM tree, OS page cache and storage. Reads from storage and the OS page cache might require decompression. This is a lot more work than a blind-write. The usage of blind-write optimizations will result in statements that are update-like and insert-like because the application programmers must be aware of the semantic differences -- modified row count won't be returned, constraint violations won't be acknowledged. I think it is worth doing.

Unique constraints must be enforced for the PK and unique secondary indexes. With the exception of an auto-increment column, this requires a read from the index to confirm the value does not exist. In the best case updates & inserts are in key order and the structures to be searched are cached in RAM -- otherwise this requires reads from the OS page cache and/or storage and might require decompression.

Secondary indexes must be maintained as part of the update & insert statement processing. For an insert this means that a new index entry will be inserted. For an update that requires index maintenance the existing index entry will be deleted and a new one will be inserted. With a b-tree the leaf page(s) will be read (possibly from storage), made dirty in the buffer pool and eventually those pages will be written back to storage. With RocksDB non-unique secondary index maintenance is read-free and blind-writes are done to put the new index entry and possibly delete-mark the old index entry. See the previous paragraph if there is a unique secondary index.
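
To make the accounting above concrete, here is a toy sketch in Python. It is not engine code and the counts are simplifications: it tallies the logical reads and writes for an update that changes one non-unique secondary index key and compares that to a raw blind-write.

def btree_update():
    # update-in-place b-tree, InnoDB-like
    reads = 2    # read the row page (where clause, row count, old index key)
                 # and read the secondary index leaf that holds the old entry
    writes = 2   # the dirtied row page and index leaf are written back eventually
    return reads, writes

def lsm_update():
    # LSM with read-free non-unique secondary index maintenance, MyRocks-like
    reads = 1    # SQL still reads the row for the row count and the old index key
    writes = 3   # blind Put of the new row version, blind Put of the new index
                 # entry and a delete-mark for the old index entry
    return reads, writes

def blind_put():
    # a raw RocksDB Put or Merge: memtable insert and WAL write, no reads,
    # no modified row count, no constraint checks
    return 0, 1

for name, op in (("b-tree update", btree_update),
                 ("LSM update", lsm_update),
                 ("blind put", blind_put)):
    print(name, "(reads, writes):", op())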