IBM SVC / Storwize DRP – Still a no-go

IBM released “Data Reduction Pools” for their SAN Volume Controller, which includes Storwize, FlashSystem etc products back in early 2018. The internal flag for it is “deduplication”, but it can be used for thin, compressed, or deduplicated volumes. You can put thick volumes in them too, but that is more of a “block off this space” thing.

DRP is a “log structured device”, meaning it is random read, but always sequential write. When you enable DRP, it inserts into the high cache (front-end cache). Because of this, it adds 3-4ms of latency right off the bat. Some of the higher end controllers can reduce that to 2-3ms, but it never goes away. Also, this affects ALL storage pools (mdisk groups), not just the ones built with DRP enabled. IBM product engineering says this is not a defect. It’s just they way it is, because you have to twiddle the data on the CPU.

In late 2018, I found one of their code defects, which has been resolved; however, that defect highlighted a troubling design choice. DRP is a whole-cluster thing, not a whole I/O group thing, nor a whole pool thing. That means, if you trigger a defect that kills DRP for one pool, it will kill DRP for every pool. If you’re in a hyperswap or stretched cluster, and something happens (future defect maybe) to take one site’s DRP offline while the nodes are still live, then that could take out BOTH sites.

This year, I found out that their sizing partner, “Intellimagic”, does not understand DRP. When a “Disk Magic” report comes out, it can recommend DRP. It cannot tell how much cache affects things, so it just recommends maximum cache. It cannot tell how much DRP affects things, so it cannot recommend against it for latency sensitive workloads. The IBM technical sales team is trying to work around this by simply recommending the next model up when DRP is to be used.

However, that is not enough. DRP garbage collection is VERY heavy handed. There is no way to throttle it. It’s a log structured array, so when you re-write a block on a LUN, the old block gets invalidated, then the old chunk gets read, then new data is packed in, then it’s written to the end of the array. This is sequential for each major section of the pool, and it has additional latency added by the code. We found that a normal system of 25% writes would be suppressed by 80% in DRP. Latency during normal operations would fluctuate from 6 to 18, and that’s to the client. Any major I/O operations on the client (big rewrites, or big block freeing), or any vdisk (LUN) deletions would cause latency to fluctuate from 16-85ms (completely unusable).

Once DRP is enabled, it affects the whole system. You cannot turn it off for a pool. You have to delete that pool. You cannot migrate a vdisk out of nor into a DRP, you have to mirror to a new pool, then delete or split the old copy. Funny thing is, that’s just what a migrate does at all but the lowest levels. It’s been 2 years, that really should have been folded into the standard commands by now.

Once you have everything split, and all of the hosts are running on the new, non-DRP enabled pool, that is not enough. All of the performance problems will still be there. It is only once you delete the old pool that performance returns to normal. Since vdisk deletions take several hours each, and running more than 4 at a time causes severe, deadly latency, it’s best to just verify nothing is in use and force-delete the whole DRP-enabled pool. Risky, but otherwise, you could spend weeks deleting the old LUNs by hand due to how slow they clear.

The sales materials, technical whitebooks, and the gritty redbooks all recommend DRP as the best thing ever. Staff and books shy away from highlighting any of these limitations. Other products handle deduplication without this sort of negative impact, but here, even with no deduplication in use, just the OPTION of using dedupe, it causes the system to be unusable.

Maybe there will be a “fix” or a “rewrite”, but I’ve been burned hard, with severe customer satisfaction, over two years. This negatively impacts trust, and is not the only IBM storage product causing me trust issues.

I expect that this is the fallout from Ginny Rometty’s command, “Don’t try to protect the past”. IBM’s vision is stated as “Cloud, Data, and Watson”. It will be interesting to see exactly how that translates into a future structure. IBM has been divesting a lot of pieces lately. Rometty ran services, and that’s what she knows as Chair and CEO. Anything she sees as a commodity, or low margin will be spun off. (IBM seems to need at least a 6:1 revenue to cost in order to operate. There are a lot of layers.) I’m starting to wonder if Storage and POWER will be sold off to Lenovo. I see hints, but no roadmaps, nothing concrete.


Owncloud filled /var/lib/mysql!

I installed owncloud, and set it to indexing a pile of files I wanted easier access to.

Well, /var filled, and the DB stopped. :o

I was on Debian Jessie (stable), and needed some updates to continue.

### Expand /var since I'm not ready to move /var/lib/mysql to its on filesystem
lvextend -L 16G /dev/rootvg/hd9
resize2fs /var


### Stop services using mysql
/etc/init.d/apache2 stop


### Dump all databases
mysqldump --all-databases --opt --routines --complete-insert -uroot -p | gzip -9 > /storage/test/mysqldump.2016-03-03.gz
-- Warning: Skipping the data of table mysql.event. Specify the --events option explicitly.


### Drop all databases except mysql and information_schema
tar -czvf /storage/test/mysql_var_minus_innodb.tgz [dm-z]*
mysql -u root -p
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| owncloud           |
| performance_schema |
| phpmyadmin         |
| roundcube          |
| test               |
+--------------------+
7 rows in set (0.00 sec)

mysql> drop database owncloud;
mysql> drop database performance_schema;
mysql> drop database phpmyadmin;
mysql> drop database roundcube;
mysql> drop database test;
mysql> SET GLOBAL innodb_fast_shutdown = 0;
mysql> exit

### Or for the brave
mysql -e "SELECT DISTINCT CONCAT ('DROP DATABASE ',TABLE_SCHEMA,' ;') FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA <> 'mysql' AND TABLE_SCHEMA <> 'information_schema';" | tail -n+2 | mysql -u root -p
mysql -e "SELECT table_name, table_schema, engine FROM information_schema.tables WHERE engine = 'InnoDB';"


### Stop mysql
/etc/init.d/mysql stop

### Remove the InnoDB files
rm /var/lib/mysql/ib*


### changed from jessie to stretch to get MySQL 5.6
### Not quite ready for MariaDB 1x
vi /etc/apt/sources.list
# Standard repo
deb http://ftp.us.debian.org/debian stretch main contrib non-free
deb-src http://ftp.us.debian.org/debian stretch main contrib non-free

### Volatile
deb http://ftp.debian.org/debian/ stretch-updates main contrib non-free
deb-src http://ftp.debian.org/debian/ stretch-updates main contrib non-free

### Debian Backports
deb http://http.debian.net/debian stretch-backports main

### security updates
deb http://security.debian.org/ stretch/updates main contrib non-free
deb-src http://security.debian.org/ stretch/updates main contrib non-free


####################################
apt-get update
apt-get install mysql-server-5.6
apt-get install mysql-server-5.6  ## going from jessie to stretch, so it was a little tweaky


### Increased log and memory size for mysql from defaults (log 25% of buffer pool)
### Changed to barracuda (supports compressed tables)
### Changed to one file per table for various reasons.
vi /etc/mysql/my.conf
[mysqld]
# * InnoDB
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
innodb_file_per_table = ON
innodb_file_format = barracuda
innodb_flush_method=O_DIRECT
innodb_log_file_size=256M
innodb_buffer_pool_size=1G


#####################################
### it recreates the IB files on start
/etc/init.d/mysql start


### Make sure barracuda is set for real
mysql -u root -p
mysql> set global innodb_file_format = 'Barracuda';
mysql> exit


### Import the dump
gunzip < /storage/test/mysqldump.2016-03-03.gz | mysql -u root -p


###########################################################################
###########################################################################
### Repair a problem with MySQL installer / conversion / upgrade
### See http://bugs.mysql.com/bug.php?id=67179
/* 
  temporary fix for problem with windows installer for MySQL 5.6.10 on Windows 7 machines.
  I did the procedure on a clean installed MySql, and it worked for me, at least it stopped
  lines of innodb errors in the log and the use of transient innodb tables. So, do it at
  your own risk..
  
  1. drop these tables from "use mysql":
     innodb_index_stats
     innodb_table_stats
	 slave_master_info
     slave_relay_log_info
     slave_worker_info
	 
  2. delete all .frm & .ibd of the tables above.
  
  3. run this file to recreate the tables above (source five-tables.sql).
  
  4. restart mysqld.
  
  Cheers, 
  CNL
*/

CREATE TABLE `innodb_index_stats` (
  `database_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `index_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `stat_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `stat_value` bigint(20) unsigned NOT NULL,
  `sample_size` bigint(20) unsigned DEFAULT NULL,
  `stat_description` varchar(1024) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`database_name`,`table_name`,`index_name`,`stat_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0;

CREATE TABLE `innodb_table_stats` (
  `database_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `n_rows` bigint(20) unsigned NOT NULL,
  `clustered_index_size` bigint(20) unsigned NOT NULL,
  `sum_of_other_index_sizes` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`database_name`,`table_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0;

CREATE TABLE `slave_master_info` (
  `Number_of_lines` int(10) unsigned NOT NULL COMMENT 'Number of lines in the file.',
  `Master_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL COMMENT 'The name of the master binary log currently being read from the master.',
  `Master_log_pos` bigint(20) unsigned NOT NULL COMMENT 'The master log position of the last read event.',
  `Host` char(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'The host name of the master.',
  `User_name` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The user name used to connect to the master.',
  `User_password` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The password used to connect to the master.',
  `Port` int(10) unsigned NOT NULL COMMENT 'The network port used to connect to the master.',
  `Connect_retry` int(10) unsigned NOT NULL COMMENT 'The period (in seconds) that the slave will wait before trying to reconnect to the master.',
  `Enabled_ssl` tinyint(1) NOT NULL COMMENT 'Indicates whether the server supports SSL connections.',
  `Ssl_ca` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The file used for the Certificate Authority (CA) certificate.',
  `Ssl_capath` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The path to the Certificate Authority (CA) certificates.',
  `Ssl_cert` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The name of the SSL certificate file.',
  `Ssl_cipher` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The name of the cipher in use for the SSL connection.',
  `Ssl_key` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The name of the SSL key file.',
  `Ssl_verify_server_cert` tinyint(1) NOT NULL COMMENT 'Whether to verify the server certificate.',
  `Heartbeat` float NOT NULL,
  `Bind` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'Displays which interface is employed when connecting to the MySQL server',
  `Ignored_server_ids` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The number of server IDs to be ignored, followed by the actual server IDs',
  `Uuid` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The master server uuid.',
  `Retry_count` bigint(20) unsigned NOT NULL COMMENT 'Number of reconnect attempts, to the master, before giving up.',
  `Ssl_crl` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The file used for the Certificate Revocation List (CRL)',
  `Ssl_crlpath` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The path used for Certificate Revocation List (CRL) files',
  `Enabled_auto_position` tinyint(1) NOT NULL COMMENT 'Indicates whether GTIDs will be used to retrieve events from the master.',
  PRIMARY KEY (`Host`,`Port`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 STATS_PERSISTENT=0 COMMENT='Master Information';

CREATE TABLE `slave_relay_log_info` (
  `Number_of_lines` int(10) unsigned NOT NULL COMMENT 'Number of lines in the file or rows in the table. Used to version table definitions.',
  `Relay_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL COMMENT 'The name of the current relay log file.',
  `Relay_log_pos` bigint(20) unsigned NOT NULL COMMENT 'The relay log position of the last executed event.',
  `Master_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL COMMENT 'The name of the master binary log file from which the events in the relay log file were read.',
  `Master_log_pos` bigint(20) unsigned NOT NULL COMMENT 'The master log position of the last executed event.',
  `Sql_delay` int(11) NOT NULL COMMENT 'The number of seconds that the slave must lag behind the master.',
  `Number_of_workers` int(10) unsigned NOT NULL,
  `Id` int(10) unsigned NOT NULL COMMENT 'Internal Id that uniquely identifies this record.',
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 STATS_PERSISTENT=0 COMMENT='Relay Log Information';

CREATE TABLE `slave_worker_info` (
  `Id` int(10) unsigned NOT NULL,
  `Relay_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `Relay_log_pos` bigint(20) unsigned NOT NULL,
  `Master_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `Master_log_pos` bigint(20) unsigned NOT NULL,
  `Checkpoint_relay_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `Checkpoint_relay_log_pos` bigint(20) unsigned NOT NULL,
  `Checkpoint_master_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `Checkpoint_master_log_pos` bigint(20) unsigned NOT NULL,
  `Checkpoint_seqno` int(10) unsigned NOT NULL,
  `Checkpoint_group_size` int(10) unsigned NOT NULL,
  `Checkpoint_group_bitmap` blob NOT NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 STATS_PERSISTENT=0 COMMENT='Worker Information';
###########################################################################
###########################################################################
###########################################################################


### Regenerate performance_schema
mysql_upgrade --force -u root -p


### Make sure tables are okay
mysqlcheck -p


### Grow mysql temporary space to prevent:
#### ERROR 1034 (HY000): Incorrect key file for table 'oc_filecache'; try to repair it
lvextend -L 16G /dev/rootvg/hd1
resize2fs /dev/rootvg/hd1


### Set to compressed tables
# gzipped, the dump is 319MB, and deployed, the one table is 6GB, for read mostly data.
mysql -u root -p
mysql> alter table owncloud.oc_filecache ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
mysql> exit


### Clean up free space
mysql -u root -p
mysql> OPTIMIZE TABLE owncloud.oc_filecache;
mysql> exit


#####################################
### fix roundcube since it was unhappy with some of the updates
apt-get install roundcube;


### Cleanup some old stuff amplified by partial updates
apt-get autoremove


### Reboot since we had a new dbus installed, and apache2 is still down
shutdown -fr now

Compressed Dovecot Maildir on Debian

I just saved a few gigs with this. Figured I need to document this or I’ll never remember. :)

Add this into /etc/dovecot/conf.d/10*
# Enable zlib plugin globally for reading/writing:
mail_plugins = $mail_plugins zlib

# Enable these only if you want compression while saving:
plugin {
zlib_save_level = 6 # 1..9; default is 6
zlib_save = gz # or bz2, xz or lz4
}

Add this into /etc/dovecot/conf.d/20*
protocol imap {
mail_plugins = zlib
}
protocol pop3 {
mail_plugins = zlib
}

Remove extra spaces and leftover courier garbage
rename ‘s/\ /_/g’ /home/jdavis/Maildir/.[a-zA-Z]*
rename ‘s/\__/_/g’ /home/jdavis/Maildir/.[a-zA-Z]*
rename ‘s/\_\./\./g’ /home/jdavis/Maildir/.[a-zA-Z]*
rm -r /home/jdavis/Maildir/courier*
rm -r /home/jdavis/Maildir/.[a-zA-Z]*/courier*

Create the script to compress all maildir files
#!/bin/sh
compress_maildir () {
cd $1
DIRS=`find -maxdepth 2 -type d -name cur`
for dir in $DIRS; do
echo $dir
cd $dir
FILES=`find -type f -name “*,S=*” -not -regex “.*:2,.*Z.*”`
#compress all files
for FILE in $FILES; do
NEWFILE=../tmp/${FILE}
#echo bzip $FILE $NEWFILE
if ! bzip2 -9 $FILE -c > $NEWFILE; then
echo compressing failed
exit -1;
fi
#reset mtime
if ! touch -r $FILE $NEWFILE; then
echo setting time failed
exit -1
fi
done
echo Locking $dir/..
if PID=`/usr/lib/dovecot/maildirlock .. 120`; then
#locking successfull, moving compressed files
for FILE in $FILES; do
NEWFILE=../tmp/${FILE}
if [ -s $FILE ] && [ -s $NEWFILE ]; then
echo mv $FILE $NEWFILE
mv $FILE /tmp
mv $NEWFILE ${FILE}Z
else
echo mv failed
exit -1
fi
done
kill $PID
else
echo lock failed
exit -1
fi
cd – >/dev/null
done
}

Actually RUN the script to compress all maildir files
./compress_maildir /home/jdavis/Maildir/

References
* http://wiki.dovecot.org/Plugins/Zlib
* http://wiki2.dovecot.org/Plugins/Zlib
* http://abma.de/blog/2010/449
* https://bbs.archlinux.org/viewtopic.php?id=36305