ext keeps going read only

During backups, I get this, and the root filesystem goes read only.

I’ve replaced disks, rebuilt filesystems, arrays, LVM, etc.

 

dmesg shows:

[19501.355932] EXT4-fs error (device dm-7): ext4_get_verity_descriptor_location:295: inode #6032: comm dsmc: verity file doesn’t use extents
[19501.403496] Aborting journal on device dm-7-8.
[19501.414969] EXT4-fs (dm-7): Remounting filesystem read-only
[19501.414974] fs-verity (dm-7, inode 6032): Error -117 getting verity descriptor size

 

Find the file by inode:

find / -xdev -inum 6032 -print

 

This showed that there is junk in lost+found from when I had FS corruption.  I rebuilt the root filesystem, but never cleaned out lost+found.  Deleting the damaged files should solve the problem.


Ubuntu LTS UEFI NVME Mirror Boot

This is super touchy, and here is what I did to make it happy and stable.

This does not address if UEFI decides to write to one of these mirrors.  Someone else has a systemd unit to assemble with resync.

In the past, I used someone else’s bypass script, but this was cleaner, and works in 18 and 20 LTS.

 

### Filesystem / Mirror for EFI / UEFI booting:
mdadm raid 1, metadata 1.0
vfat filesystem for /boot/EFI

### Proper GRUB package
apt-get purge grub\*
apt-get install grub-efi
apt-get autoremove

### Settings that seem to not stick
dpkg-reconfigure -p low grub-efi-amd64
Update NVRAM variables to automatically boot into Debian? NO
echo "grub-pc grub2/update_nvram boolean false" | debconf-set-selections
echo "grub-pc grub-efi/install_devices multiselect /dev/md0" | debconf-set-selections

### Grub config
update-grub
grub-install --no-nvram /dev/md0

### UEFI boot list (variables)
[root@tsm2: /root]
/bin/bash# efibootmgr -?
efibootmgr: invalid option -- '?'
efibootmgr version 17
usage: efibootmgr [options]
-a | --active sets bootnum active
-A | --inactive sets bootnum inactive
-b | --bootnum XXXX modify BootXXXX (hex)
-B | --delete-bootnum delete bootnum
-c | --create create new variable bootnum and add to bootorder
-C | --create-only create new variable bootnum and do not add to bootorder
-D | --remove-dups remove duplicate values from BootOrder
-d | --disk disk (defaults to /dev/sda) containing loader
-r | --driver Operate on Driver variables, not Boot Variables.
-e | --edd [1|3|-1] force EDD 1.0 or 3.0 creation variables, or guess
-E | --device num EDD 1.0 device number (defaults to 0x80)
-g | --gpt force disk with invalid PMBR to be treated as GPT
-i | --iface name create a netboot entry for the named interface
-l | --loader name (defaults to "\EFI\ubuntu\grub.efi")
-L | --label label Boot manager display label (defaults to "Linux")
-m | --mirror-below-4G t|f mirror memory below 4GB
-M | --mirror-above-4G X percentage memory to mirror above 4GB
-n | --bootnext XXXX set BootNext to XXXX (hex)
-N | --delete-bootnext delete BootNext
-o | --bootorder XXXX,YYYY,ZZZZ,... explicitly set BootOrder (hex)
-O | --delete-bootorder delete BootOrder
-p | --part part partition containing loader (defaults to 1 on partitioned devices)
-q | --quiet be quiet
-t | --timeout seconds set boot manager timeout waiting for user input.
-T | --delete-timeout delete Timeout.
-u | --unicode | --UCS-2 handle extra args as UCS-2 (default is ASCII)
-v | --verbose print additional information
-V | --version return version and exit
-w | --write-signature write unique sig to MBR if needed
-y | --sysprep Operate on SysPrep variables, not Boot Variables.
-@ | --append-binary-args file append extra args from file (use "-" for stdin)
-h | --help show help/usage

[root@tsm2: /root]
/bin/bash# efibootmgr -v
BootCurrent: 0019
Timeout: 5 seconds
BootOrder: 0005,0006,0007,000C,0019,0018
Boot0000 Startup Menu FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)....ISPH
Boot0001 System Information FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0002 Bios Setup FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0003 3rd Party Option ROM Management FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0004 System Diagnostics FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0005* nvme0_grub HD(1,GPT,e41eb9e0-6606-411a-bb83-bed7577f29b3,0x800,0x8e800)/File(\EFI\ubuntu\grub.efi)
Boot0006* nvme1_grub HD(1,GPT,aa23256a-95c6-4148-b56c-c8861fc7966a,0x800,0x8e800)/File(\EFI\ubuntu\grub.efi)
Boot0007* nvme2_grub HD(1,GPT,1f7f7f5b-2a89-4d87-a617-6ccaf15078dd,0x800,0x8e800)/File(\EFI\ubuntu\grub.efi)
Boot0008 Boot Menu FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0009* Kingston DataTraveler 3.0 408D5CE57214E331293064F6 BBS(USB,USB1,0x900)/PciRoot(0x0)/Pci(0x1d,0x0)......ISPH
Boot000B Network Boot FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot000C* nvme3_grub HD(1,GPT,cb8bc8b4-affc-4765-97c2-72af0c615d44,0x800,0x8e800)/File(\EFI\ubuntu\grub.efi)
Boot000E* IPV6 Network - Aquantia AQtion 10Gbit Network Adapter PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(88c9b3bfa1e9,0)/IPv6([::]:<->[::]:,0,0)N.....YM....R,Y.....ISPH
Boot0010* IBA GE Slot 00C8 v1550 BBS(Network,Network1,0x0)/PciRoot(0x0)/Pci(0x19,0x0)......ISPH
Boot0011 USB: PciRoot(0x0)/Pci(0x1d,0x0)N.....YM....R,Y.....ISPH
Boot0012 HP Recovery FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0013* hp PLDS DVDRW DU8AESH PciRoot(0x0)/Pci(0x1f,0x2)/Sata(0,0,0)N.....YM....R,Y.....ISPH
Boot0014* hp PLDS DVDRW DU8AESH BBS(CDROM,CDROM1,0x400)/PciRoot(0x0)/Pci(0x1f,0x2)......ISPH
Boot0018* ubuntu HD(1,GPT,e41eb9e0-6606-411a-bb83-bed7577f29b3,0x800,0x8e800)/File(\EFI\ubuntu\shimx64.efi)....ISPH
Boot0019* ubuntu HD(1,GPT,e41eb9e0-6606-411a-bb83-bed7577f29b3,0x800,0x8e800)/File(\EFI\grub\shimx64.efi)....ISPH
Boot001A* IPV4 Network - Aquantia AQtion 10Gbit Network Adapter PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(88c9b3bfa1e9,0)/IPv4(0.0.0.00.0.0.0,0,0)N.....YM....R,Y.....ISPH

[root@tsm2: /root]
/bin/bash# efibootmgr -B -b 0005
/bin/bash# efibootmgr -B -b 0006
/bin/bash# efibootmgr -B -b 0007
/bin/bash# efibootmgr -B -b 0009
/bin/bash# efibootmgr -B -b 000c
/bin/bash# efibootmgr -B -b 0018
BootCurrent: 0019
Timeout: 5 seconds
BootOrder: 0019
Boot0000 Startup Menu
Boot0001 System Information
Boot0002 Bios Setup
Boot0003 3rd Party Option ROM Management
Boot0004 System Diagnostics
Boot0008 Boot Menu
Boot000B Network Boot
Boot000E* IPV6 Network - Aquantia AQtion 10Gbit Network Adapter
Boot0010* IBA GE Slot 00C8 v1550
Boot0011 USB:
Boot0012 HP Recovery
Boot0013* hp PLDS DVDRW DU8AESH
Boot0014* hp PLDS DVDRW DU8AESH
Boot0019* ubuntu
Boot001A* IPV4 Network - Aquantia AQtion 10Gbit Network Adapter

[root@tsm2: /root]
/bin/bash# efibootmgr -v
BootCurrent: 0019
Timeout: 5 seconds
BootOrder: 0019
Boot0000 Startup Menu FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)....ISPH
Boot0001 System Information FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0002 Bios Setup FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0003 3rd Party Option ROM Management FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0004 System Diagnostics FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0008 Boot Menu FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot000B Network Boot FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot000E* IPV6 Network - Aquantia AQtion 10Gbit Network Adapter PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(88c9b3bfa1e9,0)/IPv6([::]:<->[::]:,0,0)N.....YM....R,Y.....ISPH
Boot0010* IBA GE Slot 00C8 v1550 BBS(Network,Network1,0x0)/PciRoot(0x0)/Pci(0x19,0x0)......ISPH
Boot0011 USB: PciRoot(0x0)/Pci(0x1d,0x0)N.....YM....R,Y.....ISPH
Boot0012 HP Recovery FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0013* hp PLDS DVDRW DU8AESH PciRoot(0x0)/Pci(0x1f,0x2)/Sata(0,0,0)N.....YM....R,Y.....ISPH
Boot0014* hp PLDS DVDRW DU8AESH BBS(CDROM,CDROM1,0x400)/PciRoot(0x0)/Pci(0x1f,0x2)......ISPH
Boot0019* ubuntu HD(1,GPT,e41eb9e0-6606-411a-bb83-bed7577f29b3,0x800,0x8e800)/File(\EFI\grub\shimx64.efi)....ISPH
Boot001A* IPV4 Network - Aquantia AQtion 10Gbit Network Adapter PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(88c9b3bfa1e9,0)/IPv4(0.0.0.00.0.0.0,0,0)N.....YM....R,Y.....ISPH

[root@tsm2: /root]
/bin/bash# efibootmgr -c -d /dev/nvme0n1 -L nvme0_grub -l '\EFI\grub\shimx64.efi'
/bin/bash# efibootmgr -c -d /dev/nvme1n1 -L nvme1_grub -l '\EFI\grub\shimx64.efi'
/bin/bash# efibootmgr -c -d /dev/nvme2n1 -L nvme2_grub -l '\EFI\grub\shimx64.efi'
/bin/bash# efibootmgr -c -d /dev/nvme3n1 -L nvme3_grub -l '\EFI\grub\shimx64.efi'
/bin/bash# efibootmgr -o 0005,0006,0007,0009,00019
BootCurrent: 0019
Timeout: 5 seconds
BootOrder: 0005,0006,0007,0009,0019
Boot0000 Startup Menu
Boot0001 System Information
Boot0002 Bios Setup
Boot0003 3rd Party Option ROM Management
Boot0004 System Diagnostics
Boot0005* nvme0_grub
Boot0006* nvme1_grub
Boot0007* nvme2_grub
Boot0008 Boot Menu
Boot0009* nvme3_grub
Boot000B Network Boot
Boot000E* IPV6 Network - Aquantia AQtion 10Gbit Network Adapter
Boot0010* IBA GE Slot 00C8 v1550
Boot0011 USB:
Boot0012 HP Recovery
Boot0013* hp PLDS DVDRW DU8AESH
Boot0014* hp PLDS DVDRW DU8AESH
Boot0019* ubuntu
Boot001A* IPV4 Network - Aquantia AQtion 10Gbit Network Adapter

[root@tsm2: /root]
/bin/bash# efibootmgr -v
BootCurrent: 0019
Timeout: 5 seconds
BootOrder: 0005,0006,0007,0009,0019
Boot0000 Startup Menu FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)....ISPH
Boot0001 System Information FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0002 Bios Setup FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0003 3rd Party Option ROM Management FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0004 System Diagnostics FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0005* nvme0_grub HD(1,GPT,e41eb9e0-6606-411a-bb83-bed7577f29b3,0x800,0x8e800)/File(\EFI\grub\shimx64.efi)
Boot0006* nvme1_grub HD(1,GPT,aa23256a-95c6-4148-b56c-c8861fc7966a,0x800,0x8e800)/File(\EFI\grub\shimx64.efi)
Boot0007* nvme2_grub HD(1,GPT,1f7f7f5b-2a89-4d87-a617-6ccaf15078dd,0x800,0x8e800)/File(\EFI\grub\shimx64.efi)
Boot0008 Boot Menu FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0009* nvme3_grub HD(1,GPT,cb8bc8b4-affc-4765-97c2-72af0c615d44,0x800,0x8e800)/File(\EFI\grub\shimx64.efi)
Boot000B Network Boot FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot000E* IPV6 Network - Aquantia AQtion 10Gbit Network Adapter PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(88c9b3bfa1e9,0)/IPv6([::]:<->[::]:,0,0)N.....YM....R,Y.....ISPH
Boot0010* IBA GE Slot 00C8 v1550 BBS(Network,Network1,0x0)/PciRoot(0x0)/Pci(0x19,0x0)......ISPH
Boot0011 USB: PciRoot(0x0)/Pci(0x1d,0x0)N.....YM....R,Y.....ISPH
Boot0012 HP Recovery FvVol(cdbb7b35-6833-4ed6-9ab2-57d2acddf6f0)/FvFile(9d8243e8-8381-453d-aceb-c350ee7757ca)......ISPH
Boot0013* hp PLDS DVDRW DU8AESH PciRoot(0x0)/Pci(0x1f,0x2)/Sata(0,0,0)N.....YM....R,Y.....ISPH
Boot0014* hp PLDS DVDRW DU8AESH BBS(CDROM,CDROM1,0x400)/PciRoot(0x0)/Pci(0x1f,0x2)......ISPH
Boot0019* ubuntu HD(1,GPT,e41eb9e0-6606-411a-bb83-bed7577f29b3,0x800,0x8e800)/File(\EFI\grub\shimx64.efi)....ISPH
Boot001A* IPV4 Network - Aquantia AQtion 10Gbit Network Adapter PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(88c9b3bfa1e9,0)/IPv4(0.0.0.00.0.0.0,0,0)N.....YM....R,Y.....ISPH

 

[root@tsm2: /root]
/bin/bash# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal

[root@tsm2: /root]
/bin/bash# uname -a
Linux tsm2 5.4.0-97-generic #110-Ubuntu SMP Thu Jan 13 18:22:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux


lilo slow boot map

Happily switched my boot back to SATA mirrors, and was able to reenable LILO COMPACT mode.

This means instead of 13,000 reads per boot file, it’s more like 50. Not only is booting a few seconds faster, more importantly updating the liko boot map after installing a new kernel takes 10 seconds instead of 5 minutes.


Supermicro BMC firmware

Setting up the new replica server, I kept running into problems during the initial firmware update.  All IPMI settings and hardware data were inaccessible after BMC firmware update. System otherwise works as expected.  This condition persisted no matter if I went to DEL setup or F11 boot menu.  I could hang out in the UEFI shell, or etc.

It was not that the sensor data was “not present”.  It was that the list of sensors was missing.  Also, System MAC address, and BIOS version info was missing.  The FRU data was empty, as well as the Hardware Information.  All of the IPMI settings were blank, and could not be set.  The diagnostic data page gave “File not found”.  The iKVM was inaccessible, and the system could not be put into maintenance mode, powered off, powered on, or reset from the IPMI interface.  All of the system logs were blank and inaccessible.  Support was not super helpful, but they were responsive.  Supermicro is one of the top tier system makers.  They OEM for IBM, but that equipment is not quite as touchy.

The solution is to be very finicky about the BMC firmware update.

Get the right version for your system here:
https://www.supermicro.com/support/resources/bios_ipmi.php

My system used this code:
https://www.supermicro.com/Bios/softfiles/12085/X11SDVN_BIOS_1_3a_IPMI_1_31_03.zip
Manufacturer Name: Supermicro
Product PartNum: SYS-E300-9D
Chassis Part Number: CSE-E300
Board Product Name: X11SDV-4C-TLN2F
BIOS Vendor: American Megatrends Inc.
Processor: Intel(R) Xeon(R) D-2123IT CPU @ 2.20GHz

Update the code with extra patience
Use the AwUpdate utility to update the IPMI/BMC firmware
.\AwUpdate -f ......\WS_X11AST2500_131_03.bin -i lan -h 192.168.1.210 -u ADMIN -p ADMIN
NOTE: This can be on some other system as long as you can connect TCP between the two.
NOTE: No -r, and in the web UI, we would uncheck all of the “preserve settings” options.

Let all 5 parts (0 through 4) complete
Wait for the “New firmware is updating” to complete
Wait for the system to reboot.

Monitor the console
Wait for a longer version of IPMI Initialization
Wait for a longer than usual DXE — ACPI Initialization
Wait for the red LED to come on
Wait at least 5 more minutes (try 10)

At this point, you should see that it responds to F11 or DEL, but stays hung.
CTRL-ALT-DEL and everything should be populated and working.

The Unit IDentity LED may be stuck red.
You cannot clear the UID red state any way other than pulling the power cord.
Let it drain for 30 seconds, and plug back in.

After this, everything works, AND the UID LED setting in the IPMI web interface will switch from blink blue to off.


tsm server status

I ordered the new backup server on October 27.
Initial setup gave app crashes intermittently, so was not ready to make it live yet.
I ran BOINC on it for a day, and at one point, all tasks died at once.

Syslog showed EDAC errors starting 11 days after I got the system, calling out CPU#1Channel#2_DIMM#0

This matches CPU1, DIMM1 on the board (ie, DIMMs are ordered backwards in Linux from printed labels).

I swapped all of CPU1 DIMMS with CPU0 DIMMs to troubleshoot.

Problem went away. 99% chance this was just a slightly loose DIMM from shipping.

Aside from that, the system has been awesome. I’ve run DB2, Spectrum Protect, and BOINC on here. For BOINC, the fans stay on low at 66% and 50% on a warm day, and 66%/66% on a cool day.

TLDR – remember to re-seat your DIMMs after shipping. System is stable otherwise.

Here are logs and system queries:

Nov 7 15:00:43 tsm kernel: [929582.997825] EDAC MC1: 1 CE error on CPU#1Channel#2_DIMM#0 (channel:2 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
...
Nov 14 19:59:05 tsm kernel: [1552272.728748] EDAC MC1: 7112 CE error on CPU#1Channel#2_DIMM#0 (channel:2 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)

/bin/bash# ll -d /sys/devices/system/edac/mc/mc1/dimm*
drwxr-xr-x 3 root root 0 Nov 14 20:07 /sys/devices/system/edac/mc/mc1/dimm0/
drwxr-xr-x 3 root root 0 Nov 14 20:07 /sys/devices/system/edac/mc/mc1/dimm3/
drwxr-xr-x 3 root root 0 Nov 14 20:07 /sys/devices/system/edac/mc/mc1/dimm6/

/bin/bash# cat /sys/devices/system/edac/mc/mc1/dimm6/dimm_label
CPU#1Channel#2_DIMM#0

/bin/bash# cat /sys/devices/system/edac/mc/mc1/dimm6/dimm_location
channel 2 slot 0

/bin/bash# cat /sys/devices/system/edac/mc/mc1/dimm6/dimm_mem_type
Registered-DDR3

/bin/bash# cat /sys/devices/system/edac/mc/mc1/dimm6/size
8192

/bin/bash# cat /sys/devices/system/edac/mc/mc1/mc_name
i7 core #1

/bin/bash# cat /sys/devices/system/edac/mc/mc1/ce_count
1197602807

/bin/bash# cat /sys/devices/system/edac/mc/mc0/mc_name
i7 core #0

/bin/bash# cat /sys/devices/system/edac/mc/mc0/ce_count
0

/bin/bash# uptime
20:15:26 up 17 days, 23:28, 2 users, load average: 0.01, 0.40, 2.64

Power off and back on, and now BIOS shows:

209-Memory warning condition (WARN_DQS_TEST) detected slot CPU1 DIMM1
209-Memory warning condition (WARN_DQS_TEST) detected slot CPU1 DIMM1
209-Memory warning condition (rd dq dqs) detected slot CPU1 DIMM1
203-Memory module failed self-test and failing rank was disabled slot CPU1 DIMM1

The following configuration options were automatically updated:
Memory:40960 MB

Using ESD precautions, I moved all DIMMs from CPU1 bank to CPU0 bank.
All errors went away.

Loose DIMM. False alarm.


Protect initial install

This is happiness…

tsminst1@tsm:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

/bin/bash# for i in /dev/sd? ; do smartctl -a $i ; done | grep ‘Device Model’
Device Model: Samsung SSD 850 EVO 250GB
Device Model: WDC WD30EFRX-68EUZN0
Device Model: Samsung SSD 850 EVO 250GB
Device Model: WDC WD30EFRX-68EUZN0
Device Model: WDC WD30EFRX-68EUZN0

tsminst1@tsm:~$ dsmserv format dbdir=/tsm/db01,/tsm/db02,/tsm/db03,/tsm/db04,/tsm/db05,/tsm/db06,/tsm/db07,/tsm/db08 \
> activelogsize=8192 activelogdirectory=/tsm/log archlogdirectory=/tsm/logarch

ANR7800I DSMSERV generated at 11:32:48 on Sep 19 2017.

IBM Spectrum Protect for Linux/x86_64
Version 8, Release 1, Level 3.000

Licensed Materials – Property of IBM

(C) Copyright IBM Corporation 1990, 2017.
All rights reserved.
U.S. Government Users Restricted Rights – Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corporation.

ANR7801I Subsystem process ID is 29286.
ANR0900I Processing options file /home/tsminst1/dsmserv.opt.
ANR0010W Unable to open message catalog for language en_US.UTF-8. The default language message catalog will be used.
ANR7814I Using instance directory /home/tsminst1.
ANR3339I Default Label in key data base is TSM Server SelfSigned SHA Key.
ANR4726I The ICC support module has been loaded.
ANR0152I Database manager successfully started.
ANR2976I Offline DB backup for database TSMDB1 started.
ANR2974I Offline DB backup for database TSMDB1 completed successfully.
ANR0992I Server’s database formatting complete.
ANR0369I Stopping the database manager because of a server shutdown.


New data protection

Upgrading TSM server from Q9650 Core 2 Quad 3.0GHz, 8GB DDR2 on Win 2008R2.

New system is HP Z600, two-socket, 6-core 2.66GHz Xeon X5650 and 48GB of RAM. Wattage is the same per socket, but two sockets now. 3x the cores, 4x the performance.

SSDs for DB and Log are also moving to EVO 850 from Corsair M100. I’ll set up a container pool to replace the dedupe file class, and put that on 3x 3TB RAID5 instead of 2x RAID1.

OS will be Ubuntu 16.04.2 LTS. I’d like to just use Debian 9.1, but Debian and long-term-support seem to not be synonymous. I’d hate to run a patch update and have everything break, then fight with debian testing repo to try to get it all back to normal. Plus, I have no Ubuntu boxes, only Debian. It’ll give me a chance to see what operational differences I run into.

Old TSM is 6.4. New will be “Spectrum Protect” 8.1.3. Yes, the billions spent to rebrand to the same name as Charter Cable’s rebrand really seems like money well spent.

Anyway, Since I lost the offsite replication provider for the dedupe file pool, and it was having trouble keeping up anyway, this will let me change to server-side encryption, and object storage. We’ll see which provider wins out on price once everything is rededuped properly.

If the fan noise is not too bad, maybe this platform can be considered for a low-cost upgrade to the kids’ game machines. Though, these are heavy, with 2 big handles on the top.

Also, really, something new enough to have USB3 on the motherboard is probably better. I have some laptops picked out, but that’s re-buying every component, including ones that are presently decent. *sigh*


Docker Debian autoinstall fails

Debian (and Ubuntu and others) use apt, aptitude, apt-get, and dpkg. apt currently requires the Release keys to match in a complex way. Mondo, Docker, and many other projects have problems making a repo actually work. The telltale failure is similar to this:

W: The repository 'https://apt.dockerproject.org/repo debian-stretch Release' does not have a Release file.
N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Failed to fetch https://apt.dockerproject.org/repo/dists/debian-stretch/testing/binary-i386/Packages
E: Some index files failed to download. They have been ignored, or old ones used instead.
[root@ns1:/etc/apt/sources.list.d]

 

You can manually work around this by changing your sources.list to use HTTP instead of HTTPS, but scripts such Ubiquiti’s Universal Network Management Server installer will replace that:

curl -fsSL https://raw.githubusercontent.com/Ubiquiti-App/UNMS/master/install.sh > /tmp/unms_install.sh && sudo bash /tmp/unms_install.sh
branch=master
version=0.10.3
Downloading installation package for version 0.10.3.
Setting VERSION=0.10.3
Download and install Docker
# Executing docker install script, commit: 490beaa
+ sh -c 'apt-get update -qq >/dev/null'
+ sh -c 'apt-get install -y -qq apt-transport-https ca-certificates curl software-properties-common >/dev/null'
+ sh -c 'curl -fsSL "https://download.docker.com/linux/debian/gpg" | apt-key add -qq - >/dev/null'
Warning: apt-key output should not be parsed (stdout is not a terminal)
+ sh -c 'echo "deb [arch=amd64] https://download.docker.com/linux/debian stretch edge" > /etc/apt/sources.list.d/docker.list'
+ '[' debian = debian ']'
+ '[' stretch = wheezy ']'
+ sh -c 'apt-get update -qq >/dev/null'
W: The repository 'https://download.docker.com/linux/debian stretch Release' does not have a Release file.
E: Failed to fetch https://download.docker.com/linux/debian/dists/stretch/edge/binary-amd64/Packages
E: Some index files failed to download. They have been ignored, or old ones used instead.

 

A more stable workaround is to force apt back into the old mode of not caring if the Release certs are perfectly matched to the file server:

cat <<'EOF' >>/etc/apt/apt.conf.d/01docker
Acquire::https::apt.dockerproject.org::Verify-Peer "false";
Acquire::https::download.docker.com::Verify-Peer "false";
EOF

 

Now, the install works fine:

curl -fsSL https://raw.githubusercontent.com/Ubiquiti-App/UNMS/master/install.sh > /tmp/unms_install.sh \
  && sudo bash /tmp/unms_install.sh
branch=master
version=0.10.3
Downloading installation package for version 0.10.3.
Setting VERSION=0.10.3
Download and install Docker
# Executing docker install script, commit: 490beaa
+ sh -c 'apt-get update -qq >/dev/null'
+ sh -c 'apt-get install -y -qq apt-transport-https ca-certificates curl software-properties-common >/dev/null'
+ sh -c 'curl -fsSL "https://download.docker.com/linux/debian/gpg" | apt-key add -qq - >/dev/null'
Warning: apt-key output should not be parsed (stdout is not a terminal)
+ sh -c 'echo "deb [arch=amd64] https://download.docker.com/linux/debian stretch edge" > /etc/apt/sources.list.d/docker.list'
+ '[' debian = debian ']'
+ '[' stretch = wheezy ']'
+ sh -c 'apt-get update -qq >/dev/null'
+ sh -c 'apt-get install -y -qq docker-ce >/dev/null'
+ sh -c 'docker version'
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:09 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:48 2017
 OS/Arch:      linux/amd64
 Experimental: false
If you would like to use Docker as a non-root user, you should now consider
adding your user to the "docker" group with something like:

  sudo usermod -aG docker your-user

Remember that you will have to log out and back in for this to take effect!

WARNING: Adding a user to the "docker" group will grant the ability to run
         containers which can be used to obtain root privileges on the
         docker host.
         Refer to https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
         for more information.
Docker version: 17.09.0
./install-full.sh: line 470: ((: 17 < 1
        || (17 == 1 && 09: value too great for base (error token is "09")
Download and install Docker compose.
Docker Compose version: 1.9
Creating user unms.
Skipping 0.8.0 permission fix
Preparing templates
Creating docker-compose.yml
Pulling docker images.
Pulling redis (redis:3.2.8-alpine)...
3.2.8-alpine: Pulling from library/redis
cfc728c1c558: Pull complete
8eda5cfd7e0a: Pull complete
8acb752a319b: Pull complete
955021cea791: Pull complete
d301d906247c: Pull complete
ff438d9e11c6: Pull complete
Digest: sha256:262d8bd214e74cebb3a0573e0f3a042aa3ddade36cf39a4891dd1b05b636bc55
Status: Downloaded newer image for redis:3.2.8-alpine
Pulling postgres (postgres:9.6.1-alpine)...
9.6.1-alpine: Pulling from library/postgres
0a8490d0dfd3: Pull complete
b6475055d17e: Pull complete
ba55801edf3d: Pull complete
f132014bbab8: Pull complete
9775497ec4a5: Pull complete
678be380896e: Pull complete
31e4998cc9ec: Pull complete
Digest: sha256:fa48df82694141793fb0cd52b9a93a3618ba03e5814e11dbf0dd43797f4d4cf7
Status: Downloaded newer image for postgres:9.6.1-alpine
Pulling rabbitmq (rabbitmq:3)...
3: Pulling from library/rabbitmq
bc95e04b23c0: Pull complete
2e65f0b00e4c: Pull complete
f2bd80317989: Pull complete
7b05ca830283: Pull complete
0bb5a4bbcce5: Pull complete
cf840d8999f6: Pull complete
be339ca44883: Pull complete
ce35cd9f9b5b: Pull complete
a4fe32a0a00d: Pull complete
77408ca9e94e: Pull complete
db03407a1aba: Pull complete
Digest: sha256:9a0de56d27909c518f448314d430f8eda3ad479fc459d908ff8b281c4dfc1c00
Status: Downloaded newer image for rabbitmq:3
Pulling unms (ubnt/unms:0.10.3)...
0.10.3: Pulling from ubnt/unms
627beaf3eaaf: Pull complete
5fc32359ecb8: Pull complete
2b99ae07dd66: Pull complete
99c9d1420b38: Pull complete
b65b0ba413b8: Pull complete
86bd816c9566: Pull complete
32ebfd822bb4: Pull complete
Digest: sha256:5dc99a77ee8bb4d09f02da715ec3142283ce44d5e91b8f515b5694ffb25d6c3c
Status: Downloaded newer image for ubnt/unms:0.10.3
Checking available ports
Port 80 is already in use, please choose a different HTTP port for UNMS. [8080]:
Port 8080 is already in use, please choose a different HTTP port for UNMS. [8080]: 8888
Port 443 is already in use, please choose a different HTTPS port for UNMS. [8443]:
Port 8443 is already in use, please choose a different HTTPS port for UNMS. [8443]: 8883
Creating data volumes.
Will mount /home/unms/data
Creating docker-compose.yml
Deploying templates
Writing config file
no crontab for unms
no crontab for unms
Deleting obsolete firmwares...
Downloading new firmwares...
Downloading e50-1.9.7-hotfix.3.170831.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 74.6M  100 74.6M    0     0  5502k      0  0:00:13  0:00:13 --:--:-- 5870k
Downloading e100-1.9.7-hotfix.3.170831.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 80.8M  100 80.8M    0     0  5692k      0  0:00:14  0:00:14 --:--:-- 5859k
Downloading e200-1.9.7-hotfix.3.170831.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 80.7M  100 80.7M    0     0  5725k      0  0:00:14  0:00:14 --:--:-- 5873k
Downloading e1000-1.9.7-hotfix.3.170831.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 81.7M  100 81.7M    0     0  5705k      0  0:00:14  0:00:14 --:--:-- 5867k
Downloading e600-1.0.2.170728.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 86.8M  100 86.8M    0     0  5738k      0  0:00:15  0:00:15 --:--:-- 5871k
Downloading SFU-1.2.0.171003.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15.0M  100 15.0M    0     0  4663k      0  0:00:03  0:00:03 --:--:-- 4664k
Downloading XC-8.3.2.170901.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9046k  100 9046k    0     0  5219k      0  0:00:01  0:00:01 --:--:-- 5216k
Downloading XC-8.3.2-cs.170901.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9046k  100 9046k    0     0  5218k      0  0:00:01  0:00:01 --:--:-- 5219k
Downloading WA-8.3.2.170901.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9028k  100 9028k    0     0  5327k      0  0:00:01  0:00:01 --:--:-- 5329k
Downloading WA-8.3.2-cs.170901.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9028k  100 9028k    0     0  5006k      0  0:00:01  0:00:01 --:--:-- 5004k
Downloading TI-6.0.7.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7083k  100 7083k    0     0  4917k      0  0:00:01  0:00:01 --:--:-- 4915k
Downloading TI-6.0.7-cs.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7083k  100 7083k    0     0  5181k      0  0:00:01  0:00:01 --:--:-- 5185k
Downloading XM-6.0.7.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7389k  100 7389k    0     0  5218k      0  0:00:01  0:00:01 --:--:-- 5222k
Downloading XM.6.0.7-cs.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7389k  100 7389k    0     0  5963k      0  0:00:01  0:00:01 --:--:-- 5959k
Downloading XW.v6.0.7.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7227k  100 7227k    0     0  5224k      0  0:00:01  0:00:01 --:--:-- 5225k
Downloading XW-6.0.7-cs.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7227k  100 7227k    0     0  5075k      0  0:00:01  0:00:01 --:--:-- 5071k
Starting docker containers.
Creating network "unms_internal" with the default driver
Creating network "unms_public" with the default driver
Building fluentd
Step 1/6 : FROM fluent/fluentd:v0.12-latest
v0.12-latest: Pulling from fluent/fluentd
019300c8a437: Pull complete
d30279f73a02: Pull complete
fd39bd5a5dae: Pull complete
4dacb8d2bb26: Pull complete
963e933724db: Pull complete
8b4dd4e99009: Pull complete
59bedb222c2c: Pull complete
Digest: sha256:9b10ed70251fda1cd91c92f07a3ae74059adb1bdad6fc51cfcfe42272a9e78e8
Status: Downloaded newer image for fluent/fluentd:v0.12-latest
 ---> 4fce39752458
Step 2/6 : USER root
 ---> Running in 8f315349c16e
 ---> 84398611a0ad
Removing intermediate container 8f315349c16e
Step 3/6 : COPY entrypoint.sh /
 ---> 157af3140182
Step 4/6 : RUN apk add --no-cache --update su-exec     && apk add --no-cache dumb-init --repository http://dl-cdn.alpinelinux.org/alpine/edge/community/     && chmod +x /entrypoint.sh
 ---> Running in fbdef19d9e1a
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
OK: 27 MiB in 24 packages
fetch http://dl-cdn.alpinelinux.org/alpine/edge/community/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
OK: 27 MiB in 24 packages
 ---> e82e4e7e156f
Removing intermediate container fbdef19d9e1a
Step 5/6 : ENTRYPOINT /entrypoint.sh
 ---> Running in 3a0455e845ef
 ---> 7581bd63c44f
Removing intermediate container 3a0455e845ef
Step 6/6 : CMD fluentd -c /fluentd/etc/$FLUENTD_CONF -p /fluentd/plugins $FLUENTD_OPT
 ---> Running in 13c6baad173b
 ---> 97647e174228
Removing intermediate container 13c6baad173b
Successfully built 97647e174228
Successfully tagged unms_fluentd:latest
WARNING: Image for service fluentd was built because it did not already exist.
 To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Creating unms-fluentd
Creating unms-redis
Creating unms-rabbitmq
Creating unms-postgres
Creating unms
Removing old images
Current image: ubnt/unms:0.10.3
All UNMS images: ubnt/unms:0.10.3
Images to remove: ''
No old images found
Waiting for UNMS to start
CONTAINER ID      IMAGE                   COMMAND                  CREATED             STATUS           PORTS                                            NAMES
6e814af4ffc5      ubnt/unms:0.10.3        "/usr/bin/dumb-ini..."   8 seconds ago       Up 3 seconds     0.0.0.0:8888->8080/tcp, 0.0.0.0:8883->8443/tcp   unms
01f61e7d9ae8      postgres:9.6.1-alpine   "/docker-entrypoin..."   10 seconds ago      Up 7 seconds                                                      unms-postgres
99261993de75      rabbitmq:3              "docker-entrypoint..."   10 seconds ago      Up 6 seconds                                                      unms-rabbitmq
21bb0d5db0e1      redis:3.2.8-alpine      "docker-entrypoint..."   10 seconds ago      Up 7 seconds                                                      unms-redis
cdb0b878b633      unms_fluentd            "/entrypoint.sh /b..."   11 seconds ago      Up 1 second      5140/tcp, 127.0.0.1:24224->24224/tcp             unms-fluentd
UNMS is running


Omnitech DP server lives

10 days ago, the drive enclosure for the TSM server failed during a storm. The enclosure is an RSV-S5 from 2010. The PSU died, and seems to be a specialty part. The part costs $250. A newer version of the enclosure $180 from Sans Digital. This is a bulk data server, so a 4-pay box was fine. I picked up a Mediasonic Probox 4-bay JBOD with ESATA and USB3 ports. It’s a faster port multiplier, better functionality, and half the volume on the server shelf.

I still plan to migrate everything to Linux on Spectrum Protect 8, with container pools, and maybe use glacier for off-site storage. This is compounded by CrashPlan ditching their non-business plans, and never being able to sync anyway. I really need a better way to store off-site DR data. BOX for a critical chunk is okay. Google and Dropbox for active data is okay. But for an off-site DR pool, it would be too expensive to put into either of those. Plus, SP8 is chunk aware much better. I’d hate for a CDP product to revert a chunk, or be constantly out of sync.


Posted in News | Comments Off on Omnitech DP server lives

SATA chipset reference

The SIL3132 card (SATA-II, PCIe 1.0) ran at 122MB/sec.

The 88SE9128 card (SATA-III, PCIe 2.0) ran at 75MB/sec, or 35MB/sec with FIS disabled.

The 88SE9235 card runs at 195MB/sec.

My two test enclosures are:
* SIL3726 based enclosure (RSV-5S)
* 88SM9715 based enclosure (TR5M6G)
* Linux, MDADM, RAID6, sequential read, 256k blocks.

Ableton said I should go with a single SSD behind a JMS575 port multiplier to get best performance out of the 88SE9128.

I pointed out that a single drive is not the same as multiple (switching delays),
and that replacing all of my spinning disks with SSD is not a valid solution.


Posted in News, Reference | Comments Off on SATA chipset reference