tsm server status

I ordered the new backup server on October 27.
Initial setup gave intermittent app crashes, so it was not ready to go live yet.
I ran BOINC on it for a day, and at one point, all tasks died at once.

Syslog showed EDAC errors starting 11 days after I got the system, calling out CPU#1Channel#2_DIMM#0.

This matches CPU1, DIMM1 on the board (i.e., DIMMs are ordered backwards in Linux from the printed labels).

I swapped all of the CPU1 DIMMs with the CPU0 DIMMs to troubleshoot.

Problem went away. 99% chance this was just a slightly loose DIMM from shipping.

Aside from that, the system has been awesome. I’ve run DB2, Spectrum Protect, and BOINC on here. For BOINC, the fans stay on low at 66% and 50% on a warm day, and 66%/66% on a cool day.

TLDR – remember to re-seat your DIMMs after shipping. System is stable otherwise.

Here are logs and system queries:

Nov 7 15:00:43 tsm kernel: [929582.997825] EDAC MC1: 1 CE error on CPU#1Channel#2_DIMM#0 (channel:2 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
...
Nov 14 19:59:05 tsm kernel: [1552272.728748] EDAC MC1: 7112 CE error on CPU#1Channel#2_DIMM#0 (channel:2 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)

/bin/bash# ll -d /sys/devices/system/edac/mc/mc1/dimm*
drwxr-xr-x 3 root root 0 Nov 14 20:07 /sys/devices/system/edac/mc/mc1/dimm0/
drwxr-xr-x 3 root root 0 Nov 14 20:07 /sys/devices/system/edac/mc/mc1/dimm3/
drwxr-xr-x 3 root root 0 Nov 14 20:07 /sys/devices/system/edac/mc/mc1/dimm6/

/bin/bash# cat /sys/devices/system/edac/mc/mc1/dimm6/dimm_label
CPU#1Channel#2_DIMM#0

/bin/bash# cat /sys/devices/system/edac/mc/mc1/dimm6/dimm_location
channel 2 slot 0

/bin/bash# cat /sys/devices/system/edac/mc/mc1/dimm6/dimm_mem_type
Registered-DDR3

/bin/bash# cat /sys/devices/system/edac/mc/mc1/dimm6/size
8192

/bin/bash# cat /sys/devices/system/edac/mc/mc1/mc_name
i7 core #1

/bin/bash# cat /sys/devices/system/edac/mc/mc1/ce_count
1197602807

/bin/bash# cat /sys/devices/system/edac/mc/mc0/mc_name
i7 core #0

/bin/bash# cat /sys/devices/system/edac/mc/mc0/ce_count
0

/bin/bash# uptime
20:15:26 up 17 days, 23:28,  2 users,  load average: 0.01, 0.40, 2.64
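
The same sysfs files can be walked in one pass instead of one cat at a time. A minimal sketch, assuming the EDAC layout shown above (ce_count per controller, dimm_label and dimm_location per DIMM). The function name edac_summary and the optional base-directory argument are mine, added so it can be pointed at a copy of the tree:

```shell
# Print the corrected-error count for each memory controller, followed by
# the label and location of each DIMM it reports.
# $1 (optional): base directory; defaults to the live EDAC sysfs tree.
edac_summary() {
    base=${1:-/sys/devices/system/edac/mc}
    for mc in "$base"/mc*; do
        [ -d "$mc" ] || continue
        echo "${mc##*/}: ce_count=$(cat "$mc/ce_count")"
        for dimm in "$mc"/dimm*; do
            [ -d "$dimm" ] || continue
            echo "  ${dimm##*/}: $(cat "$dimm/dimm_label") ($(cat "$dimm/dimm_location"))"
        done
    done
}
```

Run with no argument, it reads /sys/devices/system/edac/mc. A controller whose ce_count keeps climbing between runs tells you which socket's DIMMs to look at.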

Power off and back on, and now BIOS shows:

209-Memory warning condition (WARN_DQS_TEST) detected slot CPU1 DIMM1
209-Memory warning condition (WARN_DQS_TEST) detected slot CPU1 DIMM1
209-Memory warning condition (rd dq dqs) detected slot CPU1 DIMM1
203-Memory module failed self-test and failing rank was disabled slot CPU1 DIMM1

The following configuration options were automatically updated:

Memory:40960 MB


Using ESD precautions, I moved all DIMMs from CPU1 bank to CPU0 bank.
All errors went away.

Loose DIMM. False alarm.


Protect initial install

This is happiness…

tsminst1@tsm:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

/bin/bash# for i in /dev/sd? ; do smartctl -a "$i" ; done | grep 'Device Model'
Device Model: Samsung SSD 850 EVO 250GB
Device Model: WDC WD30EFRX-68EUZN0
Device Model: Samsung SSD 850 EVO 250GB
Device Model: WDC WD30EFRX-68EUZN0
Device Model: WDC WD30EFRX-68EUZN0

tsminst1@tsm:~$ dsmserv format dbdir=/tsm/db01,/tsm/db02,/tsm/db03,/tsm/db04,/tsm/db05,/tsm/db06,/tsm/db07,/tsm/db08 \
> activelogsize=8192 activelogdirectory=/tsm/log archlogdirectory=/tsm/logarch

ANR7800I DSMSERV generated at 11:32:48 on Sep 19 2017.

IBM Spectrum Protect for Linux/x86_64
Version 8, Release 1, Level 3.000

Licensed Materials – Property of IBM

(C) Copyright IBM Corporation 1990, 2017.
All rights reserved.
U.S. Government Users Restricted Rights – Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corporation.

ANR7801I Subsystem process ID is 29286.
ANR0900I Processing options file /home/tsminst1/dsmserv.opt.
ANR0010W Unable to open message catalog for language en_US.UTF-8. The default language message catalog will be used.
ANR7814I Using instance directory /home/tsminst1.
ANR3339I Default Label in key data base is TSM Server SelfSigned SHA Key.
ANR4726I The ICC support module has been loaded.
ANR0152I Database manager successfully started.
ANR2976I Offline DB backup for database TSMDB1 started.
ANR2974I Offline DB backup for database TSMDB1 completed successfully.
ANR0992I Server’s database formatting complete.
ANR0369I Stopping the database manager because of a server shutdown.


New data protection

Upgrading TSM server from Q9650 Core 2 Quad 3.0GHz, 8GB DDR2 on Win 2008R2.

The new system is an HP Z600: two sockets, each a 6-core 2.66GHz Xeon X5650, and 48GB of RAM. Wattage is the same per socket, but there are two sockets now. 3x the cores, 4x the performance.

SSDs for DB and Log are also moving to 850 EVO from Corsair M100. I’ll set up a container pool to replace the dedupe file class, and put that on 3x 3TB RAID5 instead of 2x RAID1.

OS will be Ubuntu 16.04.2 LTS. I’d like to just use Debian 9.1, but Debian and long-term-support seem to not be synonymous. I’d hate to run a patch update and have everything break, then fight with debian testing repo to try to get it all back to normal. Plus, I have no Ubuntu boxes, only Debian. It’ll give me a chance to see what operational differences I run into.

Old TSM is 6.4. New will be “Spectrum Protect” 8.1.3. Yes, the billions spent to rebrand to the same name as Charter Cable’s rebrand really seem like money well spent.

Anyway, since I lost the offsite replication provider for the dedupe file pool, and it was having trouble keeping up anyway, this will let me change to server-side encryption and object storage. We’ll see which provider wins out on price once everything is rededuped properly.

If the fan noise is not too bad, maybe this platform can be considered for a low-cost upgrade to the kids’ game machines. Though, these are heavy, with 2 big handles on the top.

Also, really, something new enough to have USB3 on the motherboard is probably better. I have some laptops picked out, but that’s re-buying every component, including ones that are presently decent. *sigh*


Docker Debian autoinstall fails

Debian (and Ubuntu and others) use apt, aptitude, apt-get, and dpkg. apt currently requires the Release keys to match in a complex way. Mondo, Docker, and many other projects have problems making a repo actually work. The telltale failure is similar to this:

W: The repository 'https://apt.dockerproject.org/repo debian-stretch Release' does not have a Release file.
N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Failed to fetch https://apt.dockerproject.org/repo/dists/debian-stretch/testing/binary-i386/Packages
E: Some index files failed to download. They have been ignored, or old ones used instead.
[root@ns1:/etc/apt/sources.list.d]

 

You can manually work around this by changing your sources.list to use HTTP instead of HTTPS, but scripts such as Ubiquiti’s Universal Network Management Server installer will replace that:

curl -fsSL https://raw.githubusercontent.com/Ubiquiti-App/UNMS/master/install.sh > /tmp/unms_install.sh && sudo bash /tmp/unms_install.sh
branch=master
version=0.10.3
Downloading installation package for version 0.10.3.
Setting VERSION=0.10.3
Download and install Docker
# Executing docker install script, commit: 490beaa
+ sh -c 'apt-get update -qq >/dev/null'
+ sh -c 'apt-get install -y -qq apt-transport-https ca-certificates curl software-properties-common >/dev/null'
+ sh -c 'curl -fsSL "https://download.docker.com/linux/debian/gpg" | apt-key add -qq - >/dev/null'
Warning: apt-key output should not be parsed (stdout is not a terminal)
+ sh -c 'echo "deb [arch=amd64] https://download.docker.com/linux/debian stretch edge" > /etc/apt/sources.list.d/docker.list'
+ '[' debian = debian ']'
+ '[' stretch = wheezy ']'
+ sh -c 'apt-get update -qq >/dev/null'
W: The repository 'https://download.docker.com/linux/debian stretch Release' does not have a Release file.
E: Failed to fetch https://download.docker.com/linux/debian/dists/stretch/edge/binary-amd64/Packages
E: Some index files failed to download. They have been ignored, or old ones used instead.

 

A more stable workaround is to force apt back into the old mode of not caring whether the server’s TLS certificate perfectly matches the file server, by turning off HTTPS peer verification for just the two Docker hosts:

cat <<'EOF' >>/etc/apt/apt.conf.d/01docker
Acquire::https::apt.dockerproject.org::Verify-Peer "false";
Acquire::https::download.docker.com::Verify-Peer "false";
EOF
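
Because the heredoc above appends with >>, running it twice leaves duplicate lines in 01docker. A small sketch of a variant that is safe to re-run. The function name write_verify_peer_override and its arguments are hypothetical; the option it writes is the same Acquire::https::<host>::Verify-Peer setting used above:

```shell
# Write one Verify-Peer override per host into an apt.conf.d file.
# Uses > (truncate) rather than >> so re-running never duplicates lines.
# $1: target file; remaining arguments: hostnames (both caller-supplied).
write_verify_peer_override() {
    conf=$1
    shift
    for host in "$@"; do
        printf 'Acquire::https::%s::Verify-Peer "false";\n' "$host"
    done > "$conf"
}
```

As root, that would be: write_verify_peer_override /etc/apt/apt.conf.d/01docker apt.dockerproject.org download.docker.com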

 

Now, the install works fine:

curl -fsSL https://raw.githubusercontent.com/Ubiquiti-App/UNMS/master/install.sh > /tmp/unms_install.sh \
  && sudo bash /tmp/unms_install.sh
branch=master
version=0.10.3
Downloading installation package for version 0.10.3.
Setting VERSION=0.10.3
Download and install Docker
# Executing docker install script, commit: 490beaa
+ sh -c 'apt-get update -qq >/dev/null'
+ sh -c 'apt-get install -y -qq apt-transport-https ca-certificates curl software-properties-common >/dev/null'
+ sh -c 'curl -fsSL "https://download.docker.com/linux/debian/gpg" | apt-key add -qq - >/dev/null'
Warning: apt-key output should not be parsed (stdout is not a terminal)
+ sh -c 'echo "deb [arch=amd64] https://download.docker.com/linux/debian stretch edge" > /etc/apt/sources.list.d/docker.list'
+ '[' debian = debian ']'
+ '[' stretch = wheezy ']'
+ sh -c 'apt-get update -qq >/dev/null'
+ sh -c 'apt-get install -y -qq docker-ce >/dev/null'
+ sh -c 'docker version'
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:09 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:48 2017
 OS/Arch:      linux/amd64
 Experimental: false
If you would like to use Docker as a non-root user, you should now consider
adding your user to the "docker" group with something like:

  sudo usermod -aG docker your-user

Remember that you will have to log out and back in for this to take effect!

WARNING: Adding a user to the "docker" group will grant the ability to run
         containers which can be used to obtain root privileges on the
         docker host.
         Refer to https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
         for more information.
Docker version: 17.09.0
./install-full.sh: line 470: ((: 17 < 1
        || (17 == 1 && 09: value too great for base (error token is "09")
Download and install Docker compose.
Docker Compose version: 1.9
Creating user unms.
Skipping 0.8.0 permission fix
Preparing templates
Creating docker-compose.yml
Pulling docker images.
Pulling redis (redis:3.2.8-alpine)...
3.2.8-alpine: Pulling from library/redis
cfc728c1c558: Pull complete
8eda5cfd7e0a: Pull complete
8acb752a319b: Pull complete
955021cea791: Pull complete
d301d906247c: Pull complete
ff438d9e11c6: Pull complete
Digest: sha256:262d8bd214e74cebb3a0573e0f3a042aa3ddade36cf39a4891dd1b05b636bc55
Status: Downloaded newer image for redis:3.2.8-alpine
Pulling postgres (postgres:9.6.1-alpine)...
9.6.1-alpine: Pulling from library/postgres
0a8490d0dfd3: Pull complete
b6475055d17e: Pull complete
ba55801edf3d: Pull complete
f132014bbab8: Pull complete
9775497ec4a5: Pull complete
678be380896e: Pull complete
31e4998cc9ec: Pull complete
Digest: sha256:fa48df82694141793fb0cd52b9a93a3618ba03e5814e11dbf0dd43797f4d4cf7
Status: Downloaded newer image for postgres:9.6.1-alpine
Pulling rabbitmq (rabbitmq:3)...
3: Pulling from library/rabbitmq
bc95e04b23c0: Pull complete
2e65f0b00e4c: Pull complete
f2bd80317989: Pull complete
7b05ca830283: Pull complete
0bb5a4bbcce5: Pull complete
cf840d8999f6: Pull complete
be339ca44883: Pull complete
ce35cd9f9b5b: Pull complete
a4fe32a0a00d: Pull complete
77408ca9e94e: Pull complete
db03407a1aba: Pull complete
Digest: sha256:9a0de56d27909c518f448314d430f8eda3ad479fc459d908ff8b281c4dfc1c00
Status: Downloaded newer image for rabbitmq:3
Pulling unms (ubnt/unms:0.10.3)...
0.10.3: Pulling from ubnt/unms
627beaf3eaaf: Pull complete
5fc32359ecb8: Pull complete
2b99ae07dd66: Pull complete
99c9d1420b38: Pull complete
b65b0ba413b8: Pull complete
86bd816c9566: Pull complete
32ebfd822bb4: Pull complete
Digest: sha256:5dc99a77ee8bb4d09f02da715ec3142283ce44d5e91b8f515b5694ffb25d6c3c
Status: Downloaded newer image for ubnt/unms:0.10.3
Checking available ports
Port 80 is already in use, please choose a different HTTP port for UNMS. [8080]:
Port 8080 is already in use, please choose a different HTTP port for UNMS. [8080]: 8888
Port 443 is already in use, please choose a different HTTPS port for UNMS. [8443]:
Port 8443 is already in use, please choose a different HTTPS port for UNMS. [8443]: 8883
Creating data volumes.
Will mount /home/unms/data
Creating docker-compose.yml
Deploying templates
Writing config file
no crontab for unms
no crontab for unms
Deleting obsolete firmwares...
Downloading new firmwares...
Downloading e50-1.9.7-hotfix.3.170831.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 74.6M  100 74.6M    0     0  5502k      0  0:00:13  0:00:13 --:--:-- 5870k
Downloading e100-1.9.7-hotfix.3.170831.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 80.8M  100 80.8M    0     0  5692k      0  0:00:14  0:00:14 --:--:-- 5859k
Downloading e200-1.9.7-hotfix.3.170831.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 80.7M  100 80.7M    0     0  5725k      0  0:00:14  0:00:14 --:--:-- 5873k
Downloading e1000-1.9.7-hotfix.3.170831.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 81.7M  100 81.7M    0     0  5705k      0  0:00:14  0:00:14 --:--:-- 5867k
Downloading e600-1.0.2.170728.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 86.8M  100 86.8M    0     0  5738k      0  0:00:15  0:00:15 --:--:-- 5871k
Downloading SFU-1.2.0.171003.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15.0M  100 15.0M    0     0  4663k      0  0:00:03  0:00:03 --:--:-- 4664k
Downloading XC-8.3.2.170901.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9046k  100 9046k    0     0  5219k      0  0:00:01  0:00:01 --:--:-- 5216k
Downloading XC-8.3.2-cs.170901.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9046k  100 9046k    0     0  5218k      0  0:00:01  0:00:01 --:--:-- 5219k
Downloading WA-8.3.2.170901.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9028k  100 9028k    0     0  5327k      0  0:00:01  0:00:01 --:--:-- 5329k
Downloading WA-8.3.2-cs.170901.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9028k  100 9028k    0     0  5006k      0  0:00:01  0:00:01 --:--:-- 5004k
Downloading TI-6.0.7.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7083k  100 7083k    0     0  4917k      0  0:00:01  0:00:01 --:--:-- 4915k
Downloading TI-6.0.7-cs.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7083k  100 7083k    0     0  5181k      0  0:00:01  0:00:01 --:--:-- 5185k
Downloading XM-6.0.7.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7389k  100 7389k    0     0  5218k      0  0:00:01  0:00:01 --:--:-- 5222k
Downloading XM.6.0.7-cs.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7389k  100 7389k    0     0  5963k      0  0:00:01  0:00:01 --:--:-- 5959k
Downloading XW.v6.0.7.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7227k  100 7227k    0     0  5224k      0  0:00:01  0:00:01 --:--:-- 5225k
Downloading XW-6.0.7-cs.170908.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7227k  100 7227k    0     0  5075k      0  0:00:01  0:00:01 --:--:-- 5071k
Starting docker containers.
Creating network "unms_internal" with the default driver
Creating network "unms_public" with the default driver
Building fluentd
Step 1/6 : FROM fluent/fluentd:v0.12-latest
v0.12-latest: Pulling from fluent/fluentd
019300c8a437: Pull complete
d30279f73a02: Pull complete
fd39bd5a5dae: Pull complete
4dacb8d2bb26: Pull complete
963e933724db: Pull complete
8b4dd4e99009: Pull complete
59bedb222c2c: Pull complete
Digest: sha256:9b10ed70251fda1cd91c92f07a3ae74059adb1bdad6fc51cfcfe42272a9e78e8
Status: Downloaded newer image for fluent/fluentd:v0.12-latest
 ---> 4fce39752458
Step 2/6 : USER root
 ---> Running in 8f315349c16e
 ---> 84398611a0ad
Removing intermediate container 8f315349c16e
Step 3/6 : COPY entrypoint.sh /
 ---> 157af3140182
Step 4/6 : RUN apk add --no-cache --update su-exec     && apk add --no-cache dumb-init --repository http://dl-cdn.alpinelinux.org/alpine/edge/community/     && chmod +x /entrypoint.sh
 ---> Running in fbdef19d9e1a
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
OK: 27 MiB in 24 packages
fetch http://dl-cdn.alpinelinux.org/alpine/edge/community/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.5/community/x86_64/APKINDEX.tar.gz
OK: 27 MiB in 24 packages
 ---> e82e4e7e156f
Removing intermediate container fbdef19d9e1a
Step 5/6 : ENTRYPOINT /entrypoint.sh
 ---> Running in 3a0455e845ef
 ---> 7581bd63c44f
Removing intermediate container 3a0455e845ef
Step 6/6 : CMD fluentd -c /fluentd/etc/$FLUENTD_CONF -p /fluentd/plugins $FLUENTD_OPT
 ---> Running in 13c6baad173b
 ---> 97647e174228
Removing intermediate container 13c6baad173b
Successfully built 97647e174228
Successfully tagged unms_fluentd:latest
WARNING: Image for service fluentd was built because it did not already exist.
 To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Creating unms-fluentd
Creating unms-redis
Creating unms-rabbitmq
Creating unms-postgres
Creating unms
Removing old images
Current image: ubnt/unms:0.10.3
All UNMS images: ubnt/unms:0.10.3
Images to remove: ''
No old images found
Waiting for UNMS to start
CONTAINER ID      IMAGE                   COMMAND                  CREATED             STATUS           PORTS                                            NAMES
6e814af4ffc5      ubnt/unms:0.10.3        "/usr/bin/dumb-ini..."   8 seconds ago       Up 3 seconds     0.0.0.0:8888->8080/tcp, 0.0.0.0:8883->8443/tcp   unms
01f61e7d9ae8      postgres:9.6.1-alpine   "/docker-entrypoin..."   10 seconds ago      Up 7 seconds                                                      unms-postgres
99261993de75      rabbitmq:3              "docker-entrypoint..."   10 seconds ago      Up 6 seconds                                                      unms-rabbitmq
21bb0d5db0e1      redis:3.2.8-alpine      "docker-entrypoint..."   10 seconds ago      Up 7 seconds                                                      unms-redis
cdb0b878b633      unms_fluentd            "/entrypoint.sh /b..."   11 seconds ago      Up 1 second      5140/tcp, 127.0.0.1:24224->24224/tcp             unms-fluentd
UNMS is running

Omnitech DP server lives

Ten days ago, the drive enclosure for the TSM server failed during a storm. The enclosure is an RSV-S5 from 2010. The PSU died, and seems to be a specialty part. The part costs $250. A newer version of the enclosure is $180 from Sans Digital. This is a bulk data server, so a 4-bay box was fine. I picked up a Mediasonic Probox 4-bay JBOD with eSATA and USB3 ports. It’s a faster port multiplier, better functionality, and half the volume on the server shelf.

I still plan to migrate everything to Linux on Spectrum Protect 8, with container pools, and maybe use Glacier for off-site storage. This is compounded by CrashPlan ditching their non-business plans, and never being able to sync anyway. I really need a better way to store off-site DR data. Box for a critical chunk is okay. Google and Dropbox for active data is okay. But an off-site DR pool would be too expensive to put into either of those. Plus, SP8 is much better at being chunk-aware. I’d hate for a CDP product to revert a chunk, or be constantly out of sync.


SATA chipset reference

The SIL3132 card (SATA-II, PCIe 1.0) ran at 122MB/sec.

The 88SE9128 card (SATA-III, PCIe 2.0) ran at 75MB/sec, or 35MB/sec with FIS disabled.

The 88SE9235 card runs at 195MB/sec.

My two test enclosures are:

  • SIL3726 based enclosure (RSV-S5)
  • 88SM9715 based enclosure (TR5M6G)

Test conditions: Linux, MDADM, RAID6, sequential read, 256k blocks.
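
For anyone wanting to repeat the sequential-read numbers, something like the following is roughly what those test conditions describe. This is a sketch, not the exact command behind the figures above. The function name seq_read_test is made up, and /dev/md0 in the usage note is an assumed array name:

```shell
# Sequentially read from a device or file in 256 KiB blocks and discard the data.
# GNU dd reports elapsed time and throughput on stderr when it finishes.
# $1: device or file to read; $2: number of blocks (default 4096, i.e. 1 GiB).
seq_read_test() {
    dd if="$1" of=/dev/null bs=256k count="${2:-4096}"
}
```

Usage would be seq_read_test /dev/md0 as root. A second run can be served from the page cache, so either read more than the installed RAM or drop caches between runs (echo 3 > /proc/sys/vm/drop_caches).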

Ableton said I should go with a single SSD behind a JMS575 port multiplier to get best performance out of the 88SE9128.

I pointed out that a single drive is not the same as multiple (switching delays),
and that replacing all of my spinning disks with SSD is not a valid solution.



gallery upgrading

I’m finally updating the Gallery 1.5.10 server from 2004 to Gallery 3.0.9.
This fixes the PHP errors that kept showing up on the old version.
However, for right now, I can only log in with Firefox.

Anyway, 6851 photos, 160 albums, 16 users, 1535 comments getting imported.
When it’s done, we’ll see if everything looks okay before I swap it in place.
I honestly don’t think any of my users still use this.


Bad Subnet Kills DHCPD

One, single bad IP in DHCPD config will kill the entire config file. :(

On an EdgeRouter, and probably anything with Ubiquiti, and maybe anything using the same config style (Brocade and others have the same command set)….

If you add a static reservation outside of the DHCP server’s subnet,
as in, if you typo one octet, or decide to do another subnet just because,
your DHCP server will be offline after reboot. No errors, just silently not serving.

It can be outside of the start/stop range, and that’s fine.

Really, this should give you a warning from the webUI, or it should just say “OKAY, We’ll let you hand out stupid IP addresses.” I mean, what if I wanted this to be my DHCP server, but I had a different router and subnet on the same segment?

From command line, you’ll see the error though:

admin@gw1# commit
[ service dhcp-server ]
Static DHCP lease IP '192.169.1.79' under mapping 'CustomerLaptop'
under shared network name 'LAN' is outside of the DHCP lease network '192.168.1.0/24'.
DHCP server configuration commit aborted due to error(s).
[edit]
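
Since the web UI will not warn you, a quick pre-check of each static mapping against the subnet catches the typo before a reboot does. A minimal sketch in plain sh. ip_to_int and ip_in_cidr are my own helper names, not EdgeOS commands, and it assumes dotted-quad IPv4 without leading zeros:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell, so the IFS change does not leak to the caller.
ip_to_int() (
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if address $1 is inside network $2 (CIDR form, e.g. 192.168.1.0/24).
ip_in_cidr() {
    net=${2%/*}
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}
```

For the mapping in the error above: ip_in_cidr 192.169.1.79 192.168.1.0/24 || echo "outside the lease network"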

Compressed Dovecot Maildir on Debian

I just saved a few gigs with this. Figured I need to document this or I’ll never remember. :)

Add this into /etc/dovecot/conf.d/10*

# Enable zlib plugin globally for reading/writing:
mail_plugins = $mail_plugins zlib
# Enable these only if you want compression while saving:
plugin {
 zlib_save_level = 6 # 1..9; default is 6
 zlib_save = gz # or bz2, xz or lz4
}

Add this into /etc/dovecot/conf.d/20*

protocol imap {
  mail_plugins = zlib
}
protocol pop3 {
  mail_plugins = zlib
}

Remove extra spaces and leftover courier garbage

rename 's/\ /_/g' /home/jdavis/Maildir/.[a-zA-Z]*
rename 's/\__/_/g' /home/jdavis/Maildir/.[a-zA-Z]*
rename 's/\_\./\./g' /home/jdavis/Maildir/.[a-zA-Z]*
rm -r /home/jdavis/Maildir/courier*
rm -r /home/jdavis/Maildir/.[a-zA-Z]*/courier*

Create the script to compress all maildir files

#!/bin/sh
# Compress every not-yet-compressed message under a Maildir with bzip2.
# Usage: ./compress_maildir /path/to/Maildir/
compress_maildir () {
cd "$1" || exit 1
DIRS=`find . -maxdepth 2 -type d -name cur`
for dir in $DIRS; do
       echo "$dir"
       cd "$dir" || exit 1
       #only files with a size tag (S=) that lack the Z (compressed) flag
       FILES=`find . -type f -name "*,S=*" -not -regex ".*:2,.*Z.*"`
       #compress all files into ../tmp
       for FILE in $FILES; do
               NEWFILE=../tmp/${FILE}
               #echo bzip $FILE $NEWFILE
               if ! bzip2 -9 "$FILE" -c > "$NEWFILE"; then
                       echo compressing failed
                       exit 1
               fi
               #reset mtime
               if ! touch -r "$FILE" "$NEWFILE"; then
                       echo setting time failed
                       exit 1
               fi
       done
       echo "Locking $dir/.."
       if PID=`/usr/lib/dovecot/maildirlock .. 120`; then
               #locking successful, moving compressed files
               for FILE in $FILES; do
                       NEWFILE=../tmp/${FILE}
                       if [ -s "$FILE" ] && [ -s "$NEWFILE" ]; then
                               echo mv "$FILE" "$NEWFILE"
                               #keep the uncompressed original in /tmp as a backup
                               mv "$FILE" /tmp
                               mv "$NEWFILE" "${FILE}Z"
                       else
                               echo mv failed
                               exit 1
                       fi
               done
               kill "$PID"
       else
               echo lock failed
               exit 1
       fi
       cd - >/dev/null
done
}

#the function above only defines the work; this line actually runs it
compress_maildir "$1"

Actually RUN the script to compress all maildir files

./compress_maildir /home/jdavis/Maildir/
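
To spot-check the result, the first bytes of a message file show how it is stored: gzip files start with 1f 8b and bzip2 files with the ASCII letters BZh. A small sketch. The function name maildir_file_format is mine, and it only looks at magic bytes rather than asking Dovecot:

```shell
# Print gzip, bzip2, or plain for one message file, based on its first bytes.
maildir_file_format() {
    # First three bytes as lowercase hex with no separators, e.g. 425a68
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case $magic in
        1f8b*)  echo gzip ;;
        425a68) echo bzip2 ;;
        *)      echo plain ;;
    esac
}
```

Files the script has processed should report bzip2, and mail saved by Dovecot after the zlib_save = gz change should report gzip.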



IMAP Email fixed

Courier-imap-ssl has been flaky for a long time, but now, it turns out it’s been very unhappy with current Thunderbird. Even after manually playing with the TLS settings, it was a beast.

So, I installed dovecot. One line for maildirs, one line for ssl enable, copy over my ssl keys, and set EXIM to use SASL instead of Courier. Poof. Everything *just works*.

While I was at it, I set Thunderbird sort and threading defaults (so I don’t have to set it on every folder individually), and that also is wonderful.

I don’t like having to subscribe to all of the folders manually (I have around 590 folders, one for each project, for each customer, for each partner, plus about 10 tech archives), but if I want it to save everything locally, I cannot just uncheck “show only subscribed folders” and expect it to work.

BUT, really, swapping over and doing all of the manual reconfig was way less time than trying to figure out why Courier was not working. (It was probably something to do with it not being updated any time in the last several updates I’ve tried.)

Now, if only TB would use an Outbox folder rather than demanding SMTP, I could switch work over to IMAP instead of ExQuilla…



Apache 2.4 on Debian

ns1 got converted to 64-bit, and upgraded to Jessie. It’s been a little painful, but worked for the most part.

1) The biggest thing was installing core packages with :amd64 such that we were never left without dpkg nor apt.

2) Perl broke horribly, and that’s why we moved to Jessie: it was the only way to get it to REALLY reinstall/rebuild CPAN.

3) A couple of days later, 2 more Seagate drives threw a media chip, and racked up 3k-4k reallocated sectors over a couple of days. Not only have Seagate drives failed extremely rapidly under controlled power and temperatures (some of these were replacements of failed original drives), now, the warranty page on Seagate’s website gives a 404 error.

The drives were replaced with WD RED drives, which have been very stable in this environment. RAID6 ensured that at no point did we lose access to data, nor suffer any losses.

4) Apache 2.4 has changed a whole bunch. In 2.2, there were transitional packages off of the base names, and in 2.4, the transitional packages moved it back. Whatever. *sigh* That’s cleaned up, but was no real factor.

These changes in Apache 2.4 have been resolved:

  • conf.d is no longer used
  • sites-enabled/* must have “.conf” appended
  • mod_auth_pam is no longer available
  • “Require user” now requires “pwauth” and “libapache2-mod-authnz-external” and new directives
  • “Require group” is now replaced with “Require unix-group”, and requires “libapache2-mod-authz-unixgroup”, which is separate from user authentication.
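
The “.conf” rename is the easiest of these to script. A rough sketch, assuming the directory passed in holds the site files or symlinks. The function name add_conf_suffix is hypothetical, and on a stock Debian layout you would normally rename under sites-available and re-enable with a2ensite instead:

```shell
# Append .conf to every entry in directory $1 that does not already end in .conf.
add_conf_suffix() {
    for f in "$1"/*; do
        # Skip the unexpanded pattern when the directory is empty
        [ -e "$f" ] || [ -L "$f" ] || continue
        case $f in
            *.conf) ;;                      # already has the suffix
            *)      mv -- "$f" "$f.conf" ;; # rename in place
        esac
    done
}
```

Try it on a copy of the directory first, since it renames every entry it finds, not only site definitions.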

I’m still working on one of my aliased directories which is not working.

I’m also trying to sort out why Tine 2.0 is stuck “upgrading”. That’s normal for Tine 2.0 though. It’s really an annoying beast, and I’m glad I don’t rely on it. I really just want CalDAV, IMAP, and maybe something to sync notes and reminders. It seems this is nearly impossible.


Copyrights

Things I learned today:

  • If you are American, then it’s okay to go after you for 33 seconds of some song in the background of a video from a friend’s party.
  • If you are not American, it’s okay to post the whole song online.
  • Italy does not have “Fair Use” in their copyright laws, and everything is licensed, even blank recording media.
  • YouTube does not have a way for individuals to restrict their videos to viewers in specific countries without joining a syndication network and signing distribution agreements.

Failing drive in the array

I collected info from the failing drive in the array, and compared to other drives in the array.
It actually looked good, comparatively, until I found this one line near the end:
Warning: device does not support SCT Error Recovery Control command
GAH. No wonder. I still have to replace it.
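
That warning is easy to grep for before a drive ever goes into an array. A minimal sketch that reads smartctl output on stdin. supports_scterc is my own name, the first pattern is the exact warning quoted above, and the second pattern is an assumption about how other smartmontools versions word it:

```shell
# Read smartctl output on stdin; succeed unless it says SCT ERC is unsupported.
supports_scterc() {
    ! grep -Eq 'does not support SCT Error Recovery Control|SCT Error Recovery Control command not supported'
}
```

For example: smartctl -x /dev/sdX | supports_scterc || echo "no SCT ERC, keep it out of the array"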

This is another reason why Seagate is on my poop list.
They sent me a lower function device as a warranty replacement.

On the flip side, if anyone needs a 2TB 5900RPM drive for a desktop system, I can hook you up.
It’s still in good condition, just not suitable for an array.
Also, it’s out of warranty (but only a year old).



RAID maintenance

My RAID drive that went offline last week went offline again yesterday. That means real failure.

It’s a warranty replacement of a previously failed drive. Out of the Seagate drives I’ve used in arrays, I’ve had 1 drive not fail in 4 years, and I’ve had more failures overall than actual drives.

This is at three different sites, four arrays, different enclosures, systems, etc. Everything on UPS and surge suppression.

My WD RED drives are happy. At 1 year, no failures out of 8 drives. Though, one array was going offline due to a flaky controller. No problem with the drives though.

This array was populated in 2010, so the warranty is up. I’m replacing this drive with a WD Red 3TB. I’m only replacing one drive at this time, just due to budget, but Linux MD-RAID will happily let me mix in this newer, higher capacity drive.

We’re running RAID6, so there shouldn’t be any interruptions.

*knock on wood*

New UPS batteries

The storm last night performed a UPS test that was long over-due.

Unfortunately, the server UPS failed. One battery was 0V, and the other was 8.6V.

Both 7AH-12V batteries were replaced with new 9AH-12V batteries.

The old ones were the factory batteries, put into service 2008-07-03 as per:
http://omnitech.net/news/2008/07/03/upgraded-ups/

The UPS seems happy with the new batteries. They should provide a little longer run-time (9AH vs 7AH is about 29% more capacity on paper).

I’ll add a reminder to replace them in 5 years.


Freedom Pop

FreedomPop has 2 bars but cannot ping the gateway. Sigh.

A power failure at the home office has pointed out some UPS deficiencies. FiOS only provides voice service on UPS power. Also, our 2008 server hangs on UPS power (probably needs sine wave). Lastly, the cordless base is not on a UPS.

Everything else was hibernated or shutdown safely.


Fixed URLs in WP posts

My WPMediawiki plugin was converting anchor tags into nested anchor tags, which was failing horribly.

Also, I installed a redirect plugin which should allow me to use the wiki markup tag with a subset and still get the right page.

For instance, Sprouts should redirect to Sprouts-6224 about tasty toffee peanuts, and Khai Ranks should link to a page about Khai ranking up in Karate.



Optiplex 755

NS1 has been upgraded to an Optiplex 755 Core2-Duo 3GHz.
It has 5x SATA ports on the motherboard, but does not support Port Multipliers properly.
It sees the drives, but there are hangs, lags, etc all the time. 6MB/sec aggregate isn’t okay.

I didn’t want to use my PCI-32 SIL-3124, mostly for performance reasons.
I have a 1x SIL3132, and found the drive enclosure works fine on port 0, but LILO gives L 01 01 01 01 01 etc if I use port 1.
This is pulling 102MB/sec sequential from a 5-way mirror, and 124MB/sec from a 5-way RAID6.
Compare to the 3124 which topped out about 50MB/sec regardless of which slice I abused.

In theory I could just cable all of the drives up to individual ports, and the performance would likely be slightly better.
Unfortunately, I’d need to pull the optical drive, and install a 2-3 bay converter.
I didn’t want to spend MORE money, because if I did, I’d probably just start buying 2.5″ drives, or other things I didn’t need.
1 2 3 4 5 6
7 8 (plus cables, brackets, etc)

Anyway, I’ve bumped up to a 64-bit kernel, and may be swinging to a 64-bit OS if I feel particularly frisky.
http://www.v13.gr/blog/?p=11
http://wiki.debian.org/Migrate32To64Bit

My goal would be to install SDFS (open dedupe), mhVTL (Linux VTL, now with iSCSI support) and see how they’re doing vs 2007.
http://code.google.com/p/opendedup/
http://opendedup.org/
https://sites.google.com/site/linuxvtl2/
http://stgt.sourceforge.net/


Omnitech News Service

According to a U.S. District Court Judge in Oregon, Honorable Marco A. Hernandez, `press`, in context of the First Amendment to the US Constitution, specifically means persons `affiliated with any newspaper, magazine, periodical, book, pamphlet, news service, wire service, news or feature syndicate, broadcast station or network, or cable television system.` Online journalism is now legally excluded unless affiliated with an entity of one or more of those types.

As such, be it known that all of my communication via any online forum is formally a media for communication to the public as an agent of the `OmniTech News Service`. This service will include fact, fiction, assumptions, satire, and any other form of communication deemed appropriate by the staff.

Ref: http://blogs.seattleweekly.com/dailyweekly/2011/12/crystal_cox_oregon_blogger_isn.php
Ref: http://mashable.com/2011/12/07/blogger-vs-journalist/
Ref: http://www.examiner.com/business-news-in-syracuse/judge-hits-blogger-with-2-5-million-fine-for-not-being-a-journalist

