AIX 7.2.3.1 breaks GSKit 8.0.50.89

AIX 7.2.3 breaks GSKit8, up through GP29 (8.0.50.89).

This affects TSP/Spectrum Protect, Content Manager, Tivoli Directory Server, Websphere, DB2, Informix, IBM HTTP Server, etc.

Before reboot, everything works still, which implies the change is in the kernel.

We found it on TSM, and AIX 7200-03-01-1838, and Spectrum Protect server 8.1.6.0.

Application crash and DBX follow below.

ANR7800I DSMSERV generated at 12:17:13 on Sep 11 2018.
IBM Spectrum Protect for AIX
Version 8, Release 1, Level 6.000
Licensed Materials - Property of IBM
(C) Copyright IBM Corporation 1990, 2018.
All rights reserved.
U.S. Government Users Restricted Rights - Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corporation.

ANR7801I Subsystem process ID is 10944920.
ANR0900I Processing options file /home/tsminst1/dsmserv.opt.
ANR7811I Using instance directory /home/tsminst1.
Illegal instruction(coredump)

# dbx /opt/tivoli/tsm/server/bin/dsmserv core.10944896.28165312
Type 'help' for help.
[using memory image in core.10944896.28165312]
reading symbolic information ...warning: no source compiled with -g

Illegal instruction (illegal opcode) in . at 0x0 ($t1)
warning: Unable to access address 0x0 from core

(dbx) where
.() at 0x0
gsk_src_create__FPPvPv(??, ??) at 0x9000000015b6d88
__ct__8GSKMutexFv(??) at 0x9000000018d664c
__ct__20GSKPasswordEncryptorFv(??) at 0x9000000018cb248
__ct__7gsk_envFv(??) at 0x900000000aaa6b0
GskEnvironmentOpen__FPPvb(??, ??) at 0x900000000ab14c4
gsk_environment_open(??) at 0x900000000ab277c
IPRA.$CheckGSKVersion() at 0x100eecf68
tlsInit() at 0x100eecd70
main(??, ??) at 0x10000112c

(dbx) th
thread state-k wchan state-u k-tid mode held scope function

$t1 run running 41877977 k no sys
$t2 run blocked 21234465 u no sys _cond_wait_global
$t3 run running 24380103 u no sys waitpid


dsmserv fails to start if LDAP is inaccessible

IBM, and the white books, say this is working as designed.
• If LDAP dies, dsmserv stays up without it.
• If LDAP dies, and dsmserv restarts, it refuses to come up.
• ANR3103E https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.5/srv.msgs/b_msgs_server.pdf
• Workaround is remove LDAPURL from dsmserv.opt, or wait for LDAP to become accessible.

On a multi-homed server, any links serving LDAP should be fault tolerant.

Any server using LDAP should have fault tolerant LDAP servers.

Here’s where to vote on getting the start-up limitation changed.
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=121985


ANR3114E LDAP error 81. Failure to connect to the LDAP server

This used to be on IBM’s website, but it disappeared.  It is referenced all over the net, and needed to still exist. I only found it in the wayback machine, so I’m adding another copy to the internet.

2013 SOURCE: www-01.ibm.com/support/docview.wss?uid=swg21656339

Problem(Abstract)

When the SET LDAPUSER command is used, the connection can fail with:

ANR3114E LDAP error 81 (Can’t contact LDAP server)

Cause

The user common name (CN) in the SET LDAPUSER command contains a space or the ldapurl option is incorrectly specified.

Diagnosing the problem

Collect a trace of the Tivoli Storage Manager Server using the following trace classes:
session verbdetail ldap ldapcache unicode

More information about tracing the server can be found here: Enabling a trace for the server or storage agent

The following errors are reported within the trace:

11:02:04.127 [44][output.c][7531][PutConsoleMsg]:ANR2017I Administrator ADMIN issued command: SET LDAPPASSWORD ?***? ~
11:02:04.171 [44][ldapintr.c][548][ldapInit]:Entry: ldapUserNew =      CN=tsm user,OU=TSM,DC=ds,DC=example,DC=com
11:02:04.173 [44][ldapintr.c][5851][LdapHandleErrorEx]:Entry: LdapOpenSession(ldapintr.c:2340) ldapFunc = ldap_start_tls_s_np, ldapRc = 81, ld = 0000000001B0CAB0
11:02:04.174 [44][ldapintr.c][5867][LdapHandleErrorEx]:ldap_start_tls_s_np returned LDAP code 81(Can't contact LDAP server), LDAP Server message ((null)), and possible GSKIT SSL/TLS error 0(Success)
11:02:04.174 [44][output.c][7531][PutConsoleMsg]:ANR3114E LDAP error 81 (Can't contact LDAP server) occurred during ldap_start_tls_s_np.~
11:02:04.174 [44][ldapintr.c][6079][LdapHandleErrorEx]:Exit: rc = 2339, LdapOpenSession(ldapintr.c:2340), ldapFunc = ldap_start_tls_s_np, ldapRc = 81, ld = 0000000001B0CAB0
11:02:04.174 [44][ldapintr.c][1580][ldapCloseSession]:Entry: sessP = 0000000009B99CD0
11:02:04.175 [44][ldapintr.c][3159][LdapFreeSess]:Entry: sessP = 0000000009B99CD0
11:02:04.175 [44][ldapintr.c][2449][LdapOpenSession]:Exit: rc = 2339, ldapHandleP = 000000000AFDE740, bindDn =                              (CN=tsm user,OU=TSM,DC=ds,DC=example,DC=com)
11:02:04.175 [44][output.c][7531][PutConsoleMsg]:ANR3103E Failure occurred while initializing LDAP directory services.~
11:02:04.175 [44][ldapintr.c][856][ldapInit]:Exit: rc = 2339
11:02:04.175 [44][output.c][7531][PutConsoleMsg]:ANR2732E Unable to communicate with the external LDAP directory server.~

Resolving the problem

  • In the trace provided, the common name (CN) contains a space. (CN=tsm user,OU=TSM,DC=ds,DC=example,DC=com)

    Remove the space in the common name when using the SET LDAPUSER command. For example:

    SET LDAPUSER “CN=tsmuser,OU=TSM,DC=ds,DC=example,DC=com”

  • Use an LDAP connection utility such as ldp.exe to ensure the ldapurl option is correct and the LDAP server is accepting connections

    <ldapurl> port 636, check the box for SSL

    Verify there are no errors in the output


Protect initial install

This is happiness…

tsminst1@tsm:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

/bin/bash# for i in /dev/sd? ; do smartctl -a $i ; done | grep ‘Device Model’
Device Model: Samsung SSD 850 EVO 250GB
Device Model: WDC WD30EFRX-68EUZN0
Device Model: Samsung SSD 850 EVO 250GB
Device Model: WDC WD30EFRX-68EUZN0
Device Model: WDC WD30EFRX-68EUZN0

tsminst1@tsm:~$ dsmserv format dbdir=/tsm/db01,/tsm/db02,/tsm/db03,/tsm/db04,/tsm/db05,/tsm/db06,/tsm/db07,/tsm/db08 \
> activelogsize=8192 activelogdirectory=/tsm/log archlogdirectory=/tsm/logarch

ANR7800I DSMSERV generated at 11:32:48 on Sep 19 2017.

IBM Spectrum Protect for Linux/x86_64
Version 8, Release 1, Level 3.000

Licensed Materials – Property of IBM

(C) Copyright IBM Corporation 1990, 2017.
All rights reserved.
U.S. Government Users Restricted Rights – Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corporation.

ANR7801I Subsystem process ID is 29286.
ANR0900I Processing options file /home/tsminst1/dsmserv.opt.
ANR0010W Unable to open message catalog for language en_US.UTF-8. The default language message catalog will be used.
ANR7814I Using instance directory /home/tsminst1.
ANR3339I Default Label in key data base is TSM Server SelfSigned SHA Key.
ANR4726I The ICC support module has been loaded.
ANR0152I Database manager successfully started.
ANR2976I Offline DB backup for database TSMDB1 started.
ANR2974I Offline DB backup for database TSMDB1 completed successfully.
ANR0992I Server’s database formatting complete.
ANR0369I Stopping the database manager because of a server shutdown.


New data protection

Upgrading TSM server from Q9650 Core 2 Quad 3.0GHz, 8GB DDR2 on Win 2008R2.

New system is HP Z600, two-socket, 6-core 2.66GHz Xeon X5650 and 48GB of RAM. Wattage is the same per socket, but two sockets now. 3x the cores, 4x the performance.

SSDs for DB and Log are also moving to EVO 850 from Corsair M100. I’ll set up a container pool to replace the dedupe file class, and put that on 3x 3TB RAID5 instead of 2x RAID1.

OS will be Ubuntu 16.04.2 LTS. I’d like to just use Debian 9.1, but Debian and long-term-support seem to not be synonymous. I’d hate to run a patch update and have everything break, then fight with debian testing repo to try to get it all back to normal. Plus, I have no Ubuntu boxes, only Debian. It’ll give me a chance to see what operational differences I run into.

Old TSM is 6.4. New will be “Spectrum Protect” 8.1.3. Yes, the billions spent to rebrand to the same name as Charter Cable’s rebrand really seems like money well spent.

Anyway, Since I lost the offsite replication provider for the dedupe file pool, and it was having trouble keeping up anyway, this will let me change to server-side encryption, and object storage. We’ll see which provider wins out on price once everything is rededuped properly.

If the fan noise is not too bad, maybe this platform can be considered for a low-cost upgrade to the kids’ game machines. Though, these are heavy, with 2 big handles on the top.

Also, really, something new enough to have USB3 on the motherboard is probably better. I have some laptops picked out, but that’s re-buying every component, including ones that are presently decent. *sigh*


IBM Download Director is a beast

I’m sure this will all change in a week, but until then, here is reference for how to uninstall download director, or forcibly reinstall it.

There was no support, and no google help, no IBM search help, etc. ​After all the usual things, I went to a system without an existing DD installation.

​You can force-reinstall Download Director from here:
https://www-03.ibm.com/isc/esd/dswdown/dldirector/installation_en.html

​You can manually run DD here, but I don’t know how to feed it packages:
https://www14.software.ibm.com/dldirector/IBMDownloadDirectorApp.jnlp

​There is info on how to uninstall DD here:
​https://www-03.ibm.com/isc/esd/dswdown/dldirector/uninstall_en.html

​I’m sure these URLs will change in the next forced web redesign, but for now, this should help for people with broken DD installs.

Reinstall info is obscured in convoluted JavaScript, but here’s the uninstall information:

Windows
How to uninstall

  • Open a new cmd window, paste the following command and hit enter:
  • reg DELETE HKCU\Software\Classes\ibmddp /f && rmdir %HOMEPATH%\AppData\Local\IBM\DD /S /Q
  • You should see a “The operation completed successfully.” message.

How to verify if Download Director is installed

  • Open a new cmd window, paste the following command and hit enter:
  • (reg query HKCU\Software\Classes\ibmddp 1> NUL 2>&1 && IF EXIST %HOMEPATH%\AppData\Local\IBM\DD\DownloadDirectorLauncher.exe (echo DD Installed) else (echo DD not installed)) || echo DD not installed
  • You should see either “DD installed” or “DD not installed”.

Linux
How to uninstall

  • Open a new terminal window, paste the following command and hit enter:
  • xdg-mime uninstall ~/.local/share/applications/ibm-downloaddirector.desktop && rm -rf ~/.local/share/applications/ibm-downloaddirector.desktop ~/.config/download-director/
  • If no errors are displayed, the operation completed successfully.

How to verify if Download Director is installed

  • Open a new terminal window, paste the following command and hit enter:
  • [[ -f ~/.local/share/applications/ibm-downloaddirector.desktop || -f ~/.config/download-director/DownloadDirectorLauncher.sh ]] && echo "DD installed" || echo "DD not installed"
  • You should see either “DD installed” or “DD not installed”.

Mac
How to uninstall

  • Open the “Terminal” app, paste the following command and hit enter:
  • rm -rf ~/Applications/DownloadDirectorLauncher.app/
  • If no errors are displayed, the operation completed successfully.

How to verify if Download Director is installed

  • Open the “Terminal” app, paste the following command and hit enter:
  • [[ -d ~/Applications/DownloadDirectorLauncher.app/ ]] && echo "DD installed" || echo "DD not installed"
  • You should see either “DD installed” or “DD not installed”.

Why I wrote this up:
I find myself stuck with IBM due to the value of legacy skills vs transitioning to newer skills.
Periodically, IBM makes changes to their webpage, or code download system.
Often, these leave things inconsistent (claims that HTTP can be used, but it’s no longer available).
Worse, forced tools will stop working, and the IBM solution is to wipe your entire browser config and start over.

IBM has decided it’s better to force people to use Download Director instead of any standard protocol.
IBM’s mantra is “It worked for me in the lab, so if it doesn’t work for you, tough patooties.”
There is no escalation to people who make decisions. This has been an ongoing issue for a decade.
No one cares, except a few of the ubertechs supporting things, but they have no sway.

I’ve been using HTTP for a while, but they pulled that, so I had to use DD.
This time, DD gave me an error that JavaWS could not be started.
So I uninstalled all Java, reinstalled the newest, and DD said I had no Java installed.

There were no google hits to help, no IBM pages to help, and IBM search is useless as always.
Of the pages I found, none of them had contact forms, because that costs money.
There is no uninstall tool for Download Director.
There is no Browser Extension, no OS uninstall tool.
Removing the AppData folder does not help.

I went to a clean system, and wrote down all that I could find during a new code download attempt.
There is actually a webpage for this, but it is not indexed anywhere. That’s linked above.
That’s what this post is about.

Note that this is not acceptable in any way, and is one of the many reasons people are leaving IBM for open standards.
It’s not about “The Cloud”. It’s about IBM having so many layers between the decision-makers and the workers that they are out of touch. They have no idea how to be a tech business anymore, and are run by people who are content to gut the reputation of IBM so as to report a short-term improvement in gross profit. Zero interest in the long term.


AIX and PowerHA versions 2017-06

This changes periodically, but for today, here is what I would do.

My PowerHA selection process would be:
* 7.1.3 SP06 if I needed to deploy quickly, because I have build docs for that.
* 7.1.4 doesn’t exist, but if it came out before deployment, I would consider it. Whichever was a newer release, latest 7.1.3 SP, or latest 7.1.4 SP.
* 7.2.0 SP03 if they wanted longer support, but had time for me to work up the new procedures during the install.
* 7.2.1 SP01 if SP01 came out before I deployed, and had chosen 7.2.0 prior. 7.2.1.0 base is available, but that’s from Dec 2016, and 7.2.0.3 is from May 2017. Newer by date is better.

My AIX selection process would be:
* Any NIM server would be AIX 7.2, latest TLSP.
* Any application support limits would win down to AIX 6.1, plus latest TLSP.
* For POWER9, I would push 7.2, latest TLSP.
* For POWER8, I would push 7.1 or later. — latest TLSP
* For POWER7, I would push 6.1 or later. — latest TLSP
* For POWER6 or older, or AIX 5.3 or older, I would push strongly against due to support and parts limitations.

Code sources:
* I would make sure to install yum from ezinstall, and deploy GNU tar and rsync:
http://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/ezinstall/ppc/
* I would update openssh from the IBM Web Download expansion:
https://www-01.ibm.com/marketing/iwm/iwm/web/reg/pick.do?source=aixbp&lang=en_US
* If any exposure to the public net, or a high-sensitivity system, I would check AIX security patches also.
http://public.dhe.ibm.com/aix/efixes/security/?C=M;O=D
ftp://ftp.software.ibm.com/aix/efixes/security/
* I would get the latest service pack for both AIX and PowerHA from Fix Central:
https://www-945.ibm.com/support/fixcentral/
* Base media, if I were certain the customer was entitled, but didn’t want to wait for them to provide media, Partnerworld SWAC:
https://www-304.ibm.com/partnerworld/partnertools/eorderweb/ordersw.do

Reference: PowerHA to AIX Support Matrix:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101347


PPC64 Linux on Intel

QEMU on Windows will run ppc64 and ppc64le emulation.
It emulates the same as what PowerKVM on an S812L would provide.
It’s kind of slow because there is no KVM module, AND Intel vs PPC,
AND emulator mode is single-core/proc/thread.

You can get Windows installer here:
https://qemu.weilnetz.de/

You really want ANSI/VT100 escape codes on you “cmd.exe” also:
https://github.com/adoxa/ansicon

To build a blank disk:
qemu-img create -f qcow2 qemu-disk-ppc64.img 32G

You can boot with this:
set SDL_STDIO_REDIRECT=NO
qemu-system-ppc64 -M type=pseries -m 1G,slots=4,maxmem=8G
-cpu POWER8E -smp 1 -vga none -nographic
-netdev user,id=net0 -device spapr-vlan,netdev=net0
-device spapr-vscsi -device scsi-hd,drive=drive0
-drive id=drive0,if=none,file=qemu-disk-ppc64.img
-cdrom D:\Downloads\debian-testing-ppc64el-DVD-1.iso

The QEMU part is all one line. The cdrom image is up to you. I like Debian.

Other Notes:
Any issues with cursor keys, use ctrl-i for TAB, ctrl-n and ctrl-p for next/previous.

Emulation mode is flaky with more than one core.

There is a QEMU AIX build on PERZL.ORG which would be faster, especially for ppc64 BigEndian.

PowerKVM is just PPC Linux, QEMU, KVM, and LIBVIRT. KVM is just a kernel module for spee-dup. LIMVIRT is just a GUI and CLI tool to build VM definitions. QEMU is the emulator. Works best on POWER8, with hypervisor disabled (OPAL mode).

QEMU still does not have enough RTAS and NVRAM to boot AIX. AIX hangs during “Starting AIX”, and Diags just says it’s an unsupported machine type. There is a little bit of dev for this, but not much.​


LED 0088 on NIM install & Upgrade

RE: LED 0088 on NIM install & Upgrade
When performing a NIM restore + upgrade at the same time, you have to run no-prompt.
You do this my editing the bosinst.data file from the mksysb,
then you define and allocate that as a NIM resource.

One of the things you have to do is provide the target disks.
If the target disks do not exist, such as only one LUN when you thought there were two, the LPAR will HANG with LED 0088

You might see something like this on the console:
Cannot run a 64-bit program until the 64-bit
environment has been configured. See the system administrator.
eval /usr/lpp/bosinst/bi_io -c < /dev/console Erasing Disks Please wait... Approximate Elapsed time % tasks complete (in minutes) 0042-008 NIMstate: Request denied - Method_req Or you might see
MnM_Restore_NIM_Required_Files…

but nothing more. It will just hang there indefinitely.

Give it 30 mins before you call it hung, but you might want to do an RTE install so you can go to the menus and verify you have the right disks listed.