Replacing TSM / ISP server

I ran into an issue where a primary TSM 7.1.8 server was broken, and it was easier to just move all of the clients over to the secondary 8.1.5 server. These used the new TLS encryption, and I kept running into issues.


First, I physically shut down the old server, and updated the new server to use the IP as an alias by editing /etc/network/interfaces. (Ubuntu 16 LTS)


Various errors included:
ANR8599W The connection with host address:host port failed due to an untrusted server certificate. An attempt to reconnect and establish certificate trust might follow.

ANR2284S The server master encryption key has changed. Passwords protected with the previous master encryption key are not available.

I had to make the server trust itself again with:[LINK]
dsmcert -add -server TSM -file /home/tsminst1/cert256.arm


This error:
ANR0456W Session rejected for server DT – the server name at, 1500 does not match.

I removed the server:


These errors:
ANR1651E Server information for DT is not available.
ANR4377E Session failure, target server DT is not defined on the source server.
ANR3151E Configuration refresh failed with configuration manager DT.

I disabled replication config


These errors:
ANS1695E The certificate is not valid.
ANS1592E Failed to initialize SSL protocol.
ANS8023E Unable to establish session with server.
ANR3335W Unable to distribute certificate to for session .
ANR8599W The connection with failed due to an untrusted server certificate. An attempt to reconnect and establish certificate trust might follow.

From the clients, I had to remove /opt/tivoli/tsm/client/ba/bin/dsmcert.* and C:\Program Files\Tivoli\TSM\baclient\dsmcert.* on the clients per

However, I also had to fix dsmcert / dsmcert.exe which I clobbered in the process.


And on the server, allow keys to be swapped again:[LINK]

Swap keys on client:


NOTE: I also removed /etc/adsm/* during troubleshooting, but that was not needed. That just lead to me having to re-enter passwords again. Simply deleting the cert database corrected the problem on other clients.


I tried to order this in dependency order. I was sort of all over the place when I did it, and might have missed something. I just could not exactly get all of the right info from any one documentation source.


AIX geeks, converting from SDDPCM to AIXPCM, when you uninstall the drivers, you also need to uninstall host attachment. On 2% of our migrations, mksysb/alt clones would fail to find the boot disk (554). devices.sddpcm.*

Note that from SVC 7.6.1, AIX 6.1.8, AIX 7.1.3, and AIX 7.2, you MAY switch to AIXPCM. On POWER9 and later, you MUST switch to AIXPCM.

The rm script is from storage development, but the manage_disk_drivers command is from AIX dev. Either is okay, but tge AIX one does not require making a PMR.

Best reference:

AIX types of ethernet interfaces

AIX shows a lot of different info in different places.  This is because AIX predates the time when everyone had RJ45 ethernet ports.

HBA represents a high-function PCI adapter that contains multiple protocols, and which can sometimes be configured to provide ENT devices.  Primary candidates are “Integrated Virtual Ethernet” on POWER5 and POWER6 servers, as well as ROCE adapters, which are “RDMA Over Converged Ethernet”, with RDMA being “Remote Direct Memory Addressing” or “Access”.  Basically, Infiniband adapters which can use ethernet at the link layer.

ENT represents the “physical port”, though that is not always the case.  I’ll explain more later.  There is one one of these for every Ethernet port visible to the operating system.

EN represents the “ETHERNET II” protocol device for IP communication.  This is the standard today, also known as “DIX Etehrnet”, named after DEC, Intel, Xerox.  This is where you will normally put your IP address.  There is one of these for every ENT device.

ET represents an IEEE 802.3 protocol device.  This would have been used in the days of Novel Networking, or with SNA protocol.  Almost no-one uses this anymore, but I’m sure there’s an AIX 3.2.5U2 microchannel server running with this somewhere in the bottom of an old government facility, with coaxial cables and barrell terminators.  Really, I don’t know why this still is needed on anything produced in the last 20 years.  There is one of these for every ENT device.

INET is for config options that affect the entire TCP/IP stack, such as persistent routes, the hostname, and whether you are bypassing ODM for config of your network (rare).  There is only one of these per system, and it is always inet0 unless someone gets cheeky.

There are other ways to get IP devices, such as IP over Fibre Channel, IP over Infiniband, IP over ATM, over FDDI, over serial or parallel, etc.  These are less common, so I’m not going into them here.

Generally, you may have a stack like this:

ent0    physical ethernet port
ent1    physical ethernet port
ent2    Etherchannel (Static, or LACP bond created out of both of the above)
ent3    Virtual Ethernet (Connects to a virtual, firmware-only switch)
ent4    Shared Ethernet (VIO server only, a software bridge between a virtual physical)
ent5    VLAN (an additional VLAN port configured off of any of the above)
en0     IP interface – unused because we give ENT0 as a backing device to ENT3
en1     IP interface – also unused for the same reason
en2     IP Interface – also unused, because this is the backing device for the SEA
en3     IP interface – Also unused because this is a backing device for the SEA.
en4     IP interface hanging off of ENT4 – this can be skipped, and a virtual ethernet used
en5     IP interface hanging off of ENT5 – this can be skipped, and a virtual ethernet used

Each device has its own type of parameters.  You can use “lsattr -El $device”, “netstat -in”, and “entstat -d $device” to get details of this.  Note that entstat wants to be on the top device, not the bottom device.  Start with where the IP address is assigned, and it will show the subdevices, virtual connections, etc.

High Level VIO/Client build

This is off the cuff, and is not a technical walkthrough. This is enough for you to teach yourself assuming you have a system to hack on.

IBM’s POWER8 docs are missing almost everything. I don’t understand how they can call them docs at all. They want you to use some really picky tools that are cumbersome and not flexible in all the right ways.

The IBM POWER7 docs are close, but are missing the SR-IOV info. Your best bet is to skim though this, and stop when you find the bits you want (concepts, config):

The high level jist of building a VIO environment is as follows:

  • Configure to HMC
  • Clear managed system profile data
  • Build a couple VIO servers:
    • 6GB RAM, 3 virtual procs, 0.3 virtual CPUs, 255 CPU weight
    • At least one storage and one network adapter
    • You can use SR-IOV to share an ethernet adapter from firmware if needed
    • One virtual ethernet trunk for each separate physical network.  Assign VLANs here
    • One virtual ethernet non-trunk for each VLAN you want an IP address on (ideal, but you can also hang IPs and VLANs directly from AIX)
    • One virtual SCSI server adapter for each client LPAR that will need virtual CDROM, Virtual Tape, or legacy Virtual SCSI disk (higher CPU load).
    • One virtual fibre adapter for each client port (usually two per client on each VIO server, but can be anywhere from 18)
  • Upload the VIO base media into the HMC media repository
  • Install the VIO server from the HMC
  • SSH into the HMC, and use vtmenu to rebuild the VIO networking
    • Remove all en, et, ent, hba devices, then cfgmgr
    • mkvdev -lnagg for any etherchannel bonded pairs needed for the Shared Ethernet Adapter(s)
    • mkvdev -sea  to build any shared ethernet adapters (ethernet bridge from virtual switch to physical port)
    • mkvdev -lnagg for any etherchannel bonded pairs needed for local IP communication
    • mkvdev -vlan for any additional VLANs hanging directly off an SEA rather than through a virtual ethernet client adapter
    • mktcpip to configure your primary interface, gateway, etc
    • Add any extra IP addresses.
  • Build your Client LPARs
    • Memory, CPU, RAM as desired
    • Virtual ethernet just picks the switch and VLAN that you need.  If this does not exist on any VIO trunk adapters, then you need to fix that.
    • Virtual SCSI client adapter
      • this needs the VIO server partition ID, and the VIO server slot number added to it for the firmware connection.
      • The VIO server virtual SCSI adapter needs the same mapping back to the client LPAR id and slot.
      • There may be some GUI improvements to add this all for you, but it’s been decades of garbage for so long that I just do it all manually.
    • Virtual Fibre adapter – This maps back and forth to the VIO server virtual fibre similar to how VSCSI did.
  • SSH into the VIO server
    • make virtual optical devices attached to the “vhost” (virtual SCSI” if needed
    • Use vfcmap to map the “vfchost” adapters to real “fcs” ports.  This requires them to be NPIV capable (8gbit or newer), logged into an NPIV capable switch (lsnports).
  • Zone any LUNs
    • lsnportlogin can give you the WWNs for the clients, or you can get it from the client profile data manually
    • You can use OpenFirmware’s “ioinfo” to light up a port to force it to log in to the switch.
    • If the LPAR is down, you can use “chnportlogin” from the HMC to log in all ports for that client.
    • You can also zone directly to the VIO server, and “mkvdev” to map them as vscsi disks (higher CPU load on VIO server, and kind of a pain in the rump).
    • Note that LPM requires any VSCSI LUNs to be mapped to all VIO servers in advance.
    • Note that LPM requires any NPIV LUNs to be mapped to the secondary WWNs in advance
  • SSH into the VIO server
    • Make sure lsmap and lsmap -npiv show whatever mapping is required
    • Make sure loadopt has mounted any ISO images as virtual CDROMs if needed
    • You can also just mask an alt_disk_install LUN from a source host.
    • You can also use NIM to do a network install
  • Activate the LPAR profile.
    • If you did not open a vterm from SSH into the HMC, then you can do it from the activate GUI.
    • You can use SMS to pick your boot device
    • Install or boot as desired
    • Reconfigure your network as normal
      • smitty tcpip or “chdev -l en0” and “chdev -l inet0” with appropriate flags
      • Tune everything as desired.
      • If it was a Linux install, then that has its own config options.

SR-IOV can be used instead of Shared Ethernet above. 

It allows you to share a single PCI NIC or single ethernet port between LPARs.  It uses less CPU on the VIO server, and has lower latency for your LPARs.  It’s sort of the Next Generation of network virtualization, though there are some restrictions in its use.  It’s best to review all of the info, and decide up front, but is worth your time to do so.  If you want to use an SEA on SR-IOV, you still only have one VIO server per port, but you can have different ports on different VIO servers.  When sharing among all clients and VIO server without SEA, understand that the percentage capacity is a minimum guaranteed, not a cap.  Leave it low unless you have some critical workload that needs to crowd out anyone else. Some of the best URLs today when I look up “SR-IOV vNIC vio howto” are as follows:

CLI and Automation

If you want to build a whole bunch of VIO clients and servers at once, it may be worth the effort to do it from the HMC CLI.  It gets really complicated, but once you have it set up, you can adjust and rebuild things quickly.  This also lets you manually specify WWNs for your LPARs in case there are collisions, or if you are rebuilding and need to keep the same numbers.

The VIO server can be installed with alt_disk_copy, or from NIM, or from physical CD, or from the HMC.  The CLI version is called “installios” and you MUST specify the MAC address of the boot adapter for it to work properly. Without CLI options, installios will prompt you for all of the info.


AIX ramdisks

Long ago (think 1999ish), I wrote a techdoc on how to put JFS on a ramdisk on AIX. We called them FAXES, because we would fax them to people, and this was before FAQ was a common acronym. At some point, I put it into the TechDoc system when that came out, because there was a push to use the system.

I lost the original text, but the techdoc lived on. It was rewritten after I left big blue. You can see their better version here:

I don’t want to copy their doc, because they can be testy about such things. Heck, they can be testy when they plagiarize my docs. The key reference is syntax, which I’ll summarize here. You can also just look up the manpages on mkramdisk, mkfs,

Make a pinned-memory ramdisk: mkramdisk $bytes
The default uses pinend RAM, which is required for JFS or JFS2.

Make an un-pinned ramdisk: mkramdisk -u $bytes
This is okay for raw devices, maybe UDFS, but not for JFS. There are latency/access requirements on JFS, but at least mkfs knows to throw an error here if you try to skip it.

When you run mkfs on /dev/ramdisk0 as JFS, it’s normal, except you mount -o nointegrity.

When you run mkfs on /dev/ramdisk0 as JFS2, use -o log=INLINE on the format, and the mount.

You can, of course, format UDF as well: udfcreate -f3 -d/dev/ramdisk0 ; mount -vudfs /dev/ramdisk0 /RAMDISK

You could probably run a mksysb to the ramdisk. I don’t know if it would be raw, or if it would be UDFS. That might be useful for high speed testing, but of course, the ramdisk evaporates on reboot. You could dd the ramdisk out to some other media.

Bike Tires & Pressure

This is reference info for me:

  • Pavement Reference: 700c, 28mm @ 120psi for 300 LB ride weight, 60% rear
  • Cruiser Reference: 32er, 55mm @ 60psi for 310 LB ride weight, 70% rear
  • Off-Road Reference: 700c, 40mm @ 40psi for 180 LB ride weight, 60% rear

Slower speed, butt off the seat, you can go lower psi. You’d be risking pinch flats on longer rides, or unseating the bead in harder turns, etc

Tread pattern is coarse for rough terrain, fine for sand & hardpack, and smooth for pavement.

Higher pressure prevents tire flex, and is better on pavement. – Less shock absorption, grippy on soft, loose surface.

Lower pressure increases tire flex, which grips obstacles better. – Increased risk of pinch flats, or rolling off the rim.

General width preferences:

  • Hardpack or pavement – narrow to prevent drag
  • Sand, pea gravel, mud – wide to prevent sinking
  • loose, large gravel – wide to prevent pinch flats, throwing gravel, etc

Weight Distribution

  • Cruiser ~ 70% rear.
  • Mountain ~ 60% rear.
  • Race Road ~ 55% rear.

Proportional adjustments:

  • Narrower tire for larger diameter
  • Lower pressure for lower weight
  • Lower pressure for wider tire

Rim sizes:

  • 559mm = 26er
  • 584mm = 650b / 27.5″
  • 622mm = 700c / 29er
  • 686mm = 32er
  • 787mm = 36er

dsmserv fails to start if LDAP is inaccessible

IBM, and the white books, say this is working as designed.
• If LDAP dies, dsmserv stays up without it.
• If LDAP dies, and dsmserv restarts, it refuses to come up.
• ANR3103E
• Workaround is remove LDAPURL from dsmserv.opt, or wait for LDAP to become accessible.

On a multi-homed server, any links serving LDAP should be fault tolerant.

Any server using LDAP should have fault tolerant LDAP servers.

Here’s where to vote on getting the start-up limitation changed.

Spectrum Protect / TSM systemd autostart

cat < <'EOF' >/etc/systemd/system/db2fmcd.service


systemctl enable db2fmcd.service
systemctl start db2fmcd.service

cp -p /opt/tivoli/tsm/server/bin/dsmserv.rc /etc/init.d/tsminst1
cat < <'EOF' >>/etc/systemd/system/tsminst1.service

ExecStart=/etc/init.d/tsminst1 start
ExecReload=/etc/init.d/tsminst1 reload
ExecStop=/etc/init.d/tsminst1 stop

systemctl enable tsminst1.service
systemctl start tsminst1.service

ln -s /opt/tivoli/tsm/client/ba/bin/rc.dsmcad /etc/init.d/dsmcad
cat < <'EOF' >>/etc/systemd/system/dsmcad.service

ExecStart=/etc/init.d/dsmcad start
ExecReload=/etc/init.d/dsmcad reload
ExecStop=/etc/init.d/dsmcad stop

systemctl enable dsmcad.service
systemctl start dsmcad.service

AIX 7.2 crash removing adapters from etherchannel

If I remove the first main adapter, and re-add it, then I can add/remove either adapter or IP interface after that.

If I remove the second main adapter, and re-add it, then I cannot remove the first, and dropping the IP interface crashes.

So, assuming adapter_names=ent2,ent6

This works everywhere:
/usr/lib/methods/ethchan_config -d ent17 ent2
/usr/lib/methods/ethchan_config -a ent17 ent2
/usr/lib/methods/ethchan_config -d ent17 ent6
/usr/lib/methods/ethchan_config -a ent17 ent6
/usr/sbin/rmdev -Rl en17
/usr/sbin/mkdev -l en17
# Can do any combination of the above after remove/readd first adapter in advance.

And this crashes everywhere:
/usr/lib/methods/ethchan_config -d ent17 ent6
/usr/lib/methods/ethchan_config -a ent17 ent6
# crashed here on one server
/usr/lib/methods/ethchan_config -d ent17 ent2
ethchan_config: 0950-021 Unable to delete adapter ent2 from the
EtherChannel because it could not be found, errno = 2
/usr/sbin/rmdev -Rl en17

# crash here on several others

Crash analysis follows:

(96)> stat
CHRP_SMP_PCI POWER_PC POWER_8 machine with 160 available CPU(s) (64-bit

sysname... AIX
nodename.. testnode001
release... 2
version... 7
build date Mar 2 2018
build time 13:02:46
label..... 1809C_72H
machine... 00DEADBEEF00
nid....... FBCAFE4C
time of crash: Wed May 9 04:45:59 2018
age of system: 25 day, 10 hr., 54 min., 41 sec.
xmalloc debug: enabled
FRRs active... 0
FRRs started.. 0

CPU 96 CSA F00000002FF47600 at time of crash, error code for LEDs:
pvthread+1A0E00 STACK:
[00009324].unlock_enable_mem+000018 ()
(??, ??)
[060592E8]shientdd:entcore_suspend_nic+000028 (??, ??)
[0605FB20]shientdd:entcore_suspend+0001E0 (??, ??, ??)
[06129A68]shientdd:entcore_close_common+000668 (??)
[0612A0B0]shientdd:entcore_close+000490 (??)
[060103CC]shientdd:shi2ent_close+00000C (??)
[F1000000C04911C0]ethchandd:ethchan_close+0001A0 (??)
[00014D70].hkey_legacy_gate+00004C ()
[0057A914]ns_free+000074 (??)
[00014F50].kernel_add_gate_cstack+000030 ()
[069E503C]if_en:en_ioctl+0002DC (??, ??, ??)
[0057126C]if_detach+0001CC (??)
[0056E1DC]ifioctl+00081C (F00000002FF473D0, 8020696680206966,
[005EA764]soo_ioctl+0005C4 (??, ??, ??)
[007A4754]common_ioctl+000114 (??, ??, ??, ??)
[00003930]syscall+000228 ()
[kdb_get_virtual_memory] no real storage @ 2FF22358
[D011C92C]D011C92C ()
[kdb_read_mem] no real storage @ FFFFFFFFFFF5D60

(96)> status | grep -v wait
96 20E03BF 6670 380324 3128 ifconfig

(96)> vmlog
Most recent VMM errorlog entry
Error id = DSI_PROC
Exception DSISR/ISISR = 000000000A000000
Exception srval = 00007FFFFFFFD080
Exception virt addr = 0000000000000004
Exception value = 00000086 EXCEPT_PROT

Protection exception. An attempt was made to write to a protected
address in memory

(96)> th -n ifconfig
pvthread+1A0E00 6670*ifconfig RUN 20E03BF 03E 96 0
shientdd:.entcore_disable_tx_timeout_timers AF123_105+000074
bla < .unlock_enable>
2390 ! SUNLOCK(TX_QUEUE_SLOCK, tx_pri);

---- NDD INFO ----( F1000B003952B410)----
name............. ent6 alias............ en6
ndd_next......... 0000000000000000
ndd_flags........ 00610812
ndd_2_flags...... 00000930

(96)> print entcore_acs_t F1000B00393F0000
struct entcore_acs_t
struct entcore_tx_queue_t
< ...>
struct entcore_ras_cb_t *ffdc_ras_cb = 0xF1000B0039537D40;
struct entcore_tx_atomics_t *atomics = 0x0000000000000000;
struct mbuf *overflow_queue = 0x0000000000000000;
struct mbuf *overflow_queue_tail = 0x0000000000000000;
uint64_t ofq_cnt = 0x0000000000000000;
struct entcore_lock_info_t *p_lock_info = 0x0000000000000000;
void *p_acs = 0xF1000B00393F0000; NULL so DSI

(96)> dd F1000B00393F78D0
F1000B00393F78D0: 0000000000000000 < - p_lock_info

(96)> xm F1000B00393F78D0
Page Information:
heap_vaddr = F1000B0000000000
P_allocrange (range of 2 or more allocated full pages)
page........... 00003937 start.. F1000B00393F0000 page_cnt....... 0017
allocated_size. 00170000 pd_size........ 00010000 pinned......... yes

Allocation Record:
F1000B00E4306600: addr......... F1000B00393F0000 allocated pinned
F1000B00E4306600: req_size..... 1458712 act_size..... 1507328
F1000B00E4306600: tid.......... 033F0187 comm......... cfgshien
Trace during xmalloc() on CPU 00

Free History:
105D 40.955808 SHIENTDD GEN: L3 Close__B d1=F1000B00393F0000
105D 40.955808 SHIENTDD GEN: L3 CloseC_B d1=F1000B00393F0000
105D 40.955809 SHIENTDD GEN: L3 HwClos_B d1=F1000B00393F0000
105D 40.955810 SHIENTDD GEN: L3 HwClos_B -HW| d1=0000000000000000
105D 40.955810 SHIENTDD GEN: L3 HwClos10 -HW| d1=0000000000000000
105D 40.955810 SHIENTDD GEN: L3 HwClos_E -HW| d1=0000000000000000
105D 40.955811 SHIENTDD GEN: L3 HwClos_E d1=0000000000000000

< ...>

105D 41.039269 SHIENTDD GEN: L3 CloseC_E d1=F1000B00393F0000
105D 41.039269 SHIENTDD GEN: L3 Close__E d1=0000000000000000
105D 41.039273 SHIENTDD GEN: L3 Close__B d1=F1000B00393F0000

another close ? >>

105D 41.039273 SHIENTDD GEN: L3 CloseC_B d1=F1000B00393F0000
105D 41.039274 SHIENTDD GEN: L3 HwClos_B d1=F1000B00393F0000
105D 41.039275 SHIENTDD GEN: L3 HwClos_B -HW| d1=0000000000000000
105D 41.039275 SHIENTDD GEN: L3 HwClos10 -HW| d1=0000000000000000
105D 41.039276 SHIENTDD GEN: L3 HwClos_E -HW| d1=0000000000000000
105D 41.039276 SHIENTDD GEN: L3 HwClos_E d1=0000000000000000
105D 41.039276 SHIENTDD GEN: L3 Suspnd_B d1=F1000B00393F0000
105D 41.039279 SHIENTDD GEN: L3 MctSyn_B d1=F1000B00393F0000
105D 41.039281 SHIENTDD GEN: L3 MctSyn_E d1=0000000000000000

It seems that 2 closes happened, which would have leaded to a double free, and the crash.

Debug efix was tested for 2 weeks on 24 systems and problem was resolved, patch was stabl.

APAR IJ06720 was generated, and a public efix will be released for that./