AIX 7.2 crash removing adapters from etherchannel

If I remove the first main adapter, and re-add it, then I can add/remove either adapter or IP interface after that.

If I remove the second main adapter, and re-add it, then I cannot remove the first, and dropping the IP interface crashes.

So, assuming adapter_names=ent2,ent6

This works everywhere:
/usr/lib/methods/ethchan_config -d ent17 ent2
/usr/lib/methods/ethchan_config -a ent17 ent2
/usr/lib/methods/ethchan_config -d ent17 ent6
/usr/lib/methods/ethchan_config -a ent17 ent6
/usr/sbin/rmdev -Rl en17
/usr/sbin/mkdev -l en17
/usr/sbin/cfgmgr
# Can do any combination of the above after remove/readd first adapter in advance.

And this crashes everywhere:
/usr/lib/methods/ethchan_config -d ent17 ent6
/usr/lib/methods/ethchan_config -a ent17 ent6
# crashed here on one server
/usr/lib/methods/ethchan_config -d ent17 ent2
ethchan_config: 0950-021 Unable to delete adapter ent2 from the
EtherChannel because it could not be found, errno = 2
/usr/sbin/rmdev -Rl en17

# crash here on several others

Crash analysis follows:

(96)> stat
SYSTEM_CONFIGURATION:
CHRP_SMP_PCI POWER_PC POWER_8 machine with 160 available CPU(s) (64-bit
registers)

SYSTEM STATUS:
sysname... AIX
nodename.. testnode001
release... 2
version... 7
build date Mar 2 2018
build time 13:02:46
label..... 1809C_72H
machine... 00DEADBEEF00
nid....... FBCAFE4C
time of crash: Wed May 9 04:45:59 2018
age of system: 25 day, 10 hr., 54 min., 41 sec.
xmalloc debug: enabled
FRRs active... 0
FRRs started.. 0

CRASH INFORMATION:
CPU 96 CSA F00000002FF47600 at time of crash, error code for LEDs:
30000000
pvthread+1A0E00 STACK:
[00009324].unlock_enable_mem+000018 ()
[06058D54]shientdd:entcore_disable_tx_timeout_timers@AF123_105+000074
(??, ??)
[060592E8]shientdd:entcore_suspend_nic+000028 (??, ??)
[0605FB20]shientdd:entcore_suspend+0001E0 (??, ??, ??)
[06129A68]shientdd:entcore_close_common+000668 (??)
[0612A0B0]shientdd:entcore_close+000490 (??)
[060103CC]shientdd:shi2ent_close+00000C (??)
[F1000000C04911C0]ethchandd:ethchan_close+0001A0 (??)
[00014D70].hkey_legacy_gate+00004C ()
[0057A914]ns_free+000074 (??)
[00014F50].kernel_add_gate_cstack+000030 ()
[069E503C]if_en:en_ioctl+0002DC (??, ??, ??)
[0057126C]if_detach+0001CC (??)
[0056E1DC]ifioctl+00081C (F00000002FF473D0, 8020696680206966,
00000000066EB8A0)
[005EA764]soo_ioctl+0005C4 (??, ??, ??)
[007A4754]common_ioctl+000114 (??, ??, ??, ??)
[00003930]syscall+000228 ()
[kdb_get_virtual_memory] no real storage @ 2FF22358
[D011C92C]D011C92C ()
[kdb_read_mem] no real storage @ FFFFFFFFFFF5D60

(96)> status | grep -v wait
CPU INTR TID TSLOT PID PSLOT PROC_NAME
96 20E03BF 6670 380324 3128 ifconfig

(96)> vmlog
Most recent VMM errorlog entry
Error id = DSI_PROC
Exception DSISR/ISISR = 000000000A000000
Exception srval = 00007FFFFFFFD080
Exception virt addr = 0000000000000004
Exception value = 00000086 EXCEPT_PROT

0x86:
Protection exception. An attempt was made to write to a protected
address in memory

(96)> th -n ifconfig
SLOT NAME STATE TID PRI RQ CPUID CL WCHAN
pvthread+1A0E00 6670*ifconfig RUN 20E03BF 03E 96 0
shientdd:.entcore_disable_tx_timeout_timers AF123_105+000074
bla < .unlock_enable>
.
2390 ! SUNLOCK(TX_QUEUE_SLOCK, tx_pri);
.

---- NDD INFO ----( F1000B003952B410)----
name............. ent6 alias............ en6
ndd_next......... 0000000000000000
ndd_flags........ 00610812
(BROADCAST!NOECHO!64BIT!CHECKSUM_OFFLOAD)
ndd_2_flags...... 00000930
(IPV6_LARGESEND!IPV6_CHECKSUM_OFFLOAD!LARGE_RECEIVE!ECHAN_ELEM)

(96)> print entcore_acs_t F1000B00393F0000
struct entcore_acs_t
struct entcore_tx_queue_t
< ...>
struct entcore_ras_cb_t *ffdc_ras_cb = 0xF1000B0039537D40;
struct entcore_tx_atomics_t *atomics = 0x0000000000000000;
struct mbuf *overflow_queue = 0x0000000000000000;
struct mbuf *overflow_queue_tail = 0x0000000000000000;
uint64_t ofq_cnt = 0x0000000000000000;
struct entcore_lock_info_t *p_lock_info = 0x0000000000000000;
void *p_acs = 0xF1000B00393F0000; NULL so DSI

(96)> dd F1000B00393F78D0
F1000B00393F78D0: 0000000000000000 < - p_lock_info

(96)> xm F1000B00393F78D0
Page Information:
heap_vaddr = F1000B0000000000
P_allocrange (range of 2 or more allocated full pages)
page........... 00003937 start.. F1000B00393F0000 page_cnt....... 0017
allocated_size. 00170000 pd_size........ 00010000 pinned......... yes
XMDBG: ALLOC_RECORD

Allocation Record:
F1000B00E4306600: addr......... F1000B00393F0000 allocated pinned
F1000B00E4306600: req_size..... 1458712 act_size..... 1507328
F1000B00E4306600: tid.......... 033F0187 comm......... cfgshien
XMDBG: ALLOC_RECORD
Trace during xmalloc() on CPU 00
0604FCB0(.entcore_allocate_acs+000310)
060129C4(.entcore_config_state_machine+
0601A884(.entcore_perform_init+0000A4)

Free History:
105D 40.955808 SHIENTDD GEN: L3 Close__B d1=F1000B00393F0000
105D 40.955808 SHIENTDD GEN: L3 CloseC_B d1=F1000B00393F0000
105D 40.955809 SHIENTDD GEN: L3 HwClos_B d1=F1000B00393F0000
105D 40.955810 SHIENTDD GEN: L3 HwClos_B -HW| d1=0000000000000000
105D 40.955810 SHIENTDD GEN: L3 HwClos10 -HW| d1=0000000000000000
105D 40.955810 SHIENTDD GEN: L3 HwClos_E -HW| d1=0000000000000000
105D 40.955811 SHIENTDD GEN: L3 HwClos_E d1=0000000000000000

< ...>

105D 41.039269 SHIENTDD GEN: L3 CloseC_E d1=F1000B00393F0000
105D 41.039269 SHIENTDD GEN: L3 Close__E d1=0000000000000000
105D 41.039273 SHIENTDD GEN: L3 Close__B d1=F1000B00393F0000

another close ? >>

105D 41.039273 SHIENTDD GEN: L3 CloseC_B d1=F1000B00393F0000
105D 41.039274 SHIENTDD GEN: L3 HwClos_B d1=F1000B00393F0000
105D 41.039275 SHIENTDD GEN: L3 HwClos_B -HW| d1=0000000000000000
105D 41.039275 SHIENTDD GEN: L3 HwClos10 -HW| d1=0000000000000000
105D 41.039276 SHIENTDD GEN: L3 HwClos_E -HW| d1=0000000000000000
105D 41.039276 SHIENTDD GEN: L3 HwClos_E d1=0000000000000000
105D 41.039276 SHIENTDD GEN: L3 Suspnd_B d1=F1000B00393F0000
105D 41.039279 SHIENTDD GEN: L3 MctSyn_B d1=F1000B00393F0000
105D 41.039281 SHIENTDD GEN: L3 MctSyn_E d1=0000000000000000
END

It seems that 2 closes happened, which would have leaded to a double free, and the crash.

Debug efix was tested for 2 weeks on 24 systems and problem was resolved, patch was stabl.

APAR IJ06720 was generated, and a public efix will be released for that./


Spectrum Protect – container vulnerability

We ran into an issue where a level-zero operator became root, and cleaned up some TSM dedupe-pool containers so he’d stop getting full filesystem alerts.

Things exposed:

How does someone that green get full, unmonitored root access?
* They told false information about timestaps during defense
* Their senior tech lead was content to advise they not move or delete files without contacting the app owner.
* Imagine if this had been a customer facing database server!

In ISP/TSM, once extents are marked damaged, a new backup of that extent will replace it.
* Good TDP4VT CTL files and other incrementals will send missing files.
* TDP for VMWare full backups fail if the control file backup is damaged.
* Damaged extents do not mark files as damaged or missing.

Replicate Node will back-propagate damaged files.
* Damaged extents do not mark files as damaged or missing.

Also, in case you missed that:
* Damaged extents do not mark files as damaged or missing.

For real, IBM says:
* Damaged extents do not mark files as damaged or missing.
* “That might cause a whole bunch of duplicates to be ingested and processed.”

IBM’s option is to use REPAIR STGPOOL.
* Requires a prior PROTECT STGPOOL (similar to BACKUP STGPOOL and RESTORE STGPOOL).
* PROTECT STGPOOL can go to a container copy on tape, a container copy on FILE, or a container primary on the replica target server.
* PROTECT STGPOOL cannot go to a cloud pool
* STGRULE TIERING only processes files, not PROTECT extents.
* PROTECT STGPOOL cannot go to a cloud pool that way either.
* There is NO WAY to use cloud storage pool to protect a container pool from damage.

EXCEPTION: Damaged extents can be replaced by REPLICATE NODE into a pool.
* You can DISABLE SES, and reverse the replication config.
* Replicate node that way will perform a FULL READ of the source pool.

There is a Request For Enhancement from November, 2017 for TYPE=CLOUD POOLTYPE=COPY.
* That would be a major code effort, but would solve this major hole.
* That has not gotten a blink from product engineering.
* Not even an “under review”, nor “No Way”, nor “maybe sometime”.

Alternatives for PROTECT into CLOUD might be:
* Don’t use cloud. Double the amount of local disk space, and replicate to another datacenter.
* Use NFS (We would need to build a beefy VM, and configure KRB5 at both ends, so we could do NFSv4 encrypted).
* Use CIFS (the host is on AIX, which does not support CIFS v3. Linux conversion up front before we had bulk data was given a big NO.)
* Use azfusefs (Again, it’s not Linux)

Anyway, maybe in 2019 this can be resolved, but this is the sort of thing that really REALLY was poorly documented, and did not get the time and resources to be tested in advance. This is the sort of thing that angers everyone at every level.

REFERENCE: hard,intr,nfsvers=4,tcp,rsize=1048576,wsize=1048576,bg,noatime


ANR3114E LDAP error 81. Failure to connect to the LDAP server

This used to be on IBM’s website, but it disappeared.  It is referenced all over the net, and needed to still exist. I only found it in the wayback machine, so I’m adding another copy to the internet.

2013 SOURCE: www-01.ibm.com/support/docview.wss?uid=swg21656339

Problem(Abstract)

When the SET LDAPUSER command is used, the connection can fail with:

ANR3114E LDAP error 81 (Can’t contact LDAP server)

Cause

The user common name (CN) in the SET LDAPUSER command contains a space or the ldapurl option is incorrectly specified.

Diagnosing the problem

Collect a trace of the Tivoli Storage Manager Server using the following trace classes:
session verbdetail ldap ldapcache unicode

More information about tracing the server can be found here: Enabling a trace for the server or storage agent

The following errors are reported within the trace:

11:02:04.127 [44][output.c][7531][PutConsoleMsg]:ANR2017I Administrator ADMIN issued command: SET LDAPPASSWORD ?***? ~
11:02:04.171 [44][ldapintr.c][548][ldapInit]:Entry: ldapUserNew =      CN=tsm user,OU=TSM,DC=ds,DC=example,DC=com
11:02:04.173 [44][ldapintr.c][5851][LdapHandleErrorEx]:Entry: LdapOpenSession(ldapintr.c:2340) ldapFunc = ldap_start_tls_s_np, ldapRc = 81, ld = 0000000001B0CAB0
11:02:04.174 [44][ldapintr.c][5867][LdapHandleErrorEx]:ldap_start_tls_s_np returned LDAP code 81(Can't contact LDAP server), LDAP Server message ((null)), and possible GSKIT SSL/TLS error 0(Success)
11:02:04.174 [44][output.c][7531][PutConsoleMsg]:ANR3114E LDAP error 81 (Can't contact LDAP server) occurred during ldap_start_tls_s_np.~
11:02:04.174 [44][ldapintr.c][6079][LdapHandleErrorEx]:Exit: rc = 2339, LdapOpenSession(ldapintr.c:2340), ldapFunc = ldap_start_tls_s_np, ldapRc = 81, ld = 0000000001B0CAB0
11:02:04.174 [44][ldapintr.c][1580][ldapCloseSession]:Entry: sessP = 0000000009B99CD0
11:02:04.175 [44][ldapintr.c][3159][LdapFreeSess]:Entry: sessP = 0000000009B99CD0
11:02:04.175 [44][ldapintr.c][2449][LdapOpenSession]:Exit: rc = 2339, ldapHandleP = 000000000AFDE740, bindDn =                              (CN=tsm user,OU=TSM,DC=ds,DC=example,DC=com)
11:02:04.175 [44][output.c][7531][PutConsoleMsg]:ANR3103E Failure occurred while initializing LDAP directory services.~
11:02:04.175 [44][ldapintr.c][856][ldapInit]:Exit: rc = 2339
11:02:04.175 [44][output.c][7531][PutConsoleMsg]:ANR2732E Unable to communicate with the external LDAP directory server.~

Resolving the problem

  • In the trace provided, the common name (CN) contains a space. (CN=tsm user,OU=TSM,DC=ds,DC=example,DC=com)

    Remove the space in the common name when using the SET LDAPUSER command. For example:

    SET LDAPUSER “CN=tsmuser,OU=TSM,DC=ds,DC=example,DC=com”

  • Use an LDAP connection utility such as ldp.exe to ensure the ldapurl option is correct and the LDAP server is accepting connections

    <ldapurl> port 636, check the box for SSL

    Verify there are no errors in the output


IBM Download Director is a beast

I’m sure this will all change in a week, but until then, here is reference for how to uninstall download director, or forcibly reinstall it.

There was no support, and no google help, no IBM search help, etc. ​After all the usual things, I went to a system without an existing DD installation.

​You can force-reinstall Download Director from here:
https://www-03.ibm.com/isc/esd/dswdown/dldirector/installation_en.html

​You can manually run DD here, but I don’t know how to feed it packages:
https://www14.software.ibm.com/dldirector/IBMDownloadDirectorApp.jnlp

​There is info on how to uninstall DD here:
​https://www-03.ibm.com/isc/esd/dswdown/dldirector/uninstall_en.html

​I’m sure these URLs will change in the next forced web redesign, but for now, this should help for people with broken DD installs.

Reinstall info is obscured in convoluted JavaScript, but here’s the uninstall information:

Windows
How to uninstall

  • Open a new cmd window, paste the following command and hit enter:
  • reg DELETE HKCU\Software\Classes\ibmddp /f && rmdir %HOMEPATH%\AppData\Local\IBM\DD /S /Q
  • You should see a “The operation completed successfully.” message.

How to verify if Download Director is installed

  • Open a new cmd window, paste the following command and hit enter:
  • (reg query HKCU\Software\Classes\ibmddp 1> NUL 2>&1 && IF EXIST %HOMEPATH%\AppData\Local\IBM\DD\DownloadDirectorLauncher.exe (echo DD Installed) else (echo DD not installed)) || echo DD not installed
  • You should see either “DD installed” or “DD not installed”.

Linux
How to uninstall

  • Open a new terminal window, paste the following command and hit enter:
  • xdg-mime uninstall ~/.local/share/applications/ibm-downloaddirector.desktop && rm -rf ~/.local/share/applications/ibm-downloaddirector.desktop ~/.config/download-director/
  • If no errors are displayed, the operation completed successfully.

How to verify if Download Director is installed

  • Open a new terminal window, paste the following command and hit enter:
  • [[ -f ~/.local/share/applications/ibm-downloaddirector.desktop || -f ~/.config/download-director/DownloadDirectorLauncher.sh ]] && echo "DD installed" || echo "DD not installed"
  • You should see either “DD installed” or “DD not installed”.

Mac
How to uninstall

  • Open the “Terminal” app, paste the following command and hit enter:
  • rm -rf ~/Applications/DownloadDirectorLauncher.app/
  • If no errors are displayed, the operation completed successfully.

How to verify if Download Director is installed

  • Open the “Terminal” app, paste the following command and hit enter:
  • [[ -d ~/Applications/DownloadDirectorLauncher.app/ ]] && echo "DD installed" || echo "DD not installed"
  • You should see either “DD installed” or “DD not installed”.

Why I wrote this up:
I find myself stuck with IBM due to the value of legacy skills vs transitioning to newer skills.
Periodically, IBM makes changes to their webpage, or code download system.
Often, these leave things inconsistent (claims that HTTP can be used, but it’s no longer available).
Worse, forced tools will stop working, and the IBM solution is to wipe your entire browser config and start over.

IBM has decided it’s better to force people to use Download Director instead of any standard protocol.
IBM’s mantra is “It worked for me in the lab, so if it doesn’t work for you, tough patooties.”
There is no escalation to people who make decisions. This has been an ongoing issue for a decade.
No one cares, except a few of the ubertechs supporting things, but they have no sway.

I’ve been using HTTP for a while, but they pulled that, so I had to use DD.
This time, DD gave me an error that JavaWS could not be started.
So I uninstalled all Java, reinstalled the newest, and DD said I had no Java installed.

There were no google hits to help, no IBM pages to help, and IBM search is useless as always.
Of the pages I found, none of them had contact forms, because that costs money.
There is no uninstall tool for Download Director.
There is no Browser Extension, no OS uninstall tool.
Removing the AppData folder does not help.

I went to a clean system, and wrote down all that I could find during a new code download attempt.
There is actually a webpage for this, but it is not indexed anywhere. That’s linked above.
That’s what this post is about.

Note that this is not acceptable in any way, and is one of the many reasons people are leaving IBM for open standards.
It’s not about “The Cloud”. It’s about IBM having so many layers between the decision-makers and the workers that they are out of touch. They have no idea how to be a tech business anymore, and are run by people who are content to gut the reputation of IBM so as to report a short-term improvement in gross profit. Zero interest in the long term.


HOWTO: AIX support for R/W filesystem on USBMS

JFS2 Unsupported
Putting JFS2 on non-LVM block devices has been working for a long time. I​ wrote up how to put JFS2 on a ramdisk back at AIX 4.3.3. I lost the techdoc from back then, but IBM has a newer re-write dated 2008 here: http://www-01.ibm.com/support/docview.wss?uid=isg3T1010722

JFS2 requires the underlying system to tell it if something goes away, or for it to stay there as long as the filesystem is mounted. LVM does this for disk, and the ramdisk drivers do this as well (mostly because if the ramdisk fails, likely the system has failed). The key there is that with JFS2, the ramdisk pages are pinned.

I wrote up including performance on USB 1 and USB2 ports in January of 2010 HOWTO: JFS2 on USB device on AIX 5.3.11.1. Everything is fine, and dandy, even mount on boot, except it’s not supported by AIX Development.

JFS2 Problems
The problem for USB Mass Storage Devices is that the device can just go away unexpectedly. If a disk goes into deep sleep, or resets because of a loose connection, the JFS drivers do not get notified. So, they take writes, and JFS2 saves them up until it’s time to flush. It goes to flush, and the I/O channel is gone. Sometimes, this is just loss of everything in cache. If it’s an important file, then the system crashes.

​Because of that, we still cannot put LVM on a USB Mass Storage Device. This would take changes to notification of device availability, perhaps changes to the sync daemon, etc. Who knows, but there’s not been enough push from paying customers to make it a priority for AIX Development. Until that happens, don’t expect formal support for JFS2 on these devices.

UDF is the solution
AIX development supports read/write and even booting from USB Mass Storage Devices, but only with UDFS. The purpose is for writing a mksysb (system boot) image, or tar/cpio files, etc, and exists because of the RDX USB Internal Dock sold with some systems.
https://www.ibm.com/support/knowledgecenter/en/ssw_aix_61/com.ibm.aix.files/usbms_fileref.htm

​Boot support is provided as well: REF: ​http://www-01.ibm.com/support/docview.wss?uid=isg1IZ66737

More info on RDX USB Internal Dock. https://www.ibm.com/support/knowledgecenter/POWER7/p7hdt/fc1103.htm

RDX is just a hot-swap USB to SATA drive bay. Any current USB drive (USB3 is preferred due to performance), should work fine.

HOWTO: Create, Read, and Write UDF on AIX

Create bootable filesystem
mksysb -eXpi /dev/usbms0

Create empty filesystem
udfcreate -d /dev/usbms0

Create UDF 2.01 filesystem
udfcreate -f3 -d/dev/usbms0

NOTE: UDF 2.01 supports a real-time filesystem. It’s still UDF, so don’t try to put a database, or a million files on there.

Access read/write
mount -vudfs /dev/usbms0 /USBDRIVE

NOTE: The mksysb is a SPOT, plus a mksysb image, so adding files to the UDF will not make the restore huge.

USB Adapters on AIX
Add-in USB3 XHCI adapter for POWER8 is:
* CCIN 58F9 – PCIE2 4-port USB3 adapter
* FC EC45 and FRU 00E2932 for Low Profile
* FC EC46 and FRU 00E2934 for full height.
* driver is 4c1041821410b204 internal or 4c10418214109e04 PCIe

Add-in USB2 EHCI adapter for POWER7 is:
* CCIN 57D1 – PCI-E 4-port USB2 adapter
* driver is 33103500 integrated or 3310e000 PCIe
* FC 2728 or FRU 46K7394

Add-in USB2 EHCI adapter for POWER6/POWER5 is:
* CCIN 28EF – PCI 2-port USB2 adapter
* FC 2738 or FRU 80P2994
* Belkin F5U219 – exact same card without the sticker.
* driver is 99172604 internal or 99172704 PCI

Original USB1 OHCI /UHCI adapter for POWER5 and earlier was
* driver 22106474 on blades or c1110358 PCI
* This device is not really available anymore.


AIX and PowerHA levels

Research shows these dates for AIX:
* AIX 7.2.1.3 should come out around October, 2017 (est Week 46)
http://www-01.ibm.com/support/docview.wss?uid=isg1IV95390 ### 7200-01-03-1720
* AIX 7.1.4.5 should come out around October, 2017 (est Week 46)
http://www-01.ibm.com/support/docview.wss?uid=isg1IV95393 ### 7100-04-05-1720
* AIX 7.1.5.0 may come out around January, 2018 (est Week 5); however, it may be cancelled.
http://www-01.ibm.com/support/docview.wss?uid=isg1IV86307 ### 7100-05-00-1731

It’s generally 26 weeks, plus or minus, from the initial YYWW date. Once a TLSP APARs releases, the YYWW code is be updated.

My PowerHA selection process would be:
* 7.1.3 SP06 if I needed to deploy quickly, because I have build docs for that. However, it may be withdrawn from marketing in 2018.
* 7.2.0 SP03 if they wanted longer support, but had time for me to work up the new procedures during the install.
* 7.2.1 SP01 when it comes out, but not 7.2.1 base.

My AIX selection process would be:
* 7.2.1.2 for any NIM server or POWER9. Next updates should be Oct 2017.
* 7.1.4.4 or later for customer preference. Next updates should be Oct 7.1.4.5 and Jan 7.1.5.0.
* 6.1.9.9 Minimum level for application compatability. This is is the final TLSP.
* For anything POWER6 or older, I push hard for p710 to p740 or s81x/s82x as replacements (cost).
* For anything AIX 5.3 or older, I push hard for app testing on newer AIX (EoS).
** PTF U866665.bff (bos.mp64.5.3.12.10.U) enables POWER8. AIX must be 5.3.12.9. Must be patched before p8 (install nim or mksysb). p8 must be 840 firmware. VIO must be 2.2.4.10 or later.
** PTF U866665 requires an active extended support agreement AND p8 systems on file. No free access to biz partnets.

Code sources:
* rpm.rte and yum ezinstall, then deploy tar, wget, and rsync:
http://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/ezinstall/ppc/
* openssh from the IBM Web Download expansion:
https://www-01.ibm.com/marketing/iwm/iwm/web/reg/pick.do?source=aixbp&lang=en_US
* AIX security patches for any DMZ hosts
http://public.dhe.ibm.com/aix/efixes/security/?C=M;O=D
ftp://ftp.software.ibm.com/aix/efixes/security/
* Base media, if I were certain the customer was entitled, but didn’t want to wait for them to provide media, Partnerworld SWAC:
https://www-304.ibm.com/partnerworld/partnertools/eorderweb/ordersw.do
* Latest service pack for AIX from Fix Central:
https://www-945.ibm.com/support/fixcentral/
https://www-945.ibm.com/support/fixcentral/aix/selectFixes?release=7.2&function=release
https://www-945.ibm.com/support/fixcentral/aix/selectFixes?release=7.1&function=release
* Latest service pack for PowerHA from Fix Central:
https://www-945.ibm.com/support/fixcentral/swg/selectFixes?parent=Cluster%20software&product=ibm/Other+software/PowerHAClusterManager&release=7.2.0&platform=All&function=all
https://www-945.ibm.com/support/fixcentral/swg/selectFixes?parent=Cluster%20software&product=ibm/Other+software/PowerHAClusterManager&release=7.1.3&platform=All&function=all

Reference: PowerHA to AIX Support Matrix:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101347


Posted in Reference, Work | Comments Off on AIX and PowerHA levels

AIX and PowerHA versions 2017-06

This changes periodically, but for today, here is what I would do.

My PowerHA selection process would be:
* 7.1.3 SP06 if I needed to deploy quickly, because I have build docs for that.
* 7.1.4 doesn’t exist, but if it came out before deployment, I would consider it. Whichever was a newer release, latest 7.1.3 SP, or latest 7.1.4 SP.
* 7.2.0 SP03 if they wanted longer support, but had time for me to work up the new procedures during the install.
* 7.2.1 SP01 if SP01 came out before I deployed, and had chosen 7.2.0 prior. 7.2.1.0 base is available, but that’s from Dec 2016, and 7.2.0.3 is from May 2017. Newer by date is better.

My AIX selection process would be:
* Any NIM server would be AIX 7.2, latest TLSP.
* Any application support limits would win down to AIX 6.1, plus latest TLSP.
* For POWER9, I would push 7.2, latest TLSP.
* For POWER8, I would push 7.1 or later. — latest TLSP
* For POWER7, I would push 6.1 or later. — latest TLSP
* For POWER6 or older, or AIX 5.3 or older, I would push strongly against due to support and parts limitations.

Code sources:
* I would make sure to install yum from ezinstall, and deploy GNU tar and rsync:
http://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/ezinstall/ppc/
* I would update openssh from the IBM Web Download expansion:
https://www-01.ibm.com/marketing/iwm/iwm/web/reg/pick.do?source=aixbp&lang=en_US
* If any exposure to the public net, or a high-sensitivity system, I would check AIX security patches also.
http://public.dhe.ibm.com/aix/efixes/security/?C=M;O=D
ftp://ftp.software.ibm.com/aix/efixes/security/
* I would get the latest service pack for both AIX and PowerHA from Fix Central:
https://www-945.ibm.com/support/fixcentral/
* Base media, if I were certain the customer was entitled, but didn’t want to wait for them to provide media, Partnerworld SWAC:
https://www-304.ibm.com/partnerworld/partnertools/eorderweb/ordersw.do

Reference: PowerHA to AIX Support Matrix:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101347


Bad Subnet Kills DHCPD

One, single bad IP in DHCPD config will kill the entire config file. :(

On an EdgeRouter, and probably anything with Ubiquiti, and maybe anything using the same config style (Brocade and others have the same command set)….

If you add a static reservation outside of the DHCP server’s subnet,
as in, if you typo one octet, or decide to do another subnet just because,
your DHCP server will be offline after reboot. No errors, just silently not serving.

It can be outside of the start/stop range, and that’s fine.

Really, this should give you a warning from the webUI, or it should just say “OKAY, We’ll let you hand out stupid IP addresses.” I mean, what if I wanted this to be my DHCP server, but I had a different router and subnet on the same segment?

From command line, you’ll see the error though:

admin@gw1# commit
[ service dhcp-server ]
Static DHCP lease IP '192.169.1.79' under mapping 'CustomerLaptop'
under shared network name 'LAN' is outside of the DHCP lease network '192.168.1.0/24'.
DHCP server configuration commit aborted due to error(s).
[edit]

unpacking .deb

Reminder to self:
Debian packages are stored in library archive format.
http://www.tldp.org/HOWTO/Debian-Binary-Package-Building-HOWTO/x60.html
https://www.debian.org/doc/debian-policy/ap-pkg-binarypkg.html

ar -xv file.deb
This returns three files, in this specific order:
debian-binary # A small text file. Always “2.0\n” for now.
data.tar.gz # All of the filesystem bits that get deployed
control.tar.gz # control, md5sums, and pre/post scripts

Note also that data.tar can be .xz format as well.

There are dpkg-build tools for this, but all of this can be done manually for more control if desired.