AIX Dog Threads broken – DO NOT USE

After the -1811 builds of AIX, I found that setting thread=on, yes, or any number caused severe disk latency problems. Lightly loaded systems would not notice, but any database would suffer horribly.

In VIO 2.x (AIX 6.1), I would use “thread=on” for the SEA (and really any en## device), and it was fine.
Well, VIO 3.1 uses AIX 7.2.3.3 which is build 19-15 (2019 week 15).
Using thread=on here caused the following:
• Load sharing failed.
• Pings from vio2 to some hosts failed.
• Rebooting one VIO caused a permanent network outage for both VIOs and their clients.
• The problem changed each time the network was reconfigured or rebuilt.
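
The build tags quoted in this post (-1811, 19-15) read as a two-digit year followed by a week number. As a small aid for comparing levels, here is a minimal Python sketch that decodes them; the function name decode_build is my own, and it assumes every tag follows that year-plus-week pattern.

```python
def decode_build(tag):
    """Split a 'YYWW' or 'YY-WW' build tag into (year, week).

    Assumes the tag is a two-digit year followed by a two-digit week,
    as in '19-15' meaning 2019 week 15.
    """
    digits = tag.replace("-", "")
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"unexpected build tag: {tag!r}")
    return 2000 + int(digits[:2]), int(digits[2:])


# Tuples compare year first, then week, so build levels can be ordered directly.
vio31 = decode_build("19-15")
threshold = decode_build("-1811")
print(vio31, threshold, vio31 > threshold)
```

Because the result is a (year, week) tuple, a plain comparison is enough to tell whether a given level is newer than the -1811 builds.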

One thing I tinkered with was load sharing for one VLAN.
I gave up during troubleshooting, but on my next build, I plan to re-test that.
The idea is two virtual switches. Both VIOs have a trunk on each switch, and both trunks carry the same VLAN.
Each trunk on the same VIO for the same SEA has to be the same trunk priority.
Configure the SEA with ha_mode=sharing (instead of auto), and both trunks.
Prod LPARs on one switch, non-prod LPARs on the other switch.
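
To keep the rules above straight before the next build, here is a minimal Python sketch that checks a planned trunk layout against them and prints the SEA command for one VIO. Everything named here is hypothetical: the Trunk class, the adapter names (ent0, ent4, ent5), the switch names, PVID 100, and the priorities 1 and 2. The mkvdev line follows the usual VIOS syntax, but whether ha_mode=sharing actually works with the same VLAN on both trunks is exactly the untested part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trunk:
    vio: str       # hypothetical VIO name, e.g. "vio1"
    adapter: str   # hypothetical virtual trunk adapter, e.g. "ent4"
    vswitch: str   # virtual switch the trunk is attached to
    pvid: int      # VLAN carried by the trunk
    priority: int  # trunk priority


def check_plan(trunks):
    """Return a list of rule violations for the planned layout."""
    problems = []
    vios = sorted({t.vio for t in trunks})
    switches = sorted({t.vswitch for t in trunks})
    if len(switches) != 2:
        problems.append(f"expected 2 virtual switches, found {len(switches)}")
    for vio in vios:
        mine = [t for t in trunks if t.vio == vio]
        # Each VIO needs one trunk on each switch.
        if sorted(t.vswitch for t in mine) != switches:
            problems.append(f"{vio}: needs exactly one trunk per switch")
        # Trunks on the same VIO for the same SEA share one priority.
        if len({t.priority for t in mine}) > 1:
            problems.append(f"{vio}: trunks have different priorities")
    if len({t.pvid for t in trunks}) > 1:
        problems.append("trunks do not all carry the same VLAN")
    return problems


def sea_command(physical, trunks_on_vio):
    """Build the mkvdev line for one VIO (a sketch, not a tested recipe)."""
    adapters = ",".join(t.adapter for t in trunks_on_vio)
    first = trunks_on_vio[0]
    return (f"mkvdev -sea {physical} -vadapter {adapters} "
            f"-default {first.adapter} -defaultid {first.pvid} "
            f"-attr ha_mode=sharing")


plan = [
    Trunk("vio1", "ent4", "vsw_prod", 100, 1),
    Trunk("vio1", "ent5", "vsw_nonprod", 100, 1),
    Trunk("vio2", "ent4", "vsw_prod", 100, 2),
    Trunk("vio2", "ent5", "vsw_nonprod", 100, 2),
]
print(check_plan(plan))
print(sea_command("ent0", [t for t in plan if t.vio == "vio1"]))
```

The check only covers the rules stated above. It says nothing about whether the SEA will bridge correctly once both trunks are active.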

All of the IBM docs only show this with different PVIDs on each trunk, and then they can be on the same switch.
Maybe it’s a pipe dream that this could work, but the sharing is between trunks, not by VLAN.
The kicker would be if the SEA is not a proper bridge, but only uses some kind of shortcut for packet handling.

This issue was found to be a problem on Emulex 10 Gbit Ethernet adapters. We also have ongoing network hangs on non-shared Broadcom adapters. Intel is the gold standard, but IBM does not certify or resell the 10 Gbit cards for AIX on POWER. The Mellanox cards cannot network boot (maybe on POWER9?).

In the meantime, IBM is killing developerWorks and not porting the information to the new community blogs. This is going to be a HUGE loss of technical reference.

