errpt disk errors

SC_DISK_PCM_ERR1 Subsystem Component Failure

The storage subsystem has returned an error indicating that some component (hardware or software) of the storage subsystem has failed. The detailed sense data identifies the failing component and the recovery action that is required. Failing hardware components should also be shown in the Storage Manager software, so the placement of these errors in the error log is advisory and is an aid for your technical-support representative.

SC_DISK_PCM_ERR2 Array Active Controller Switch

The active controller for one or more hdisks associated with the storage subsystem has changed. This is in response to some direct action by the AIX host (failover or autorecovery). This message is associated with either a set of failure conditions causing a failover or, after a successful failover, with the recovery of paths to the preferred controller on hdisks with the autorecovery attribute set to yes.

SC_DISK_PCM_ERR3 Array Controller Switch Failure

An attempt to switch active controllers has failed. This leaves one or more paths with no working path to a controller. The AIX MPIO PCM will retry this error several times in an attempt to find a successful path to a controller.

SC_DISK_PCM_ERR4 Array Configuration Changed

The active controller for an hdisk has changed, usually due to an action not initiated by this host. This might be another host initiating failover or recovery, for shared LUNs, a redistribute operation from the Storage Manager software, a change to the preferred path in the Storage Manager software, a controller being taken offline, or any other action that causes the active controller ownership to change.

SC_DISK_PCM_ERR5 Array Cache Battery Drained

The storage subsystem cache battery has drained. Any data remaining in the cache is being dumped and is vulnerable to loss until the dump completes. Caching is not normally allowed with drained batteries unless the administrator takes action to enable it within the Storage Manager software.

SC_DISK_PCM_ERR6 Array Cache Battery Charge Is Low

The storage subsystem cache batteries are low and need to be charged or replaced.

SC_DISK_PCM_ERR7 Cache Mirroring Disabled

Cache mirroring is disabled on the affected hdisks. Normally, any cached write data is kept within the cache of both controllers so that if either controller fails there is still a good copy of the data. This is a warning message stating that loss of a single controller will result in data loss.

SC_DISK_PCM_ERR8 Path Has Failed

The I/O path to a controller has failed or gone offline.

SC_DISK_PCM_ERR9 Path Has Recovered

The I/O path to a controller has resumed and is back online.

SC_DISK_PCM_ERR10 Array Drive Failure

A physical drive in the storage array has failed and should be replaced.

SC_DISK_PCM_ERR11 Reservation Conflict

A PCM operation has failed due to a reservation conflict. This error is not currently issued.

SC_DISK_PCM_ERR12 Snapshot™ Volume’s Repository Is Full

The snapshot volume repository is full. Write actions to the snapshot volume will fail until the repository problems are fixed.

SC_DISK_PCM_ERR13 Snapshot Op Stopped By Administrator

The administrator has halted a snapshot operation.

SC_DISK_PCM_ERR14 Snapshot repository metadata error

The storage subsystem has reported that there is a problem with snapshot metadata.

SC_DISK_PCM_ERR15 Illegal I/O – Remote Volume Mirroring

The I/O is directed to an illegal target that is part of a remote volume mirroring pair (the target volume rather than the source volume).

SC_DISK_PCM_ERR16 Snapshot Operation Not Allowed

A snapshot operation that is not allowed has been attempted.

SC_DISK_PCM_ERR17 Snapshot Volume’s Repository Is Full

The snapshot volume repository is full. Write actions to the snapshot volume will fail until the repository problems are fixed.

SC_DISK_PCM_ERR18 Write Protected

The hdisk is write-protected. This can happen if a snapshot volume repository is full.

SC_DISK_PCM_ERR19 Single Controller Restarted

The I/O to a single-controller storage subsystem is resumed.

SC_DISK_PCM_ERR20 Single Controller Restart Failure

The I/O to a single-controller storage subsystem is not resumed. The AIX MPIO PCM will continue to attempt to restart the I/O to the storage subsystem.
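
To see which of these events a host has actually logged, you can tally the labels from the detailed error log. This is only a sketch: it assumes "errpt -a" style text where each entry carries a "LABEL:" line, and the function name count_pcm_labels is made up for illustration.

```shell
# Count SC_DISK_PCM_ERR* entries in "errpt -a" style text read from stdin.
# Assumption: each entry has a line of the form "LABEL:  <error label>".
count_pcm_labels() {
  awk '$1 == "LABEL:" && $2 ~ /^SC_DISK_PCM_ERR[0-9]+$/ { n[$2]++ }
       END { for (l in n) print l, n[l] }' | sort
}

# Typical use on an AIX host (not run here):
#   errpt -a | count_pcm_labels
```

The output is one line per label with its count, which is usually enough to tell a single path flap from a recurring problem.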


SMB/CIFS 3 on AIX

Mounting should be vaguely similar to the SMB1 mounting you had before.

Download and install SMB Client 3, and “Network Authentication Service” (aka kerberos 5) from here:

https://www-01.ibm.com/marketing/iwm/iwm/web/pickUrxNew.do?source=aixbp

Ensure your Windows 201x server has SMB v3 enabled.

You want a service account in AD to use for your SMB3 mounts on AIX.

 

Notes about options:

encryption and secure_negotiate should both be set to desired.
signing should be enabled.
pver should be 3.0.2.
The kerberos realm specified in the “wrkgrp” option must be in all UPPERCASE if your domain is in uppercase.
The username provided for mounting is used for all read/write permissions/access.  
UID and GID default to root.system, but you can specify others.
fmode is the inverse of umask, and what the files’ permissions look like across the whole share.  Default is 755.
port can be 139 (ipv4) or 445 (ipv4 or ipv6).  Default is 445.

 

/etc/filesystems format:

/mnt:
     dev = /corpshare
     vfs = smbc
     mount = true
     options = "wrkgrp=CORP.DOMAIN,signing=enabled,pver=3.0.2,encryption=desired,secure_negotiate=desired"
     nodename = win2016server.corp.domain/sambauser

 

Command line example

mount -v smbc -n win2016server.corp.domain/sambauser/Passw0rd! \
-o "wrkgrp=CORP.DOMAIN,port=445,signing=required,encryption=required,secure_negotiate=desired,pver=auto" \
/corpshare /mnt

 

Store the samba credentials

mksmbcred -s win2016server.corp.domain -u sambauser [-p password]

See also lssmbcred, chsmbcred, and rmsmbcred.
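
If you have several shares to mount, it can be easier to generate the /etc/filesystems stanza than to type it. Here is a minimal sketch that prints a stanza in the same layout as the example above. The function name smb_stanza and its argument order are my own invention, and the option string is simply the one from the notes.

```shell
# Print an /etc/filesystems stanza for an SMB3 mount.
# Arguments (hypothetical helper, example values shown in the usage below):
#   $1 mount point, $2 share, $3 server, $4 user, $5 Kerberos realm
smb_stanza() {
  printf '%s:\n' "$1"
  printf '     dev = %s\n' "$2"
  printf '     vfs = smbc\n'
  printf '     mount = true\n'
  printf '     options = "wrkgrp=%s,signing=enabled,pver=3.0.2,encryption=desired,secure_negotiate=desired"\n' "$5"
  printf '     nodename = %s/%s\n' "$3" "$4"
}

# Review the output before appending it to /etc/filesystems:
#   smb_stanza /mnt /corpshare win2016server.corp.domain sambauser CORP.DOMAIN
```

The options use plain ASCII double quotes on purpose; curly quotes pasted from a web page will not parse.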

 

Reference 2021:

https://www.ibm.com/docs/en/aix/7.2?topic=protocol-server-message-block-smb-client-file-system 


reducevg very slow

There is an APAR for this, but it is really just a description of the behavior. reducevg sends the equivalent of TRIM commands, which on a storage array amounts to writing nulls. On a big LUN, or with a busy array, this can take a long time. If you do not need the space reclaimed, you can disable space reclamation:

ioo -o dk_lbp_enabled=0

Here is the IBM doc about it.

 

IJ23045: REDUCEVG UNCLEAR ON DELAY WHEN WAITING FOR INFLIGHT RECLAIM REQ APPLIES TO AIX 7100-05

 

A fix is available

APAR status

  • Closed as program error.

Error description

  • reducevg may be unclear, why there is some delay
    when waiting on inflight reclaim requests.
    

Local fix

  • Disable space reclamation by running:
    ioo -o dk_lbp_enabled=0
    

Problem summary

  • reducevg may be unclear, why there is some delay
    when waiting on inflight reclaim requests.
    

Problem conclusion

  • reducevg displays message incase there are space reclamation
    IOs inflight to indicate reducevg may take some time to
    complete.
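
Before a big reducevg, it can be worth checking how the tunable is currently set. The sketch below only parses text. It assumes "ioo -o dk_lbp_enabled" prints a line of the form "dk_lbp_enabled = N", and the function name lbp_state is hypothetical.

```shell
# Report whether space reclamation is enabled, given "name = value" text
# on stdin. Assumption: the tunable is printed as "dk_lbp_enabled = N".
lbp_state() {
  awk '$1 == "dk_lbp_enabled" && $2 == "=" {
         print ($3 == 0 ? "reclaim disabled" : "reclaim enabled"); found = 1
       }
       END { if (!found) print "tunable not found" }'
}

# Typical use on an AIX host (not run here):
#   ioo -o dk_lbp_enabled | lbp_state
```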

Gathering HACMP Info

Often, when working with a cluster, you might want to rebuild it from scratch, rather than take the time to figure out what is broken. Here are some commands to gather basic info for AIX and email it to yourself. Obviously, change the email address at the end.

(
echo '#########################' 
echo '#########################' OS Level
echo '#########################' 
oslevel -s
echo '#########################' 
echo '#########################' HA Level
echo '#########################' 
halevel -s
echo '#########################' 
echo '#########################' System Info
echo '#########################' 
lsattr -El sys0
echo '#########################' 
echo '#########################' Cluster Exports
echo '#########################' 
cat /usr/es/sbin/cluster/etc/exports
echo '#########################' 
echo '#########################' System Exports
echo '#########################' 
cat /etc/exports
echo '#########################' 
echo '#########################' Physical Volumes
echo '#########################' 
lspv -u
echo '#########################' 
echo '#########################' Cluster ID
echo '#########################' 
/usr/es/sbin/cluster/utilities/cllsclstr
echo '#########################' 
echo '#########################' Cluster Heartbeat
echo '#########################' 
lscluster -d
echo '#########################' 
echo '#########################' Cluster Status
echo '#########################' 
/usr/es/sbin/cluster/utilities/cllscompstat
echo '#########################' 
echo '#########################' Cluster Dump
echo '#########################' 
/usr/es/sbin/cluster/utilities/cldump
echo '#########################' 
echo '#########################' Cluster Services
echo '#########################' 
/usr/es/sbin/cluster/utilities/cllsserv
echo '#########################' 
echo '#########################' Cluster App Monitors
echo '#########################' 
/usr/es/sbin/cluster/utilities/cllsappmon
echo '#########################' 
echo '#########################' Cluster Resource Group Variables
echo '#########################' 
for i in `/usr/es/sbin/cluster/utilities/cllsgrp` ; do echo '###################' $i ; /usr/es/sbin/cluster/utilities/cllsres -g $i ; done
echo '#########################' 
echo '#########################' Cluster Resource Group Details
echo '#########################' 
for i in `/usr/es/sbin/cluster/utilities/cllsgrp` ; do echo '###################' $i ; /usr/es/sbin/cluster/utilities/clshowres -g $i ; done
echo '#########################' 
echo '#########################' Cluster Interfaces
echo '#########################' 
/usr/es/sbin/cluster/utilities/cllsif
echo '#########################' 
echo '#########################' Network Interfaces
echo '#########################' 
ifconfig -a
echo '#########################' 
echo '#########################' Rhosts
echo '#########################' 
cat /.rhosts
echo '#########################' 
echo '#########################' root rhosts
echo '#########################' 
cat /root/.rhosts
echo '#########################' 
echo '#########################' cluster rhosts
echo '#########################' 
cat /etc/cluster/rhosts
echo '#########################' 
echo '#########################' New cluster rhosts
echo '#########################' 
cat /usr/es/sbin/cluster/etc/rhosts
echo '#########################' 
echo '#########################' Net monitor IPs
echo '#########################' 
cat /usr/es/sbin/cluster/netmon.cf
echo '#########################' 
echo '#########################' File Collections
echo '#########################' 
odmget HACMPfilecollection
echo '#########################' 
echo '#########################' Collection Files
echo '#########################' 
odmget HACMPfcfile
echo '#########################' 
echo '#########################' Free Major Numbers
echo '#########################' 
lvlstmajor
echo '#########################' 
echo '#########################' Example commands for VG Imports
echo '#########################' 
for VG in `lsvg |egrep -v 'rootvg|caavg'`; do 
  echo `getlvodm -d $VG` `lspv | grep $VG | tr -s [:space:] | sort -k 2 | head -1` \
  | awk '{print "importvg -V" , $1 , "-y " , $4 , " " , $3 ; } ; ' ; done | sort
echo '#########################' 
echo '#########################' Volume Groups
echo '#########################' 
lsvg
echo '#########################' 
echo '#########################' Volume Group Details
echo '#########################' 
lsvg | xargs -n1 lsvg
echo '#########################' 
echo '#########################' Logical Volumes
echo '#########################' 
lsvg | xargs -n1 lsvg -l
echo '#########################' 
echo '#########################' Logical Volume Details
echo '#########################' 
lsvg | xargs -n1 lsvg -l | grep / | cut -f 1 -d \  | xargs -n1 lslv
echo '#########################' 
echo '#########################' Filesystems
echo '#########################' 
df -Pg
echo '#########################' 
echo '#########################' Mounts
echo '#########################' 
mount
echo '#########################' 
echo '#########################' Tunables from last boot
echo '#########################' 
cat /etc/tunables/lastboot
echo '#########################' 
echo '#########################' Device settings
echo '#########################' 
for i in `lsdev | egrep '^en|hdisk|fcs|fscsi' | cut -f1 -d\  ` ; do echo '#####################' $i ; lsattr -El $i ; done | egrep -v 'False$'
echo '#########################' 
echo '#########################' Crontab entries
echo '#########################' 
crontab -l
echo '#########################' 
echo '#########################' snmp config
echo '#########################' 
cat /etc/snmpdv3.conf
echo '#########################' END END END
) 2>&1 | mail -vs `hostname` jdavis@omnitech.net
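
The banner lines above repeat a lot. If you maintain this script for long, a small helper keeps it shorter. This is just a sketch: section and report are names I made up, and the mail address in the usage comment is a placeholder.

```shell
# Print a three-line banner like the ones in the report above.
section() {
  echo '#########################'
  echo '#########################' "$*"
  echo '#########################'
}

# Run a command under a banner; stderr is folded into the report.
report() {
  title=$1
  shift
  section "$title"
  "$@" 2>&1
}

# Example with commands from the list above (not run here):
#   ( report 'OS Level' oslevel -s ; report 'Mounts' mount ) | mail -s "$(hostname)" you@example.com
```

Commands that need pipes or loops still have to be written out by hand, so this only helps with the simple one-command sections.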


AIX machine parsable outputs

AIX could use some standardization for single-line, machine-parsable output, for example by adding the -F flag to LVM and VPD queries.

Device commands do have a standardized way to get this:

#################
For lsdev, you can use -F
#################
# lsdev -H -Ccadapter -F 'name;class;subclass;type;location;physloc;description'
name;class;subclass;type;location;physloc;description
ent0;adapter;pciex;df1020e2e304;02-00;U78D2.001.XXXXXXX-P1-C9-T1;PCIe3 4-Port 10GbE SR Adapter (df1020e21410e304)

#################
For lsattr, you can do the same
#################
#lsattr -El sys0 -H -F 'attribute:value:description:user_settable'
attribute:value:description:user_settable

SW_dist_intr:false:Enable SW distribution of interrupts:True
autorestart:true:Automatically REBOOT OS after a crash:True
boottype:disk:N/A:False
capacity_inc:0.01:Processor capacity increment:False
capped:false:Partition is capped:False
chown_restrict:true:Chown Restriction Mode:True
clouddev:0:Recreate ODM devices on next boot:True
conslogin:enable:System Console Login :False
cpuguard:enable:CPU Guard:True
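
Once the output is delimited like this, the shell can split it without any cut or awk. A minimal sketch, assuming the four-field colon format shown above and a description that contains no colons (the function name settable_attrs is made up):

```shell
# Read "attribute:value:description:user_settable" lines on stdin and print
# attribute=value for the settable ones. Header and blank lines are skipped
# because their last field is not "True".
settable_attrs() {
  while IFS=: read -r attr value desc settable ; do
    [ "$settable" = "True" ] && echo "${attr}=${value}"
  done
  return 0
}

# Typical use on an AIX host (not run here):
#   lsattr -El sys0 -F 'attribute:value:description:user_settable' | settable_attrs
```

If a description might contain a colon, pick a different delimiter in the -F format string.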


However, VPD is less standardized.


#################
lscfg -vl is brutal to parse, and ODM does not have all of this. You can get useful data with:
#################

# lsvpd | while read type data ; do if [[ "${type}" == "*YL" ]] ; then echo "" ; fi ; echo "$type $data" ; done | more | grep -p fcs0
*YL U78D2.001.XXXXXXX-P1-C9-T4
*FC ????????
*DS PCIe3 4-Port 16Gb FC Adapter (df1000e314101406)
*AX fcs0
*PL 03-00
*CD 10140614
*PN 01FT695
*SN Y050HY95A012
*EC P14609
*CC 578E
*MF 001D
*FN 01FT699
*ZM 3
*Z0 0000000C
*Z1 00000001
*Z2 00000000
*Z3 08090000
*Z4 01000001
*Z5 2E343135
*Z6 2E343135
*Z7 C0022C40
*Z8 20000010XXXXXXXX
*Z9 11.4.415.5
*ZA 11.4.415.5
*ZB 00000000
*ZC 00040000
*ZD 000000FF

In this case, it’s missing “Network Address”, but you can usually convert Z8. This doesn’t work on virtual though. I usually do something more like this to get the data I want:

for fcs in $(lsdev -C | grep fcs | cut -f 1 -d ' ') ; do
fscsi=$(lsdev -p $fcs | grep -i scsi | cut -f 1 -d ' ') ; hn=$(hostname)
echo $hn $(lscfg -vsl $fcs | egrep 'fcs|Address' | tr . ' ' | tr -s '[:space:]') FC ID: $(lsattr -a scsi_id -F value -El $fscsi) ; done

vio2a fcs0 U78AE 001 XXXXXXX-P1-C19-L1-T1 Network Address 10000090FXXXXXXX FC ID: 0x450200
vio2a fcs1 U78AE 001 XXXXXXX-P1-C19-L1-T2 Network Address 10000090FXXXXXXX FC ID: 0x460200

That works on virtual hosts also.
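
If you only want one keyword for one device out of the lsvpd stream, a small awk filter does it. This is a sketch under the assumptions visible in the sample above: each record starts with a *YL line and names its device on an *AX line. The function name vpd_field is hypothetical.

```shell
# Print one VPD keyword for one device from "lsvpd"-style text on stdin.
# Usage: vpd_field <device> <keyword without the leading asterisk>
vpd_field() {
  awk -v dev="$1" -v key="*$2" '
    # Print the saved value if the finished record was for our device.
    function emit() {
      if (ax == dev && found) print val
      ax = "" ; found = 0 ; val = ""
    }
    $1 == "*YL" { emit() }
    {
      tag = $1
      data = $0
      sub(/^[^ ]+ ?/, "", data)      # strip the keyword, keep the rest
      if (tag == "*AX") ax = data
      if (tag == key) { val = data ; found = 1 }
    }
    END { emit() }
  '
}

# Typical use on an AIX host (not run here):
#   lsvpd | vpd_field fcs0 PN
```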


LVM is the worst, and requires transformation


Other than cutting out specific data that you need, the AIX way is to use mkvgdata, and then parse the image.data file it produces.

Alternatively, you can do something like this:

# (lsvg rootvg | cut -c 1-45 ; lsvg rootvg | cut -c 46-999 ) | cut -f 1 -d '(' | tr -d ' ' | tr : ' ' | while read variable value ; do echo ${variable}=${value} ; done | egrep -v '=$' | sort -r
VOLUMEGROUP=rootvg
VGSTATE=active
VGPERMISSION=read/write
VGIDENTIFIER=00XXXXX00000YYYYY0000ZZZZZZZZZZZ

The same approach works with lslv, and often you can get the key info from “lsvg -l rootvg” or similar.
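
For the “lsvg -l” case, the columns are already whitespace-separated, so a one-line awk turns them into delimited rows. A sketch, assuming the first line is the VG name, the second is the column header, and none of the seven columns contains an embedded space (the function name lsvgl_rows is made up):

```shell
# Convert "lsvg -l <vg>" style text on stdin to semicolon-delimited rows:
# lvname;type;LPs;PPs;PVs;state;mountpoint
lsvgl_rows() {
  awk 'NR > 2 && NF >= 7 { print $1 ";" $2 ";" $3 ";" $4 ";" $5 ";" $6 ";" $7 }'
}

# Typical use on an AIX host (not run here):
#   lsvg -l rootvg | lsvgl_rows
```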


AIX 2020 PCMPATH Replacement

With AIX PCM, we no longer have pcmpath or datapath.
Most of the info you’d want for disks is from lsmpio or lspath.
For some adapter queries, you’re stuck. Here is a simulated output generator.
This is not super efficient, but it’s on par with the kind of things many AIX scripts do.

pcmpath_query() {
  printf "%-8s %8s %15s %6s %6s %6s\n" Adapter Status Selects Errors Paths Failed 
  for fscsi in $(lsdev -C | grep fscsi | cut -f 1 -d \  ) ; do
    enabled=$(lspath -p $fscsi | grep Enabled | wc -l)
    failed=$(lspath -p $fscsi | grep -v Enabled | wc -l)
    if [[ $(( $enabled + $failed )) -eq 0 ]] ; then status=UNUSED ; 
    elif [[ $failed -eq 0 ]] ; then status=NORMAL 
    elif [[ $enabled -eq 0 ]]; then status=FAILED
    else status=DEGRADED; fi
    fcs=$(lsdev -CFparent -l $fscsi)
    selects=$(fcstat $fcs | grep Requests | tr -d \  | cut -f 2 -d : | paste -sd+ - | bc)
    errors=$(lsmpio -ae  | grep -p $fscsi | grep Total | tr -d \  | cut -f 2 -d : )
    paths=$(lsmpio -ar | grep -p $fscsi | grep 0x | wc -l)
    failed=$(lsmpio -ar | grep -p $fscsi | grep 0x | awk '{print $3 "\n" $4 "\n" $5;}' | paste -sd+ - | bc)
    printf "%-8s %8s %15s %6s %6s %6s\n" $fscsi $status $selects $errors $paths $failed
  done
}
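
The status logic in that function is easy to check on its own, without any adapters present. Here it is pulled out as a sketch; path_status is a name I made up, and it takes the two path counts as arguments.

```shell
# Classify an adapter from its counts of enabled and non-enabled paths,
# using the same rules as pcmpath_query above.
path_status() {
  enabled=$1
  failed=$2
  if [ $(( enabled + failed )) -eq 0 ] ; then echo UNUSED
  elif [ "$failed" -eq 0 ] ; then echo NORMAL
  elif [ "$enabled" -eq 0 ] ; then echo FAILED
  else echo DEGRADED
  fi
}

# Example: 4 enabled paths and 2 non-enabled paths
#   path_status 4 2
```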

AIXPCM vs SDDPCM

AIX geeks: when converting from SDDPCM to AIXPCM and uninstalling the drivers, you also need to uninstall the host attachment fileset. On 2% of our migrations, mksysb/alt clones would fail to find the boot disk (LED 554).

devices.fcp.disk.ibm.mpio.rte devices.sddpcm.*

Note that from SVC 7.6.1, AIX 6.1.8, AIX 7.1.3, and AIX 7.2, you MAY switch to AIXPCM. On POWER9 and later, you MUST switch to AIXPCM.

The rm script is from storage development, but the manage_disk_drivers command is from AIX dev. Either is okay, but the AIX one does not require making a PMR.

Best reference:

https://www.ibm.com/developerworks/community/blogs/cgaix/entry/One_Path_Control_Module_to_Rule_Them_All


AIX types of ethernet interfaces

AIX shows a lot of different info in different places.  This is because AIX predates the time when everyone had RJ45 ethernet ports.

HBA represents a high-function PCI adapter that contains multiple protocols, and which can sometimes be configured to provide ENT devices.  Primary candidates are “Integrated Virtual Ethernet” on POWER5 and POWER6 servers, as well as RoCE adapters, which are “RDMA over Converged Ethernet”, with RDMA being “Remote Direct Memory Access”.  Basically, these are Infiniband-style adapters which can use ethernet at the link layer.

ENT represents the “physical port”, though that is not always the case.  I’ll explain more later.  There is one of these for every Ethernet port visible to the operating system.

EN represents the “ETHERNET II” protocol device for IP communication.  This is the standard today, also known as “DIX Ethernet”, named after DEC, Intel, Xerox.  This is where you will normally put your IP address.  There is one of these for every ENT device.

ET represents an IEEE 802.3 protocol device.  This would have been used in the days of Novell networking, or with SNA protocol.  Almost no-one uses this anymore, but I’m sure there’s an AIX 3.2.5U2 microchannel server running with this somewhere in the bottom of an old government facility, with coaxial cables and barrel terminators.  Really, I don’t know why this still is needed on anything produced in the last 20 years.  There is one of these for every ENT device.

INET is for config options that affect the entire TCP/IP stack, such as persistent routes, the hostname, and whether you are bypassing ODM for config of your network (rare).  There is only one of these per system, and it is always inet0 unless someone gets cheeky.

There are other ways to get IP devices, such as IP over Fibre Channel, IP over Infiniband, IP over ATM, over FDDI, over serial or parallel, etc.  These are less common, so I’m not going into them here.

Generally, you may have a stack like this:

ent0    physical ethernet port
ent1    physical ethernet port
ent2    Etherchannel (Static, or LACP bond created out of both of the above)
ent3    Virtual Ethernet (Connects to a virtual, firmware-only switch)
ent4    Shared Ethernet (VIO server only, a software bridge between a virtual and a physical adapter)
ent5    VLAN (an additional VLAN port configured off of any of the above)
en0     IP interface – unused because we give ENT0 as a backing device to ENT2
en1     IP interface – also unused for the same reason
en2     IP Interface – also unused, because this is the backing device for the SEA
en3     IP interface – Also unused because this is a backing device for the SEA.
en4     IP interface hanging off of ENT4 – this can be skipped, and a virtual ethernet used
en5     IP interface hanging off of ENT5 – this can be skipped, and a virtual ethernet used

Each device has its own type of parameters.  You can use “lsattr -El $device”, “netstat -in”, and “entstat -d $device” to get details of this.  Note that entstat wants to be on the top device, not the bottom device.  Start with where the IP address is assigned, and it will show the subdevices, virtual connections, etc.
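
When scripting against a device list, it helps to sort names into these buckets first. A minimal sketch based only on the naming prefixes described above; net_dev_kind is a made-up name, and the short descriptions are my own summaries.

```shell
# Map an AIX network device name to the kind of device it is,
# going only by the name prefix.
net_dev_kind() {
  case "$1" in
    inet[0-9]*) echo "TCP/IP stack settings" ;;
    ent[0-9]*)  echo "Ethernet adapter or port" ;;
    en[0-9]*)   echo "Ethernet II IP interface" ;;
    et[0-9]*)   echo "IEEE 802.3 IP interface" ;;
    hba[0-9]*)  echo "multi-protocol adapter" ;;
    *)          echo "unknown" ;;
  esac
}

# Typical use on an AIX host (not run here):
#   for d in $(lsdev -C -F name) ; do echo "$d: $(net_dev_kind $d)" ; done
```

Note that the prefix cannot tell you whether an ent device is physical, virtual, an Etherchannel, or an SEA. For that you still need lsdev or entstat.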


High Level VIO/Client build

This is off the cuff, and is not a technical walkthrough. This is enough for you to teach yourself assuming you have a system to hack on.

IBM’s POWER8 docs are missing almost everything. I don’t understand how they can call them docs at all. They want you to use some really picky tools that are cumbersome and not flexible in all the right ways.

The IBM POWER7 docs are close, but are missing the SR-IOV info. Your best bet is to skim through this, and stop when you find the bits you want (concepts, config):

The high level gist of building a VIO environment is as follows:

  • Configure to HMC
  • Clear managed system profile data
  • Build a couple VIO servers:
    • 6GB RAM, 3 virtual procs, 0.3 processing units, 255 CPU weight
    • At least one storage and one network adapter
    • You can use SR-IOV to share an ethernet adapter from firmware if needed
    • One virtual ethernet trunk for each separate physical network.  Assign VLANs here
    • One virtual ethernet non-trunk for each VLAN you want an IP address on (ideal, but you can also hang IPs and VLANs directly from AIX)
    • One virtual SCSI server adapter for each client LPAR that will need virtual CDROM, Virtual Tape, or legacy Virtual SCSI disk (higher CPU load).
    • One virtual fibre adapter for each client port (usually two per client on each VIO server, but can be anywhere from 1 to 8)
  • Upload the VIO base media into the HMC media repository
  • Install the VIO server from the HMC
  • SSH into the HMC, and use vtmenu to rebuild the VIO networking
    • Remove all en, et, ent, hba devices, then cfgmgr
    • mkvdev -lnagg for any etherchannel bonded pairs needed for the Shared Ethernet Adapter(s)
    • mkvdev -sea  to build any shared ethernet adapters (ethernet bridge from virtual switch to physical port)
    • mkvdev -lnagg for any etherchannel bonded pairs needed for local IP communication
    • mkvdev -vlan for any additional VLANs hanging directly off an SEA rather than through a virtual ethernet client adapter
    • mktcpip to configure your primary interface, gateway, etc
    • Add any extra IP addresses.
  • Build your Client LPARs
    • Memory and CPU as desired
    • Virtual ethernet just picks the switch and VLAN that you need.  If this does not exist on any VIO trunk adapters, then you need to fix that.
    • Virtual SCSI client adapter
      • this needs the VIO server partition ID, and the VIO server slot number added to it for the firmware connection.
      • The VIO server virtual SCSI adapter needs the same mapping back to the client LPAR id and slot.
      • There may be some GUI improvements to add this all for you, but it’s been decades of garbage for so long that I just do it all manually.
    • Virtual Fibre adapter – This maps back and forth to the VIO server virtual fibre similar to how VSCSI did.
  • SSH into the VIO server
    • Make virtual optical devices attached to the “vhost” (virtual SCSI) adapter if needed
    • Use vfcmap to map the “vfchost” adapters to real “fcs” ports.  This requires them to be NPIV capable (8gbit or newer), logged into an NPIV capable switch (lsnports).
  • Zone any LUNs
    • lsnportlogin can give you the WWNs for the clients, or you can get it from the client profile data manually
    • You can use OpenFirmware’s “ioinfo” to light up a port to force it to log in to the switch.
    • If the LPAR is down, you can use “chnportlogin” from the HMC to log in all ports for that client.
    • You can also zone directly to the VIO server, and “mkvdev” to map them as vscsi disks (higher CPU load on VIO server, and kind of a pain in the rump).
    • Note that LPM requires any VSCSI LUNs to be mapped to all VIO servers in advance.
    • Note that LPM requires any NPIV LUNs to be mapped to the secondary WWNs in advance
  • SSH into the VIO server
    • Make sure lsmap and lsmap -npiv show whatever mapping is required
    • Make sure loadopt has mounted any ISO images as virtual CDROMs if needed
    • You can also just mask an alt_disk_install LUN from a source host.
    • You can also use NIM to do a network install
  • Activate the LPAR profile.
    • If you did not open a vterm from SSH into the HMC, then you can do it from the activate GUI.
    • You can use SMS to pick your boot device
    • Install or boot as desired
    • Reconfigure your network as normal
      • smitty tcpip or “chdev -l en0” and “chdev -l inet0” with appropriate flags
      • Tune everything as desired.
      • If it was a Linux install, then that has its own config options.
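
To make the VIO server steps above a bit more concrete, here is a dry-run sketch that only prints the kind of commands involved. The device names follow the example stack from the ethernet notes (ent0 and ent1 physical, ent2 the Etherchannel, ent3 virtual, ent4 the SEA). Those names, the VLAN IDs, vfchost0, and fcs0 are all assumptions, as is the function name vio_plan. Check each command against your own devices before running anything.

```shell
# Print (do not run) VIO server commands for a typical network and
# NPIV setup. All device names and VLAN IDs here are hypothetical.
vio_plan() {
  echo "mkvdev -lnagg ent0,ent1 -attr mode=8023ad"
  echo "mkvdev -sea ent2 -vadapter ent3 -default ent3 -defaultid 1"
  echo "mkvdev -vlan ent4 -tagid 100"
  echo "vfcmap -vadapter vfchost0 -fcp fcs0"
  echo "lsmap -all -npiv"
}

# Review the plan first:
#   vio_plan
```

Printing rather than executing is deliberate. The device numbers that cfgmgr hands out will differ from box to box.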

SR-IOV can be used instead of Shared Ethernet above. 

It allows you to share a single PCI NIC or single ethernet port between LPARs.  It uses less CPU on the VIO server, and has lower latency for your LPARs.  It’s sort of the Next Generation of network virtualization, though there are some restrictions in its use.  It’s best to review all of the info, and decide up front, but is worth your time to do so.  If you want to use an SEA on SR-IOV, you still only have one VIO server per port, but you can have different ports on different VIO servers.  When sharing among all clients and VIO server without SEA, understand that the percentage capacity is a minimum guaranteed, not a cap.  Leave it low unless you have some critical workload that needs to crowd out anyone else. Some of the best URLs today when I look up “SR-IOV vNIC vio howto” are as follows:

CLI and Automation

If you want to build a whole bunch of VIO clients and servers at once, it may be worth the effort to do it from the HMC CLI.  It gets really complicated, but once you have it set up, you can adjust and rebuild things quickly.  This also lets you manually specify WWNs for your LPARs in case there are collisions, or if you are rebuilding and need to keep the same numbers.

The VIO server can be installed with alt_disk_copy, or from NIM, or from physical CD, or from the HMC.  The CLI version is called “installios” and you MUST specify the MAC address of the boot adapter for it to work properly. Without CLI options, installios will prompt you for all of the info.

 


AIX 7.2.3.1 breaks GSKit 8.0.50.89

AIX 7.2.3 breaks GSKit8, up through GP29 (8.0.50.89).

This affects TSM/Spectrum Protect, Content Manager, Tivoli Directory Server, Websphere, DB2, Informix, IBM HTTP Server, etc.

Before the reboot, everything still works, which implies the change is in the kernel.

We found it on TSM, and AIX 7200-03-01-1838, and Spectrum Protect server 8.1.6.0.
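
If you have many hosts to check, a dotted-version comparison saves eyeballing. This sketch flags any GSKit 8 level at or below 8.0.50.89, which is the range described above. It says nothing about which later level contains a fix, since that is not covered here. The function name gsk8_affected is made up, and you supply the version string however you normally obtain it.

```shell
# Succeed (exit 0) if a dotted GSKit 8 version is at or below 8.0.50.89.
# Anything that is not major version 8 is treated as not affected.
gsk8_affected() {
  echo "$1" | awk -F. '
    { if ($1 != 8) exit 1
      split("8.0.50.89", lim, ".")
      for (i = 1; i <= 4; i++) {
        if ($i + 0 < lim[i] + 0) exit 0   # lower at this field: affected
        if ($i + 0 > lim[i] + 0) exit 1   # higher at this field: not affected
      }
      exit 0 }'
}

# Example:
#   if gsk8_affected 8.0.50.66 ; then echo "hold off on AIX 7.2.3" ; fi
```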

Application crash and DBX follow below.

ANR7800I DSMSERV generated at 12:17:13 on Sep 11 2018.
IBM Spectrum Protect for AIX
Version 8, Release 1, Level 6.000
Licensed Materials - Property of IBM
(C) Copyright IBM Corporation 1990, 2018.
All rights reserved.
U.S. Government Users Restricted Rights - Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corporation.

ANR7801I Subsystem process ID is 10944920.
ANR0900I Processing options file /home/tsminst1/dsmserv.opt.
ANR7811I Using instance directory /home/tsminst1.
Illegal instruction(coredump)

# dbx /opt/tivoli/tsm/server/bin/dsmserv core.10944896.28165312
Type 'help' for help.
[using memory image in core.10944896.28165312]
reading symbolic information ...warning: no source compiled with -g

Illegal instruction (illegal opcode) in . at 0x0 ($t1)
warning: Unable to access address 0x0 from core

(dbx) where
.() at 0x0
gsk_src_create__FPPvPv(??, ??) at 0x9000000015b6d88
__ct__8GSKMutexFv(??) at 0x9000000018d664c
__ct__20GSKPasswordEncryptorFv(??) at 0x9000000018cb248
__ct__7gsk_envFv(??) at 0x900000000aaa6b0
GskEnvironmentOpen__FPPvb(??, ??) at 0x900000000ab14c4
gsk_environment_open(??) at 0x900000000ab277c
IPRA.$CheckGSKVersion() at 0x100eecf68
tlsInit() at 0x100eecd70
main(??, ??) at 0x10000112c

(dbx) th
thread state-k wchan state-u k-tid mode held scope function

$t1 run running 41877977 k no sys
$t2 run blocked 21234465 u no sys _cond_wait_global
$t3 run running 24380103 u no sys waitpid