Gathering HACMP Info

Often, when working with a cluster, it is faster to rebuild it from scratch than to take the time to figure out what is broken. Here are some commands that gather basic AIX and cluster info and email it to you. Change the email address at the end before running.

(
echo '#########################' 
echo '#########################' OS Level
echo '#########################' 
oslevel -s
echo '#########################' 
echo '#########################' HA Level
echo '#########################' 
halevel -s
echo '#########################' 
echo '#########################' System Info
echo '#########################' 
lsattr -El sys0
echo '#########################' 
echo '#########################' Cluster Exports
echo '#########################' 
cat /usr/es/sbin/cluster/etc/exports
echo '#########################' 
echo '#########################' System Exports
echo '#########################' 
cat /etc/exports
echo '#########################' 
echo '#########################' Physical Volumes
echo '#########################' 
lspv -u
echo '#########################' 
echo '#########################' Cluster ID
echo '#########################' 
/usr/es/sbin/cluster/utilities/cllsclstr
echo '#########################' 
echo '#########################' Cluster Heartbeat
echo '#########################' 
lscluster -d
echo '#########################' 
echo '#########################' Cluster Status
echo '#########################' 
/usr/es/sbin/cluster/utilities/cllscompstat
echo '#########################' 
echo '#########################' Cluster Dump
echo '#########################' 
/usr/es/sbin/cluster/utilities/cldump
echo '#########################' 
echo '#########################' Cluster Services
echo '#########################' 
/usr/es/sbin/cluster/utilities/cllsserv
echo '#########################' 
echo '#########################' Cluster App Monitors
echo '#########################' 
/usr/es/sbin/cluster/utilities/cllsappmon
echo '#########################' 
echo '#########################' Cluster Resource Group Variables
echo '#########################' 
for i in `/usr/es/sbin/cluster/utilities/cllsgrp` ; do echo '###################' $i ; /usr/es/sbin/cluster/utilities/cllsres -g $i ; done
echo '#########################' 
echo '#########################' Cluster Resource Group Details
echo '#########################' 
for i in `/usr/es/sbin/cluster/utilities/cllsgrp` ; do echo '###################' $i ; /usr/es/sbin/cluster/utilities/clshowres -g $i ; done
echo '#########################' 
echo '#########################' Cluster Interfaces
echo '#########################' 
/usr/es/sbin/cluster/utilities/cllsif
echo '#########################' 
echo '#########################' Network Interfaces
echo '#########################' 
ifconfig -a
echo '#########################' 
echo '#########################' Rhosts
echo '#########################' 
cat /.rhosts
echo '#########################' 
echo '#########################' root rhosts
echo '#########################' 
cat /root/.rhosts
echo '#########################' 
echo '#########################' cluster rhosts
echo '#########################' 
cat /etc/cluster/rhosts
echo '#########################' 
echo '#########################' New cluster rhosts
echo '#########################' 
cat /usr/es/sbin/cluster/etc/rhosts
echo '#########################' 
echo '#########################' Net monitor IPs
echo '#########################' 
cat /usr/es/sbin/cluster/netmon.cf
echo '#########################' 
echo '#########################' File Collections
echo '#########################' 
odmget HACMPfilecollection
echo '#########################' 
echo '#########################' Collection Files
echo '#########################' 
odmget HACMPfcfile
echo '#########################' 
echo '#########################' Free Major Numbers
echo '#########################' 
lvlstmajor
echo '#########################' 
echo '#########################' Example commands for VG Imports
echo '#########################' 
for VG in `lsvg |egrep -v 'rootvg|caavg'`; do 
  echo `getlvodm -d $VG` `lspv | grep -w "$VG" | tr -s '[:space:]' | sort -k 2 | head -1` \
  | awk '{print "importvg -V" , $1 , "-y " , $4 , " " , $3 ; } ; ' ; done | sort
echo '#########################' 
echo '#########################' Volume Groups
echo '#########################' 
lsvg
echo '#########################' 
echo '#########################' Volume Group Details
echo '#########################' 
lsvg | xargs -n1 lsvg
echo '#########################' 
echo '#########################' Logical Volumes
echo '#########################' 
lsvg | xargs -n1 lsvg -l
echo '#########################' 
echo '#########################' Logical Volume Details
echo '#########################' 
lsvg | xargs -n1 lsvg -l | grep / | cut -f 1 -d \  | xargs -n1 lslv
echo '#########################' 
echo '#########################' Filesystems
echo '#########################' 
df -Pg
echo '#########################' 
echo '#########################' Mounts
echo '#########################' 
mount
echo '#########################' 
echo '#########################' Tunables from last boot
echo '#########################' 
cat /etc/tunables/lastboot
echo '#########################' 
echo '#########################' Device settings
echo '#########################' 
for i in `lsdev | egrep '^en|hdisk|fcs|fscsi' | cut -f1 -d\  ` ; do echo '#####################' $i ; lsattr -El $i ; done | egrep -v 'False$'
echo '#########################' 
echo '#########################' Crontab entries
echo '#########################' 
crontab -l
echo '#########################' 
echo '#########################' snmp config
echo '#########################' 
cat /etc/snmpdv3.conf
echo '#########################' END END END
) 2>&1 | mail -vs `hostname` jdavis@omnitech.net
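
The three-line banner is repeated for every section above, so it could be wrapped in a small function. This is only a sketch: section and run_section are names made up here, not part of AIX or HACMP, and the skip message is my own wording.

```shell
# Hypothetical helper: print the same three-line banner used in the script above.
section() {
  echo '#########################'
  echo '#########################' "$@"
  echo '#########################'
}

# Hypothetical helper: print the banner, then run the command.
# If the command is not installed on this node, say so instead of failing silently.
run_section() {
  title=$1
  shift
  section "$title"
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "(skipped: $1 not found)"
  fi
}

# Usage, same pattern as the full script (change the address):
# ( run_section 'OS Level' oslevel -s
#   run_section 'Volume Groups' lsvg
# ) 2>&1 | mail -s "`hostname`" you@example.com
```

Sections that use pipes or loops would still need to be written out by hand, or wrapped in their own function first.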


apt sandbox permissions

Every repo was giving signature errors in apt:
Err:6 http://security.debian.org stretch/updates InRelease
At least one invalid signature was encountered.

This was pretty recent. My updates in May were fine.
This ONLY affected apt* update. Not clean, install, purge, etc.

I could bypass the error by telling the sandbox to become root:
apt -o APT::Sandbox::User=root update

/tmp was still 1777. I did find that /var/tmp was a symlink to /tmp, which broke the dovecot install.
No idea why that’s a problem, because my /tmp is persistent across reboots.
A snotty developer somewhere indicated it was the end of the universe.
Now, /var/tmp is just part of /var. Whatever.

So, someone did a hard cleanup of cache, and that fixed it for me:
sudo apt-get clean
sudo mv /var/lib/apt/lists /tmp
sudo mkdir -p /var/lib/apt/lists/partial
sudo apt-get clean
sudo apt-get update

Then I compared /tmp/lists and /var/lib/apt/lists.
Exactly the same for everything, except top level permissions.
The old one was 755 and the new one is 750.
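
For anyone wanting to repeat the comparison, here is a rough sketch. It assumes GNU stat (as on Debian), and other_can_read is a name I made up. It only looks at the last octal digit, which matters if the sandbox user is neither the owner nor in the owning group.

```shell
# Print octal mode, owner:group, and path for each directory given (GNU stat).
show_modes() {
  stat -c '%a %U:%G %n' "$@"
}

# Hypothetical helper: report whether "other" has read+execute on a directory,
# based on the last octal digit of its mode (5 or 7).
other_can_read() {
  mode=$(stat -c '%a' "$1")
  case "${mode#"${mode%?}"}" in
    5|7) echo "$1: other can read (mode $mode)" ;;
    *)   echo "$1: other cannot read (mode $mode)" ;;
  esac
}

# Usage:
# show_modes /tmp/lists /var/lib/apt/lists
# other_can_read /var/lib/apt/lists
# diff -r /tmp/lists /var/lib/apt/lists
```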

WTF?!?!? Why do we care if “other” can read the package lists?
There is ZERO sensitive data in there.

I decided someone was intoxicated, watching Rick and Morty, making out with their significant other, and coding with their non-dominant hand, just to see if they could maintain focus on a dare.


TSM 7.1 config

In the past, I set up TSM.PWD as root, but that turned out not to be what I needed.

I’m posting because the error messages and IBM docs don’t cover this.

tsmdbmgr.log shows:
ANS2119I An invalid replication server address return code rc value = 2 was received from the server.

TSM Activity log shows:
ANR2983E Database backup terminated due to environment or setup issue related to DSMI_DIR - DB2 sqlcode -2033 sqlerrmc 168. (SESSION: 1, PROCESS: 9)

db2diag.log shows:

2014-02-26-13.54.12.425089-360 E415619A371 LEVEL: Error
PID : 15138852 TID : 1 PROC : db2vend
INSTANCE: tsminst1 NODE : 000
HOSTNAME: tsmserver
EDUID : 1
FUNCTION: DB2 UDB, database utilities, sqluvint, probe:321
DATA #1 : TSM RC, PD_DB2_TYPE_TSM_RC, 4 bytes
TSM RC=0x000000A8=168 -- see TSM API Reference for meaning.

EDUID : 38753 EDUNAME: db2med.35926.0 (TSMDB1) 0
FUNCTION: DB2 UDB, database utilities, sqluMapVend2MediaRCWithLog, probe:656
DATA #1 : String, 134 bytes
Vendor error: rc = 11 returned from function sqluvint.
Return_code structure from vendor library /tsm/tsminst1/sqllib/adsm/libtsm.a:

DATA #2 : Hexdump, 48 bytes
0x0A00030462F0C4D0 : 0000 00A8 3332 3120 3136 3800 0000 0000 ....321 168.....
0x0A00030462F0C4E0 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0A00030462F0C4F0 : 0000 0000 0000 0000 0000 0000 0000 0000 ................

EDUID : 38753 EDUNAME: db2med.35926.0 (TSMDB1) 0
FUNCTION: DB2 UDB, database utilities, sqluMapVend2MediaRCWithLog, probe:696
MESSAGE : Error in vendor support code at line: 321 rc: 168

RC 168 per dsmrc.h means:
#define DSM_RC_NO_PASS_FILE 168 /* password file needed and user is
not root */
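
The hex value in db2diag.log and the decimal value in dsmrc.h are the same number, which is easy to confirm from the shell. A small sketch; hex_to_dec is a made-up name, and the dsmrc.h path in the comment is a placeholder because it depends on where the API is installed.

```shell
# Convert a hex return code, as printed in db2diag.log, to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

hex_to_dec 0x000000A8   # prints 168

# Then look the number up in the API header (path is a placeholder):
# grep -w 168 /path/to/dsmrc.h
```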

Verified everything required for this:
• passworddir points to the right directory
• DSMI_DIR points to the right directory
• dsmtca runs okay
• dsmapipw runs okay

Verified hostname info was correct.

dsmffdc.log shows:
[ FFDC_GENERAL_SERVER_ERROR ]: (rdbdb.c:4200) GetOtherLogsUsageInfo failed, rc=2813, archLogDir = /tsm/arch.

I checked, and the log directory in dsmserv.opt was mistyped as /tsm/arch instead of /tsm/arc, which is what was used to create the instance and what exists on the filesystem.

Updated dsmserv.opt and restarted the TSM server. No change, other than fixing Q LOG.
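
A typo like that is easy to catch with a quick loop over the options file. This is a sketch, not an IBM tool: check_opt_dirs is a made-up name, and it assumes the directory options are written one per line as "NAME /path", with names ending in DIRECTORY (for example ARCHLOGDIRECTORY).

```shell
# Hypothetical check: for every option whose name ends in DIRECTORY,
# report whether the path it names actually exists.
check_opt_dirs() {
  awk 'toupper($1) ~ /DIRECTORY$/ {print $1, $2}' "$1" |
  while read -r name dir; do
    if [ -d "$dir" ]; then
      echo "ok      $name $dir"
    else
      echo "MISSING $name $dir"
    fi
  done
}

# Usage (path to the instance's options file is a placeholder):
# check_opt_dirs /path/to/dsmserv.opt
```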

SOLUTION:
The TSM.PWD file must be owned by the instance user, not by root.
Make sure to run dsmapipw as the instance user, or chown the file afterward.
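
A quick way to check the ownership before kicking off another DB backup. This is a sketch: check_owner is a made-up helper, tsminst1 is the instance user from the db2diag.log above, and the TSM.PWD path is a placeholder for wherever passworddir points.

```shell
# Hypothetical helper: report whether a file is owned by the expected user.
check_owner() {
  file=$1
  want=$2
  owner=$(ls -ld "$file" | awk '{print $3}')
  if [ "$owner" = "$want" ]; then
    echo "OK: $file is owned by $want"
  else
    echo "FIX: $file is owned by $owner, expected $want"
    return 1
  fi
}

# Usage (path is a placeholder):
# check_owner /path/to/TSM.PWD tsminst1 || chown tsminst1 /path/to/TSM.PWD
```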


Need someone for Dallas job

We’re looking for someone smart and used to large companies to help on a project. Contact me if you can help or know someone who can.

Contract duration: February – July 2014

Location: Dallas, near IH635 and US75. No travel.

This role will be responsible for gathering requirements and building the appropriate documentation to perform a disaster recovery test for the customer.

• Conduct interviews with infrastructure and application teams
• Document formal system recovery and disaster recovery processes
  o Organization requirements for a successful testing event
  o Application and system requirements and dependencies for the DR test
  o Gaps discovered during the interview process; work with teams to document solution(s) or alternatives
  o Priorities and processes for system and application startup for a disaster recovery event
• Assist in building scope documentation for current and future testing exercises
• Assist during the July DR exercise, capturing deviations or oversights in the created plans and updating the plans accordingly

Candidate should be driven, but flexible. There are existing templates for this exercise; however, customization will likely be required to align with the customer’s expectations. This is one of several initiatives in flight, so the candidate will need to be persistent with follow-up, organized and proactive in planning interviews, and able to keep the various teams accountable for the deliverables.