AIX

CPU

Simulate CPU load

The number 4 indicates how many threads you want to load

perl -e 'while (--$ARGV[0] and fork) {}; while () {}' 4

Trace per-process CPU usage

tprof -x sleep 60

High j2pg usage

j2pg - Kernel process integral to processing JFS2 I/O requests.

The kernel thread is responsible for managing I/O in JFS2 filesystems, so it is normal to see it running when there is a lot of I/O or syncd activity. We could see that j2pg runs syncHashList() very often. The sync is done in syncHashList(): all inodes are extracted from the hash list, and iSyncNeeded() then judges whether each inode needs to be synchronized.

Note that a sync() call will cause the system to scan all the memory currently used for file caching to see which pages are dirty and have to be synced to disk.

Therefore, the cause of the j2pg spike is determined by the two calls that were being made (iSyncNeeded ---> syncHashList).

What is going on here is a flush/sync of the JFS2 metadata to disk. Apparently some program went recursively through the filesystem accessing files, forcing the inode access timestamps to change. These changes then have to be propagated to the disk.

Here are a few reasons why j2pg would be active and consume high CPU:

  • If several processes are issuing sync, the j2pg process will be very active and use CPU resources.
  • If there is file system corruption, j2pg will use more CPU resources.
  • If the storage is not processing data fast enough, the j2pg process will use a high amount of CPU resources.

j2pg is started for any JFS2 directory activity. Another event that can cause j2pg activity is syncd. If the system experiences a lot of JFS2 directory activity, the j2pg process will also be active handling the I/O. Since syncd flushes I/O from real memory to disk, any JFS2 directories with files in the buffer will also be hit.

Checking the syncd...

From data, we see:

$ grep -c sync psb.elfk
351 << this is high
$ grep sync psb.elfk | grep -c oracle
348 << syncd called by Oracle user only

It appears that the number of sync calls, each of which causes j2pg to run, is causing the spikes.

We see /usr/sbin/syncd 60

j2pg is responsible for flushing data to disk and is usually called by the syncd process. If you have a large number of sync processes running on the system, that would explain the high CPU for j2pg. The syncd setting determines the frequency with which the I/O disk-write buffers are flushed. The AIX default value for syncd as set in /sbin/rc.boot is 60. It is recommended to change this value to 10.

This will cause the syncd process to run more often and not allow dirty file pages to accumulate, so it runs more frequently but for a shorter period of time. If you wish to make this permanent, edit the /sbin/rc.boot file and change the 60 to 10.

You may consider mounting all of the non-rootvg file systems with the 'noatime' option. This can be done without an outage, although doing it outside peak production hours is better:

mount -o remount,noatime /oracle
chfs -a options=noatime /oracle

noatime turns off access-time updates. Using this option can improve performance on file systems where a large number of files are read frequently and seldom updated. If you use the option, the last access time for a file cannot be determined. If neither atime nor noatime is specified, atime is the default value.

From the symptoms, it looks like "update" was intended to run an SQL query but instead invoked the /usr/sbin/update command. Check with the application team to find out what these processes are, and fix them so they do not call /usr/sbin/update unless the intent is to update the super block of the file systems. Removing all these sync processes should bring down the j2pg usage.

Memory

Memory usage per process

svmon -P -O summary=basic,unit=MB

Memory usage per user

svmon -U -t 5 -O summary=basic,unit=MB

Processes taking up paging space

svmon -P -O sortseg=pgsp

Top 15 processes using memory

svmon -Pt15 | perl -e 'while(<>){print if($.==2||$&&&!$s++);$.=0 if(/^-+$/)}'

Processes using filesystem cache

svmon -Sl | more

svmon will list each PID that has a segment mapped. Any segments marked as Unused are NOT mapped to any process.

vmstat commands

vmstat -IWwt
Column Description
kthr:b queue-count of blocked threads underway
kthr:p queue-count of raw IO threads underway
kthr:w queue-count of JFS/JFS2 IO threads underway
memory:avm Computational Memory in 4096byte memory pages
memory:fre total real-time AIX Free Memory in 4K mempages
page:fi count of default&rbr JFS/JFS2 4K page reads; no raw/CIO/mmfs/NFS
page:fo count of default&rbw JFS/JFS2 4K page writes; no raw/CIO/mmfs/NFS
page:pi count of paging space page-ins
page:po count of paging space page-outs
page:fr free rate of AIX:lrud adding to memory:fre
page:sr scan rate of AIX:lrud scanning for page:fr
faults:in count of device interrupts
faults:sy count of system calls called
faults:cs count of thread context switches
cpu:us user% of cpu:pc when ec>100 (or ent) on SPLPARs
cpu:sy system% of cpu:pc when ec>100 (or ent) on SPLPARs
cpu:id idle% of cpu:pc when ec>100 (or ent) on SPLPARs
cpu:wa wait% of cpu:pc when ec>100 (or ent) on SPLPARs

vmstat -v tuning

pending disk I/Os blocked with no pbuf

Number of pending disk I/O requests blocked because no pbuf was available. Pbufs are pinned memory buffers used to hold I/O requests at the logical volume manager layer. The count is currently for rootvg only.

Use AIX:lvmo to monitor the pervg_blocked_io_count of each active LVM volume group:

# lvmo -a -v rootvg
vgname = rootvg
pv_pbuf_count = 512
total_vg_pbufs = 512
max_vg_pbufs = 16384
pervg_blocked_io_count = 19
pv_min_pbuf = 512
max_vg_pbuf_count = 0
global_blocked_io_count = 1566

Acceptable tolerance is 4-digits of pervg_blocked_io_count per LVM volume group for any 90 days uptime.

Otherwise, for each LVM volume group, adjust the value of AIX:lvmo:pv_pbuf_count accordingly:

  • If 5-digits of pervg_blocked_io_count, add ~2048 pbuf’s to total_vg_pbufs per 90-day cycle.
  • If 6-digits of pervg_blocked_io_count, add ~[4*2048] pbuf’s to total_vg_pbufs per 90-day cycle.
  • If 7-digits of pervg_blocked_io_count, add ~[8*2048] pbuf’s to total_vg_pbufs per 90-day cycle.
  • If 8-digits of pervg_blocked_io_count, add ~[12*2048] pbuf’s to total_vg_pbufs per 90-day cycle.
  • If 9-digits of pervg_blocked_io_count, add ~[16*2048] pbuf’s to total_vg_pbufs per 90-day cycle.
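
The digit-count rule above can be sketched as a small shell function. This is only an illustration of the arithmetic in the list; the function name pbufs_to_add is hypothetical, and counts with more than 9 digits have no rule in the list, so the sketch reports an error for them instead of guessing.

```shell
# pbufs_to_add COUNT
# Prints how many pbufs to add to total_vg_pbufs per 90-day cycle,
# based on the number of digits in pervg_blocked_io_count.
# Hypothetical helper: it only encodes the rule-of-thumb list above.
pbufs_to_add() {
    count=$1
    digits=${#count}
    case $digits in
        1|2|3|4) echo 0 ;;                # up to 4 digits is within tolerance
        5) echo 2048 ;;
        6) echo $((4 * 2048)) ;;
        7) echo $((8 * 2048)) ;;
        8) echo $((12 * 2048)) ;;
        9) echo $((16 * 2048)) ;;
        *) echo "no rule for a $digits-digit count" >&2; return 1 ;;
    esac
}
```

For the rootvg output shown earlier, pbufs_to_add 19 prints 0, so no change would be suggested.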

Use AIX:lvmo to confirm/verify the value of total_vg_pbufs for each VG.

lvmo -v rootvg -o pv_pbuf_count=1024

filesystem I/Os blocked with no fsbuf

Number of filesystem I/O requests blocked because no fsbuf was available. Fsbuf are pinned memory buffers used to hold I/O requests in the filesystem layer.

filesystem I/Os blocked with no fsbuf # mostly JFS

  • If many, increase ioo:numfsbufs to 512,1024 or 2048 per severity of blocked I/Os
  • Default value of ioo:numfsbufs=192 ---> 1024
  • JFS fsbufs are per-filesystem static-allocations in pinned memory
  • Must re-mount (umount; mount) filesystems for effect
ioo -p -o numfsbufs=1024

external pager filesystem I/Os blocked with no fsbuf

Number of external pager client filesystem I/O requests blocked because no fsbuf was available. JFS2 is an external pager client filesystem. Fsbuf are pinned memory buffers used to hold I/O requests in the filesystem layer.

Acceptable tolerance is 5-digits per 90 Days-Uptime.

First tactic to attempt: If 6-digits, set ioo -o j2_dynamicBufferPreallocation=128.

First tactic to attempt: If 7+ digits, set ioo -o j2_dynamicBufferPreallocation=256.

ioo -h j2_dynamicBufferPreallocation=value

The number of 16K slabs to preallocate when the filesystem is running low of bufstructs. A value of 16 represents 256K. The bufstructs for Enhanced JFS (aka JFS2) are now dynamic; the number of buffers that start on the JFS2 filesystem is controlled by j2_nBufferPerPagerDevice (now restricted), but buffers are allocated and destroyed dynamically past this initial value. If the number of external pager filesystem I/Os blocked with no fsbuf increases, the j2_dynamicBufferPreallocation should be increased for that file system, as the I/O load on a file system may be exceeding the speed of preallocation.

A value of 0 will disable dynamic buffer allocation completely.

Heavy IO workloads should have this value changed to 256.

File systems do not need to be remounted to activate.

ioo -po j2_dynamicBufferPreallocation=256
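
As a worked example of the slab arithmetic and the digit-count tolerance described above, here is a minimal sketch. Both function names are hypothetical; the only facts encoded are the ones stated in this section (one slab is 16K, 6 digits suggests 128, 7 or more digits suggests 256).

```shell
# slabs_to_kb SLABS
# Each j2_dynamicBufferPreallocation unit is one 16K slab.
slabs_to_kb() {
    echo $(( $1 * 16 ))
}

# suggested_prealloc COUNT
# COUNT is the "external pager filesystem I/Os blocked with no fsbuf"
# value from vmstat -v. Encodes the rule of thumb from this section.
suggested_prealloc() {
    digits=${#1}
    if [ "$digits" -ge 7 ]; then
        echo 256
    elif [ "$digits" -eq 6 ]; then
        echo 128
    else
        echo "within tolerance"
    fi
}
```

slabs_to_kb 16 prints 256, matching the "16 represents 256K" statement, and slabs_to_kb 256 prints 4096, so the heavy I/O setting preallocates 4 MB at a time.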

Storage

fsck details of filesystem

# /sbin/helpers/jfs2/fscklog -p /opt/ibm/scratch2
*** Checking prior fsck log. ***

Found a valid superblock.  Continuing with fsck log check.

Time Stamps
s_time.tj_sec:          Fri Feb 15 14:56:15 2019
last mounted:           Mon Dec  7 16:35:49 2020
last unmounted:         Sun Jun  7 11:02:40 2020
last marked dirty:      Never marked dirty
last recovered:         Never recovered
last size change:       Never changed size

format LUN

dd if=/dev/zero of=/dev/rhdisk2 bs=1024k count=$(bootinfo -s hdisk2)

Manually remove a hdisk

To manually delete hard disks that won’t delete.

odmget -q name=hdisk# CuAt          <-- (Should be 6 entries)
odmget -q name=hdisk# CuDv          <-- (Should be 1 entry)
odmdelete -q name=hdisk# -o CuAt    <-- (Should delete 6 entries)
odmdelete -q name=hdisk# -o CuDv    <-- (Should delete 1 entry)
rm /dev/hdisk# /dev/rhdisk#

Rename a volume group (VG)

# lspv
hdisk0 002322fa97605ea2 rootvg active
hdisk1 002322fa0f8c3457 oldvg active
hdisk2 002322fa84e6f325 oldvg active
# varyoffvg oldvg
# exportvg oldvg
# lspv
hdisk0 002322fa97605ea2 rootvg active
hdisk1 002322fa0f8c3457 None active
hdisk2 002322fa84e6f325 None active
# importvg -y newvg hdisk1 (or hdisk2)
newvg
# lspv
hdisk0 002322fa97605ea2 rootvg active
hdisk1 002322fa0f8c3457 newvg active
hdisk2 002322fa84e6f325 newvg active

To get a disk out of Missing/Removed state

chpv -va hdiskX

You may have to run varyonvg to get the volume group to re-probe for the disk and recognize its state has changed.

Manually bring up a path to a disk

chpath -l hdisk2 -p vscsi0 -s enable

Change a hdisk from removed to active

chpv -va <hdisk#>

Manually assign PVID to disk

chdev -a pv=yes -l <hdisk>

PowerPath commands

Display high level HBA info

powermt display

Display all devices

powermt display dev=all

Display particular device

powermt display dev=hdiskpower0

Retrieve PowerPath registration key

powermt check_registration

Display PowerPath options

powermt display options

Display HBA mode enabled/disabled

powermt display hba_mode

Display I/O paths

powermt display paths

Display port status

powermt display port_mode

Display PowerPath version

powermt version

Check I/O paths

If you have made changes to the HBAs or I/O paths, run powermt check to take appropriate action. For example, if you have manually removed an I/O path, the check command will detect the dead path and remove it from the EMC path list.

powermt check
powermt check force

Configure Power Path

powermt config

Save/Restore Power Path configuration

powermt save      <-- Saves to /etc/powermt.custom
powermt save file=/etc/powermt.21-Aug-2010
powermt load file=/etc/powermt.21-Aug-2010

Request Power Path to recheck I/O Paths

powermt restore dev=all

Change mode of specific HBA to active/standby

powermt set mode=[active|standby] hba=X     <-- X being the HBA number

Delete an I/O path

powermt remove dev=X              <-- X being the value in the I/O Paths column
powermt remove dev=hdiskpower0    <-- Will remove all I/O paths to a specific device

SDDPCM Commands

Query device paths

pcmpath query adapter

Remove failed paths

rmpath -p fscsi0 -d

Show adapter WWPN's

pcmpath query wwpn

Query ports

pcmpath query port

Check current and ODM queue depth value

Can use this to check if AIX has been rebooted since changing queue depth

# lsattr -El hdisk6 -a queue_depth
queue_depth 128 Queue DEPTH True          <-- Value in ODM
# echo scsidisk hdisk6 | kdb | grep queue_depth
   ushort queue_depth   = 0x80;           <-- Running config
# echo "ibase=16 ; 80" | bc
128                                       <-- Hex value conversion
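
As an alternative to bc, the hex value reported by kdb can be converted with the shell's printf. A minimal sketch; the function name hex_to_dec is hypothetical.

```shell
# hex_to_dec HEX
# Converts a hexadecimal value to decimal.
# Accepts the value with or without a leading 0x, e.g. 0x80 or 80.
hex_to_dec() {
    printf '%d\n' "0x${1#0x}"
}
```

hex_to_dec 0x80 prints 128, the same result as the bc conversion above. Unlike bc with ibase=16, printf accepts lower-case hex digits as well.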

Create ramdisk

mkramdisk 2G
mkfs -V jfs2 /dev/ramdisk0
mkdir -p /ramdisk0
mount -V jfs2 -o log=NULL -o dio,rbrw,noatime /dev/ramdisk0 /ramdisk0

Remove file by inode

ls -i
find . -inum <inode>
find . -inum <inode> -exec rm {} \;

Extended Logical Volume (LV) information

getlvcb -AT <lv_name>

Manually unmirror logical volumes

This will remove the logical volume copy from hdisk0

rmlvcopy hd6 1 hdisk0
rmlvcopy hd5 1 hdisk0

List Filesystems in reverse sort order

Based on mountpoint string length - useful for unmounting a large number of filesystems with parent mounts

lsvgfs rootvg | awk '{ print length, $0 }' | sort -n -r | cut -d" " -f2-
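
The same pipeline can be wrapped in a function so it works on any list of mount points, not just the output of lsvgfs. A minimal sketch; the function name longest_first is hypothetical.

```shell
# longest_first
# Reads one mount point per line on stdin and prints them with the
# longest path first, so child mounts are listed before their parents.
longest_first() {
    awk '{ print length($0), $0 }' | sort -n -r | cut -d" " -f2-
}
```

For example, lsvgfs rootvg | longest_first gives the same ordering as the one-liner above. Paths of equal length may come out in any order, which does not matter for unmounting because one cannot be the parent of the other.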

Sort /etc/filesystems by mountpoint string length

Should prevent parent/child mount conflicts

for FS in $(awk '(!/^\*/) && (/^\//){ print length, $0 }' /etc/filesystems | sort -n | cut -d" " -f2-); do grep -p "^${FS}" /etc/filesystems; done

Show permissions for all directories to a certain path

# dir=/export/nim/images/OpenSSH; while [ "$dir" != "/" ]; do ls -ald $dir; dir=`dirname $dir`; done
drwxr-x---    8 root     system         4096 Sep 01 09:18 /export/nim/images/OpenSSH
drwxrwxr-x   43 root     system         4096 Oct 20 09:45 /export/nim/images
drwxr-xr-x   20 root     system         4096 Jul 27 09:58 /export/nim
drwxrwxr-x    3 root     system          256 Nov 06 2015  /export

rsync delete files in the destination that are no longer in the source

Won't copy new/changed files, this is only to delete. Remove the --dry-run option to actually delete

rsync --recursive -x --delete --ignore-existing --existing --prune-empty-dirs --verbose --dry-run /kristian1/ /kristian2

Use rsync to resume SSH download

rsync --partial --progress -avz -e "ssh -p 22" <user>@<host>:~/IBM/Downloads/AIX/7100-04-00-ISO/*.iso .    <-- Pull
rsync --partial --progress -avz . <user>@<host>:~/aixtoolbox    <-- Push

Move a filesystem or logical volume from one volume group to another

Example below has the /app/IBMucd in rootvg, and we're moving it to kristianvg

1. Verify if the existing filesystem is using internal or external logging

# mount | grep /app/IBMucd
/dev/fslv00      /app/IBMucd      jfs2   Mar 16 09:10 rw,nodev,nosuid,log=INLINE

2. Umount existing filesystem

umount /app/IBMucd

3. Copy existing logical volume to another volume group with a new name

cplv -v kristianvg -y newfslv00 fslv00

4. Change the filesystem to use the new logical volume and log device

If using inline logging

chfs -a dev=/dev/newfslv00 -a log=INLINE /app/IBMucd

If using external logging

chfs -a dev=/dev/newfslv00 -a log=/dev/XXXX /app/IBMucd

Where XXXX is the external log for the existing volume group. If no external log exists, create one with mklv and logform.

5. Run fsck and mount filesystem

fsck -ofull -y /app/IBMucd
mount /app/IBMucd

6. Remove the old logical volume from rootvg

rmlv -f fslv00

JFS2 Internal/External Snapshots

Internal

Internal snapshots are only supported from AIX 6.1 and must be enabled when the filesystem is created (-a isnapshot=yes)

1. Create snapshot
 snapshot -o snapfrom=/km -n kmsnap1
  • snapfrom: Filesystem to snapshot
  • -n: Name of snapshot
2. Query snapshot
# snapshot -q /km
Snapshots for /km
Current  Name         Time
   *     kmsnap1      Wed Feb 10 19:55:18 CST 2010
3. Restore individual files
cd /km/.snapshot/kmsnap1
cp -p <source> <dest>
4. Restore entire filesystem
umount /km
rollback -v -n kmsnap1 /km
5. Remove snapshot
snapshot -d -n kmsnap1 /km

External

1. Create snapshot
snapshot -o snapfrom=/km -o size=128M

Size depends on how many changes you will be making. In this instance, /km is 256M, so the snapshot LV is half that size.

2. Query snapshot
# snapshot -q /km
Snapshots for /km
Current  Location      512-blocks        Free Time
   *     /dev/fslv08       262144      261376 Wed Feb 10 18:03:15 CST 2010
3. Increase snapshot image
snapshot -o size=+1 /dev/fslv08
Snapshot /dev/fslv08 size is now 524288 512-blocks.
4. Restore individual files
mkdir /mnt/snapfs
mount -v jfs2 -o snapshot /dev/fslv08 /mnt/snapfs
cp -p /mnt/snapfs/<source> <dest>
5. Restore entire filesystem
umount /km
rollback -v /km /dev/fslv08

Considerations

  • If writes to an internal snapshot fail (out of space), all snapshots are marked as INVALID and an error is written to errpt. All snapshots need to be removed and then recreated.
  • Internal snapshots are removed if fsck is run against the filesystem.
  • Internal snapshots consume space inside the original filesystem.

Rename a device

To rename disk hdisk5 to hdisk2

rendev -l hdisk5 -n hdisk2

Network

Transfer IP address

ifconfig en2 1.2.3.4 transfer en1
ifconfig en2 down detach
rmdev -Rdl ent2; rmdev -Rdl et2; rmdev -Rdl en2
chdev -l en1 -a netaddr='1.2.3.4' -a netmask='255.255.255.0' -a state=up

iptrace

Start trace

iptrace -a -i en0 -p 25 /tmp/iptrace.`hostname`.out

Stop trace

kill -15 <pid>

Read trace

ipreport -n /tmp/iptrace.`hostname`.out | more

Can also be read with Wireshark

Map a port to a process

# netstat -aAn | grep 22
f10007000028bbb0 tcp4       0      0  *.22               *.*                LISTEN
# rmsock f10007000028bbb0 tcpcb
The socket 0x28b808 is being held by proccess 151996 (sshd).

or

lsof -i :PORT

Map a process to a port

lsof -Pp <PID>

Remove multiple default gateways

# odmget -q "attribute=route" CuAt
    CuAt:
            name = "inet0"
            attribute = "route"
            value = "net,-hopcount,0,,0,192.168.0.2"
            type = "R"
            generic = "DU"
            rep = "s"
            nls_index = 0

    CuAt:
            name = "inet0"
            attribute = "route"
            value = "net,-hopcount,0,,0,192.168.0.2"
            type = "R"
            generic = "DU"
            rep = "s"
            nls_index = 0

If there is more than one, you need to remove the excess route

chdev -l inet0 -a delroute="net,-hopcount,0,,0,192.168.0.2"
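
Spotting a duplicate by eye is easy with two entries but harder with many. The sketch below reads the odmget output and prints only the route values that occur more than once. The function name duplicate_routes is hypothetical, and it assumes the value lines look like the ones in the output above.

```shell
# duplicate_routes
# Reads `odmget -q "attribute=route" CuAt` output on stdin and prints
# each route value that appears more than once.
duplicate_routes() {
    awk -F'"' '/value =/ { seen[$2]++ }
               END { for (v in seen) if (seen[v] > 1) print v }'
}
```

Usage would be odmget -q "attribute=route" CuAt | duplicate_routes. Each value printed is a candidate for the chdev delroute command above.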

Configure Dead Gateway Detection (DGD) on the default route

route change default -active_dgd

Add the command route change default -active_dgd to the /etc/rc.tcpip file to make this change permanent.

Change the frequency of the DGD pings

no -p -o dgd_ping_time=2

Default is 5 seconds (Lowering it will allow for faster recovery)

List all HBA's and WWPN's

AIX

lsdev -C | awk '/^fcs/{ print $1 }' | while read -r FCS; do echo "${FCS}\t$(lscfg -vl "${FCS}" | awk -F. '/Network Address/{ print $NF }')"; done

For Virtual I/O Servers so you don't include FCoE adapters

lsdev -C | awk '/^fcs/ && /16Gb/{ print $1 }' | while read -r FCS; do echo "${FCS}\t$(lscfg -vl "${FCS}" | awk -F. '/Network Address/{ print $NF }')"; done

The apply command can also be used

apply "lscfg -vl fcs%1" 0 1 2 3 | grep Net

You can format the WWPN's for the SAN team

echo c0507603a292007c | sed 's/../&:/g;s/:$//'
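
The extraction and the formatting can be combined so the WWPNs come out ready to hand over. A minimal sketch, assuming the lscfg output contains a "Network Address" line with the WWPN after a run of dots, as the awk commands above rely on; the function name wwpns_from_lscfg is hypothetical.

```shell
# wwpns_from_lscfg
# Reads `lscfg -vl fcsX` output on stdin and prints each WWPN as
# colon-separated byte pairs.
wwpns_from_lscfg() {
    awk -F. '/Network Address/ { print $NF }' | sed 's/../&:/g;s/:$//'
}
```

For example, lscfg -vl fcs0 | wwpns_from_lscfg.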

Miscellaneous

Quick HTTP web server using python3

Use --bind 127.0.0.1 if you want to make it local only

python3 -m http.server 8080

Estimate mksysb size

df -tk $(lsvgfs rootvg) | awk '{ total+=$3 } END { printf "Estimated mksysb size %d bytes, %.2f GB\n", total*1024, total/1024/1024 }'

Update adapter firmware without using diag menus

diag -c -d fcsXX -T "download -s /etc/microcode -l latest -f"

vi out of memory

export EXINIT="set ll=20000000"

Read audit file

auditpr -h elRtcrp -vX < /audit/trail.20160113

Use Java to unzip a file

export PATH=$PATH:/usr/java8/bin
jar -xvf zipfile.zip

Find a parent device

odmget -q name=rmt0 CuDv

Convert GB to 512-byte blocks

expr 150 \* 1024 \* 1024 \* 1024 \/ 512
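
The same conversion can be done with shell arithmetic. A minimal sketch; the function name gb_to_512_blocks is hypothetical. Since 1024 * 1024 * 1024 / 512 is 2097152, multiplying by that constant avoids the large intermediate value.

```shell
# gb_to_512_blocks GB
# Prints the number of 512-byte blocks in GB gigabytes.
# 1 GB = 1024 * 1024 * 1024 bytes = 2097152 blocks of 512 bytes.
gb_to_512_blocks() {
    echo $(( $1 * 2097152 ))
}
```

gb_to_512_blocks 150 prints 314572800, the same value the expr command above produces.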

cpio

Extract

cpio -icvdum < /tmp/file.cpio

Read

cpio -ictv < /tmp/file.cpio

Create

cpio -ov > /tmp/file.cpio

Restore file from mksysb backup

read

restore -Tqvf <file.mksysb>

restore

restore -xvqf <file.mksysb>

restore individual directory and its contents

restore -xdvqf <file.mksysb> ./ibmsupt

restore individual file from mksysb file

restore -xvqf <file.mksysb> ./etc/exports

Read /var/adm/wtmp file

/usr/sbin/acct/fwtmp < /var/adm/wtmp > /test/wtmp.txt

Create an empty file of any size

lmktemp <file> 10M

or

dd if=/dev/zero of=/etl/test bs=1M count=5120    <-- Will create a 5GB test file

Prevent SIGHUP on a process already running

nohup -p <PID>

getconf commands

What was the device the system was last booted from

getconf BOOT_DEVICE

What size is a particular disk in the system

getconf DISK_SIZE /dev/hdisk0

What partition size is being used on a disk in the system

getconf DISK_PARTITION /dev/hdisk0

Is the machine capable of running a 64-bit kernel

getconf HARDWARE_BITMODE

Is the system currently running a 64-bit or 32-bit kernel

getconf KERNEL_BITMODE

How much real memory does the system have

getconf REAL_MEMORY

Set attention LED light to normal from command line

/usr/lpp/diagnostics/bin/usysfault -s normal

Mount an ISO image

loopmount -i cdrom.iso -o "-V cdrfs -o ro" -m /mnt

Use openssl to get MD5 of a file

openssl dgst -md5 dynadock_1_3.iso

Use csum to get MD5/SHA1 of a file

csum -h MD5 MH01706_x86.iso
csum -h SHA1 MH01706_x86.iso

Create ISO image from mksysb

mkcd -L -S -I /export/images/mksysb/2011 -m /export/images/mksysb/2011/MC_MOD.20110117.mksysb

Debug a hung process

dbx -a <hung_pid>
  • thread List thread IDs. Look for threads in an abnormal state, WAIT or DEADLOCK
  • thread current <number> Set attention to thread. Value is the number from $t1, e.g. "thread current 1"
  • x Thread dump, check if you can see where it's hanging
  • where Can also give you an idea of where it's hung
  • detach Exit out of dbx session. "quit" will exit but also kill the PID.

Show value range for chdev/lsattr parameters

# lsattr -l hdisk5 -a queue_depth -R
1...32 (+1)

sudo debugging

# touch /var/log/sudo_debug.log
# cat /opt/sysadm/etc/sudo.conf
Debug sudo /var/log/sudo_debug.log [email protected]
Debug sudoers.so /var/log/sudo_debug.log [email protected]

pam debugging

Add a *.debug entry in syslog.conf

touch /etc/pam_debug

Find HMC IP address

AIX 5.3

lsrsrc IBM.ManagementServer

AIX 6.1 or higher

lsrsrc IBM.MCP
lsrsrc IBM.MCP IPAddresses
lsrsrc IBM.MCP HMCIPAddr

Process creation time

# kdb
WARNING: Version mismatch between unix file and command kdb
           START              END <name>
0000000000001000 0000000007140000 start+000FD8
F00000002FF47600 F00000002FFE1000 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F1001104C0000000 F1001104D0000000 pvproc+000000
F1001104D0000000 F1001104D8000000 pvthread+000000
read vscsi_scsi_ptrs OK, ptr = 0x0
(0)> tpid -d 9044254 | head
                SLOT NAME     STATE    TID PRI   RQ CPUID  CL  WCHAN
pvthread+09A800 2472 pfcdaemo SLEEP 1A80157 03C    0         0  F1000915905EE310
(0)> u 2472 | grep ticks
   start..00000000604FDB75   ticks..0000000000001F04
(0)> hcal 00000000604FDB75
Value hexa: 604FDB75          Value decimal: 1615846261
(0)> quit
# perl -le 'print scalar localtime $ARGV[0]' 1615846261
Tue Mar 16 09:11:01 2021

multibos Error reading LVCB attribute

multibos -R fails, leaving two hd5's in rootvg.

# multibos -R
Initializing multibos methods ...
Initializing log /etc/multibos/logs/op.alog ... 
Gathering system information ...
+-----------------------------------------------------------------------------+ 
Remove Operation 
+-----------------------------------------------------------------------------+ 
Verifying operation parameters ...
+-----------------------------------------------------------------------------+ 
Boot Partition Processing 
+-----------------------------------------------------------------------------+ 
multibos: 0565-080 Error reading LVCB attribute "fs,mb" of logical volume hd5.
multibos: 0565-082 Unable to verify multibos tag for standby BOS logical volume hd5
multibos: 0565-084 Error processing primary boot partition.
multibos: 0565-002 ATTENTION: cleanup did not complete successfully.
Log file is /etc/multibos/logs/op.alog 
Return Status: FAILURE
Leaving two hd5's in rootvg:
# lsvg -l rootvg 
rootvg: 
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT 
hd5 boot 1 2 2 closed/syncd N/A                         <<<<<
hd6 paging 128 256 2 open/syncd N/A 
hd8 jfs2log 1 2 2 open/syncd N/A 
hd4 jfs2 4 8 2 open/syncd / 
hd2 jfs2 54 108 2 open/syncd /usr 
hd9var jfs2 16 32 2 open/syncd /var 
hd3 jfs2 8 16 2 open/syncd /tmp 
hd1 jfs2 28 56 2 open/syncd /home 
hd10opt jfs2 4 8 2 open/syncd /opt 
hd11admin jfs2 1 2 2 open/syncd /admin 
lg_dumplv sysdump 12 12 1 open/syncd N/A 
livedump jfs2 1 2 2 open/syncd /var/adm/ras/livedump 
bos_hd5 boot 1 2 2 closed/syncd N/A                         <<<<

1. Fix the logical volume control blocks

putlvcb -f 'vfs=jfs2:log=/dev/hd8:mount=automatic:type=bootfs:vol=root:free=true:quota=no' hd4 
putlvcb -f 'vfs=jfs2:log=/dev/hd8:mount=automatic:type=bootfs:vol=/usr:free=false:quota=no' hd2 
putlvcb -f 'vfs=jfs2:log=/dev/hd8:mount=automatic:type=bootfs:vol=/var:free=false:quota=no' hd9var 
putlvcb -f 'vfs=jfs2:log=/dev/hd8:mount=true:check=true:vol=/opt:free=false:quota=no' hd10opt

2. Remove the multibos tags from the existing file systems

chfs -a mb= /opt
chfs -a mb= /var
chfs -a mb= /usr
chfs -a mb= /

3. Remove or move the multibos directory

mv /etc/multibos /tmp

4. Remove the leftover bos_hd5

rmlv -f bos_hd5

5. Remove the /bos_inst directory

rm -R /bos_inst

6. Remove the mbverify entry from inittab

Backup:

cp /etc/inittab /etc/inittab.backup

Remove:

rmitab mbverify

7. Recreate the boot image to ensure you have a good copy

bosboot -ad /dev/ipldevice

8. Verify the bootlist points to hd5 or the rootvg disk only

bootlist -om normal

Fixing underlying mount point permissions

Example of error

$ ls -al
ls: 0653-345 ./..: Permission denied.

Verify mount point permissions

#!/bin/ksh
#Show Mount Point Permissions

[ `whoami` = "root" ] || { echo "Run as root"; exit 1; }

tmpdir="/tmp/$$"
mkdir "$tmpdir"
for fs in `mount | grep jfs | awk '{print $2}'`; do
        parentmount=`df "/$fs/.." | tail -n 1 | awk '{print $7}'`
        mount -o ro "$parentmount" "$tmpdir"
        printf "%-24s" $fs
        ls -ald `echo $fs | sed "s%$parentmount%$tmpdir/%"`
        umount "$tmpdir"
done
rmdir "$tmpdir"

Fix underlying mount point permissions if you don't want to unmount the filesystem

#!/bin/ksh
#Add read/execute permissions to user/group/others on underlying mount point

fs="$1"

[ `whoami` = "root" ] || { echo "Run as root"; exit 1; }
if [ -z "$fs" ]; then
        echo "Enter Mount Point to change permissions on as argument"
        exit 1
fi

tmpdir="/tmp/$$"
mkdir "$tmpdir"
parentmount=`df "/$fs/.." | tail -n 1 | awk '{print $7}'`
mount "$parentmount" "$tmpdir"
echo "Original Permissions:"
ls -ald `echo $fs | sed "s%$parentmount%$tmpdir/%"`
chmod a+rx `echo $fs | sed "s%$parentmount%$tmpdir/%"`
echo; echo "New Permissions:"
ls -ald `echo $fs | sed "s%$parentmount%$tmpdir/%"`
umount "$tmpdir"
rmdir "$tmpdir"

installp BUILDDATE requisite failure

# lppchk -v
lppchk:  The following filesets need to be installed or corrected to bring
         the system to a consistent state:

  bos.rte.serv_aid 7.1.5.30               (usr: COMMITTED, root: not installed)

# lslpp -h bos.rte.serv_aid
  Fileset         Level     Action       Status       Date         Time
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  bos.rte.serv_aid
                  7.1.1.0   COMMIT       COMPLETE     02/12/13     12:36:21
                 7.1.1.16   COMMIT       COMPLETE     02/12/13     12:51:18
                 7.1.3.45   COMMIT       COMPLETE     08/11/15     18:18:09
                  7.1.4.0   COMMIT       COMPLETE     09/02/16     21:39:09
                  7.1.4.1   COMMIT       COMPLETE     02/22/17     16:30:57
                 7.1.4.30   COMMIT       COMPLETE     11/17/18     10:13:15
                 7.1.4.31   COMMIT       COMPLETE     11/17/18     10:13:17
                  7.1.5.0   COMMIT       COMPLETE     11/17/18     10:13:19
                 7.1.5.15   COMMIT       COMPLETE     08/14/19     17:13:09
                 7.1.5.30   COMMIT       COMPLETE     08/20/19     14:22:15

Path: /etc/objrepos
  bos.rte.serv_aid
                  7.1.1.0   COMMIT       COMPLETE     02/12/13     12:36:21
                 7.1.1.16   COMMIT       COMPLETE     02/12/13     12:51:18
                 7.1.3.45   COMMIT       COMPLETE     08/11/15     18:18:09
                  7.1.4.0   COMMIT       COMPLETE     09/02/16     21:39:09
                  7.1.4.1   COMMIT       COMPLETE     02/22/17     16:30:57
                 7.1.4.30   COMMIT       COMPLETE     11/17/18     10:13:15
                 7.1.4.31   COMMIT       COMPLETE     11/17/18     10:13:17
                  7.1.5.0   COMMIT       COMPLETE     11/17/18     10:13:19
                 7.1.5.15   COMMIT       COMPLETE     08/14/19     17:13:09
                       << -- 7.1.5.30 missing from this list -- >>

The highlighted line shows 7.1.5.30 missing from the list.

To fix, copy the fileset to the host, and then run one of the below installp commands.

# ls -l bos.rte.serv_aid.7.1.5.30.U
-rw-r-----    1 kristijan   staff      980992 Oct 04 2018  bos.rte.serv_aid.7.1.5.30.U
# installp -Or -ac bos.rte.serv_aid    <-- To reinstall the root part
# installp -Ou -ac bos.rte.serv_aid    <-- To reinstall the usr part

Get limits (ulimit) of a running process

# dbx -a 12517682
Waiting to attach to process 12517682 ...
Successfully attached to ovcd.
warning: Directory containing ovcd could not be determined.
Apply 'use' command to initialize source path.

Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g

stopped in _event_sleep at 0x9000000005c5f54 ($t1)
0x9000000005c5f54 (_event_sleep+0x514) e8410028             ld   r2,0x28(r1)
(dbx) proc rlimit
rlimit name:          rlimit_cur               rlimit_max       (units)
 RLIMIT_CPU:         (unlimited)             (unlimited)        sec
 RLIMIT_FSIZE:       (unlimited)             (unlimited)        bytes
 RLIMIT_DATA:          134217728             (unlimited)        bytes
 RLIMIT_STACK:          33554432             (unlimited)        bytes
 RLIMIT_CORE:                  0                       0        bytes
 RLIMIT_RSS:            33554432             (unlimited)        bytes
 RLIMIT_AS:          (unlimited)             (unlimited)        bytes
 RLIMIT_NOFILE:             2000             (unlimited)        descriptors
 RLIMIT_THREADS:          262144             (unlimited)        per process
 RLIMIT_NPROC:            262144             (unlimited)        per user
(dbx) detach

Building an AIX bff package

The mkinstallp command comes as part of the bos.adt.insttools package.

1. Create a build location

mkdir -p /packagename/root

2. Copy package contents

Copy over the package files into the base directory using the absolute location.

File/folder locations

If your file needs to be located in /app/package/file.txt then copy it into /packagename/root/app/package/file.txt. Set the folder/file permissions as required.

mkdir -p /packagename/root/app/package
cp -rp file.txt /packagename/root/app/package/file.txt

3. Create a package template file

The basic template below is enough to get a package built. You can find a complete list of options in /usr/lpp/bos/README.MKINSTALLP.

Package Name: PackageName
Package VRMF: 1.0.0.0
Update: N
Fileset
  Fileset Name: PackageName.rte
  Fileset VRMF: 1.0.0.0
  Fileset Description: Package description
  USRLIBLPPFiles
   Pre-installation Script: /packagename/pre_i
   Post-installation Script: /packagename/post_i
   Unconfiguration Script: /packagename/unconfig
  EOUSRLIBLPPFiles
  Bosboot required: N
  License agreement acceptance required: N
  Include license files in this package: N
  Requisites:
  USRFiles
   /app/package/file.txt
  EOUSRFiles
  ROOT Part: N
  ROOTFiles
  EOROOTFiles
  Relocatable: N
EOFileset

The following lines are most commonly changed:

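  • 1 & 5 are the package and fileset names (Package Name, Fileset Name)
  • 2 & 6 are the package and fileset versions (Package VRMF, Fileset VRMF)
  • 7 is the fileset description (Fileset Description)
  • 9-11 are the scripts used during installation/uninstallation
  • 18 is the list of all the files that make up the package, one path per line between USRFiles and EOUSRFiles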
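
If several packages are built from the same layout, the fields listed above can be filled in programmatically. The following is a minimal sketch that reproduces the basic template from this page; the function name render_template and its parameters are hypothetical, and it covers only the fields shown above, not the full set documented in README.MKINSTALLP.

```python
from typing import Sequence


def render_template(name: str, vrmf: str, description: str,
                    files: Sequence[str], script_dir: str) -> str:
    """Render the basic mkinstallp template shown above (hypothetical helper)."""
    lines = [
        f"Package Name: {name}",
        f"Package VRMF: {vrmf}",
        "Update: N",
        "Fileset",
        f"  Fileset Name: {name}.rte",
        f"  Fileset VRMF: {vrmf}",
        f"  Fileset Description: {description}",
        "  USRLIBLPPFiles",
        f"   Pre-installation Script: {script_dir}/pre_i",
        f"   Post-installation Script: {script_dir}/post_i",
        f"   Unconfiguration Script: {script_dir}/unconfig",
        "  EOUSRLIBLPPFiles",
        "  Bosboot required: N",
        "  License agreement acceptance required: N",
        "  Include license files in this package: N",
        "  Requisites:",
        "  USRFiles",
        # One installed path per line, as in the template above.
        *[f"   {path}" for path in files],
        "  EOUSRFiles",
        "  ROOT Part: N",
        "  ROOTFiles",
        "  EOROOTFiles",
        "  Relocatable: N",
        "EOFileset",
    ]
    return "\n".join(lines) + "\n"


template = render_template("PackageName", "1.0.0.0", "Package description",
                           ["/app/package/file.txt"], "/packagename")
print(template, end="")
```

The output could then be written to the template file passed to mkinstallp with -T.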

4. Build the package

mkinstallp -d /packagename/root -T /packagename/root/TemplateFile

The built package will be written to /packagename/root/tmp.

NIM

Configure SSL communication between master and client

niminit -v -a master=<nim_hostname> -a name=$(hostname) -a connect=nimsh
/usr/sbin/nimclient -c

Verify that you can pull down the SSL certificate using tftp

tftp -g - <nim_hostname> /tftpboot/server.pem

Oracle

Check Oracle ASM LUN member/candidate ownership

/app/oragrid/product/*/inventory/Scripts/ext/bin/kfod verbose=true, disks=all status=true op=disks asm_diskstring='/dev/rhdisk*'
/app/oragrid/product/*/inventory/Scripts/ext/bin/kfod verbose=true, disks=all status=true op=disks asm_diskstring='/dev/rhdisk*' | egrep -i member

Start/Stop Oracle RAC (when performing LPM)

/app/oragrid/product/11.2.0.3/bin/crsctl start has
/app/oragrid/product/11.2.0.3/bin/crsctl stop has [-f]

Check Oracle RAC/cluster status

/app/oragrid/product/*/bin/crsctl stat res -t
/app/oragrid/product/*/bin/crsctl check cluster -all

Scripts

deactivate_paging.sh

If all paging space logical volumes are the same size, AIX will round robin between them. If they differ in size, the smallest is used first, then the next smallest, and so on.

If additional paging space logical volumes have been added, they're likely larger than the default logical volume of hd6. In this case, at boot, we can deactivate it in favour of the larger one(s).

# Script      : deactivate_paging.sh
#
# Description : Script runs at boot (inittab), and will deactivate the default
#               AIX paging device (hd6) if an alternate paging space is
#               active, and greater in size.
#
# Usage       : Script takes no parameters.

if lsps -ac | egrep -v '^#|^hd6' > /dev/null 2>&1; then
  # Get size of default paging space
  hd6_size=`lsps -a | grep '^hd6' | awk '{print $4}'`

  # Create array of other paging space attributes
  set -A paging_name `lsps -a | grep -v '^hd6' | awk '(NR!=1) {print $1}'`
  set -A paging_size `lsps -a | grep -v '^hd6' | awk '(NR!=1) {print $4}'`
  set -A paging_active `lsps -a | grep -v '^hd6' | awk '(NR!=1) {print $6}'`

  # Default paging space (hd6) will be turned off if any single
  # alternate paging space is active, and greater in size than
  # the default paging space
  count=0
  while (( $count < ${#paging_name[*]} )); do
    if [[ ( ${paging_active[$count]} = "yes" ) && ( ${paging_size[$count]%??} -gt ${hd6_size%??} ) ]]; then
      echo "At least one alternate paging space detected and active [${paging_name[$count]}]" > /dev/console
      echo 'Deactivating default paging space hd6...' > /dev/console
      swapoff /dev/hd6 > /dev/console
      exit
    else
      let count="count + 1"
    fi
  done
fi
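
The decision the script makes can also be expressed as a small pure function, which is easier to test away from a live system. This is a sketch only: should_deactivate_hd6 is a hypothetical name, the sample text is illustrative rather than captured output, and it assumes the same lsps -a column layout the script relies on (size in column 4 with an "MB" suffix, active flag in column 6).

```python
def should_deactivate_hd6(lsps_output: str) -> bool:
    """Return True if an active paging space larger than hd6 exists."""
    sizes = {}
    active = {}
    for line in lsps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        name, size, is_active = fields[0], fields[3], fields[5]
        sizes[name] = int(size[:-2])  # strip the trailing "MB", as the script does
        active[name] = is_active == "yes"
    if "hd6" not in sizes:
        return False
    return any(active[n] and sizes[n] > sizes["hd6"]
               for n in sizes if n != "hd6")


# Illustrative sample only (assumed layout, not real captured output).
SAMPLE = """Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type Chksum
paging00        hdisk1            datavg        8192MB     1   yes   yes    lv     0
hd6             hdisk0            rootvg         512MB     3   yes   yes    lv     0
"""

print(should_deactivate_hd6(SAMPLE))
# prints True
```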

extenddump.sh

Script runs from root's cron, checks the current dump device size against the dump size estimate, and extends the dump device if it is smaller. To prevent the dump device from taking up all the space in rootvg, it's capped at 32GB.

# Script      : extenddump.sh
#
# Description : Script runs from root's cron and checks the current dump device size
#               against the dump size estimate and increases the dump device if smaller.
#
#               To prevent the dump device taking up all the space in rootvg, it's capped at
#               32GB.
#
# Usage       : Script takes no parameters.

# Current dump devices
PRI_DMP=$(sysdumpdev -l | awk /^primary/'{ sub("/dev/","",$2); print $2 }')
SEC_DMP=$(sysdumpdev -l | awk /^secondary/'{ sub("/dev/","",$2); print $2 }')

# Estimate size of dump
EST_DMPSIZE_BYTES=$(sysdumpdev -e | awk -F: '{ gsub(" ", "", $2); print $2 }')

# Primary dump increase
if [ "${PRI_DMP}" != "sysdumpnull" ]; then
    PRI_VG=$(lslv "${PRI_DMP}" | awk -F : '/VOLUME GROUP/{ gsub(" ", "", $3); print $3 }')
    PRI_VGPPSIZE=$(lsvg "${PRI_VG}" | awk -F'[^0-9]*' '/PP SIZE/{ print $2 }')

    PRI_EST_DMPSIZE_PP=$((("${EST_DMPSIZE_BYTES}" / 1024 / 1024 / "${PRI_VGPPSIZE}") + 1))
    PRI_CUR_DMPSIZE_PP=$(getlvcb -AT "${PRI_DMP}" | awk -F= /"number lps"/'{ gsub(" ", "", $2); print $2 }')

    # Check if the increase will extend the dump lv size beyond 32GB
    EXTEND_LV_PP=$(("${PRI_EST_DMPSIZE_PP}" - "${PRI_CUR_DMPSIZE_PP}"))
    if [ $((("${PRI_CUR_DMPSIZE_PP}" + "${EXTEND_LV_PP}") * "${PRI_VGPPSIZE}")) -le 32768 ]; then
        # Check if the dump lv is already large enough to accommodate the
        # estimated dump size in PP's
        if [ "${EXTEND_LV_PP}" -gt 0 ]; then
            if extendlv "${PRI_DMP}" "${EXTEND_LV_PP}"; then
                echo "$(date) - Dump LV: $PRI_DMP extended by $EXTEND_LV_PP PP's successfully."
            else
                echo "$(date) - Dump LV: $PRI_DMP extension by $EXTEND_LV_PP PP's failed."
            fi
        fi
    else
        echo "$(date) - Dump LV: $PRI_DMP extend failed, as it would extend beyond the 32GB limit."
    fi
fi

# Secondary dump, if it exists, should be the same size as the primary
if [ "${SEC_DMP}" != "sysdumpnull" ]; then
    # If for some reason the primary and secondary dump lv's are in different volume
    # groups, they might have a different volume group PP size. Let's convert the lv
    # sizes into MB, and work out the PP value from there.
    # - Primary in MB
    PRI_CUR_DMPSIZE_PP=$(getlvcb -AT "${PRI_DMP}" | awk -F= /"number lps"/'{ gsub(" ", "", $2); print $2 }')
    PRI_CUR_DMPSIZE_MB=$(("${PRI_CUR_DMPSIZE_PP}" * "${PRI_VGPPSIZE}"))
    # - Secondary in MB
    SEC_VG=$(lslv "${SEC_DMP}" | awk -F : '/VOLUME GROUP/{ gsub(" ", "", $3); print $3 }')
    SEC_VGPPSIZE=$(lsvg "${SEC_VG}" | awk -F'[^0-9]*' '/PP SIZE/{ print $2 }')
    SEC_CUR_DMPSIZE_PP=$(getlvcb -AT "${SEC_DMP}" | awk -F= /"number lps"/'{ gsub(" ", "", $2); print $2 }')
    SEC_CUR_DMPSIZE_MB=$(("${SEC_CUR_DMPSIZE_PP}" * "${SEC_VGPPSIZE}"))

    # Check if the secondary dump lv is smaller than the primary dump lv
    if [ "${SEC_CUR_DMPSIZE_MB}" -lt "${PRI_CUR_DMPSIZE_MB}" ]; then
        EXTEND_LV_PP=$((("${PRI_CUR_DMPSIZE_MB}" - "${SEC_CUR_DMPSIZE_MB}") / "${SEC_VGPPSIZE}"))
        if [ "${EXTEND_LV_PP}" -gt 0 ]; then
            if extendlv "${SEC_DMP}" "${EXTEND_LV_PP}"; then
                echo "$(date) - Dump LV: $SEC_DMP extended by $EXTEND_LV_PP PP's successfully."
            else
                echo "$(date) - Dump LV: $SEC_DMP extension by $EXTEND_LV_PP PP's failed."
            fi
        fi
    fi
fi
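
The primary dump arithmetic in the script can be checked by hand with a short worked example. The sketch below mirrors the script's integer maths (bytes to MB, MB to physical partitions, plus one PP of headroom, then the 32GB cap); the function name pps_to_add and the sample figures are assumptions chosen for illustration.

```python
from typing import Optional


def pps_to_add(est_dump_bytes: int, pp_size_mb: int, current_pps: int,
               cap_mb: int = 32768) -> Optional[int]:
    """Return how many PPs to add to the dump LV, 0 if it is already large
    enough, or None if the extension would exceed the cap."""
    # Estimated dump size in whole PPs; as in the script, one extra PP is
    # added after the integer division.
    target_pps = est_dump_bytes // 1024 // 1024 // pp_size_mb + 1
    extend = target_pps - current_pps
    # Refuse to grow the LV past the cap (32768 MB = 32GB by default).
    if (current_pps + extend) * pp_size_mb > cap_mb:
        return None
    return max(extend, 0)


# A 3 GiB estimate with 256 MB PPs needs 12 + 1 = 13 PPs; the LV has 8.
print(pps_to_add(3 * 1024**3, 256, 8))
# prints 5
```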

mksysb_check.py

Checks that there is a client mksysb resource on the NIM master, and checks that the creation date of the resource isn't older than 15 days.

Exit codes are used to determine the status.

Exit code   Description
0           mksysb found and is not older than 15 days
1           no mksysb found
2           all mksysbs found are older than 15 days

#!/opt/freeware/bin/python3
#
# Check that there is at least one mksysb for the client, and
# that the creation date of the mksysb is not older than 15 days

import sys
import socket
import subprocess
from datetime import datetime

# Get hostname
hostname = socket.gethostname()

# Get current time
current_time = datetime.now()

# Create list of mksysb resources from the NIM server
nim_mksysb_list = subprocess.check_output(f"/usr/sbin/nimclient -l -L -t mksysb {hostname} | /usr/bin/awk '/{hostname}/{{ print $1 }}'", shell=True, encoding='utf-8').split()
# Drop any empty strings and materialise the result as a list. A bare
# filter object is always truthy in Python 3, so without list() the
# emptiness check below would never reach the "no mksysb found" branch.
nim_mksysb_list = list(filter(None, nim_mksysb_list))

# Parse list of NIM mksysb backups and compare creation date
if nim_mksysb_list:
    for mksysb in nim_mksysb_list:
        mksysb_creation_time = subprocess.check_output(f"/usr/sbin/nimclient -l -l {mksysb} | /usr/bin/awk -F = '/creation_date/{{ print $2 }}'", shell=True, encoding='utf-8').strip()
        mksysb_creation_time = datetime.strptime(mksysb_creation_time, '%c')
        elapsed = current_time - mksysb_creation_time
        if elapsed.days < 15:
            sys.exit(0)
else:
    # List is empty, no backups on NIM.
    sys.exit(1)

# If we've made it this far, all mksysb's are older than 15 days
sys.exit(2)
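
The exit-code rules in the table can be isolated from the NIM calls and tested on their own. The following is a minimal sketch, assuming the creation dates have already been parsed into datetime objects; the function name mksysb_status is hypothetical. It takes a plain list so that an empty result is easy to detect.

```python
from datetime import datetime
from typing import List


def mksysb_status(creation_times: List[datetime], now: datetime,
                  max_age_days: int = 15) -> int:
    """Map a list of mksysb creation times to the exit codes used above."""
    if not creation_times:
        return 1  # no mksysb found
    if any((now - created).days < max_age_days for created in creation_times):
        return 0  # at least one mksysb is recent enough
    return 2      # every mksysb is too old


now = datetime(2024, 1, 31)
print(mksysb_status([], now))                      # prints 1
print(mksysb_status([datetime(2024, 1, 20)], now)) # prints 0
print(mksysb_status([datetime(2024, 1, 1)], now))  # prints 2
```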