GPFS¶
Create cluster¶
The following notes detail building a GPFS cluster using SSH with sudo wrappers.
1. Create local account¶
Create a local account that will be used, together with sudo rules, to allow GPFS cluster administration. This example uses the account name gpfsadmin.
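Creating the account itself is outside the scope of the IBM docs; a minimal sketch assuming AIX (on Linux, useradd -m gpfsadmin would be the equivalent):
# mkuser home=/home/gpfsadmin gpfsadmin
# passwd gpfsadmin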
There is a list of rules required for GPFS administration (listed in the IBM documentation). I've added a number of "list/query" type commands that I think are reasonable to allow without being prompted for the user's password.
Defaults secure_path="/bin:/usr/lpp/mmfs/bin:/usr/lpp/mmfs/lib/gsk8/bin"
Defaults env_keep += "MMMODE environmentType GPFS_rshPath GPFS_rcpPath mmScriptTrace GPFSCMDPORTRANGE GPFS_CIM_MSG_FORMAT"
User_Alias GPFS_USER = gpfsadmin
Cmnd_Alias GPFS_NOPASSWD_COMMANDS = /usr/lpp/mmfs/bin/mmremote,\
/usr/lpp/mmfs/bin/mmsdrrestore,\
/usr/lpp/mmfs/bin/mmbackupconfig,\
/usr/lpp/mmfs/bin/mmgetstate,\
/usr/lpp/mmfs/bin/mmdf,\
/usr/lpp/mmfs/bin/mmlsattr,\
/usr/lpp/mmfs/bin/mmlscallback,\
/usr/lpp/mmfs/bin/mmlscluster,\
/usr/lpp/mmfs/bin/mmlsconfig,\
/usr/lpp/mmfs/bin/mmlsdisk,\
/usr/lpp/mmfs/bin/mmlsfileset,\
/usr/lpp/mmfs/bin/mmlsfs,\
/usr/lpp/mmfs/bin/mmlslicense,\
/usr/lpp/mmfs/bin/mmlsmgr,\
/usr/lpp/mmfs/bin/mmlsmount,\
/usr/lpp/mmfs/bin/mmlsnode,\
/usr/lpp/mmfs/bin/mmlsnodeclass,\
/usr/lpp/mmfs/bin/mmlsnsd,\
/usr/lpp/mmfs/bin/mmlspolicy,\
/usr/lpp/mmfs/bin/mmlspool,\
/usr/lpp/mmfs/bin/mmlspv,\
/usr/lpp/mmfs/bin/mmlsqos,\
/usr/lpp/mmfs/bin/mmlsquota,\
/usr/lpp/mmfs/bin/mmlssnapshot,\
/usr/lpp/mmfs/bin/mmhealth,\
/usr/lpp/mmfs/bin/mmfsadm,\
/usr/lpp/mmfs/bin/gpfs.snap,\
/usr/bin/scp,\
/bin/echo
Cmnd_Alias GPFS_PASSWD_COMMANDS = /usr/lpp/mmfs/bin/*,\
/usr/lpp/mmfs/lib/gsk8/bin/*
GPFS_USER ALL=(root) PASSWD: GPFS_PASSWD_COMMANDS, NOPASSWD: GPFS_NOPASSWD_COMMANDS
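I'd suggest keeping these rules in their own sudoers file and checking the syntax before relying on them (assuming your sudo build reads /etc/sudoers.d; adjust the path to suit):
# visudo -c -f /etc/sudoers.d/gpfsadmin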
2. Configure /etc/hosts¶
(On all nodes) Update /etc/hosts with all the cluster members and their GPFS network IP addresses (use the dedicated GPFS network addresses if this is production). On AIX, hostent adds the entries:
hostent -a 1.2.3.4 -h "gpfsnode1"
hostent -a 1.2.3.5 -h "gpfsnode2"
hostent -a 1.2.3.6 -h "gpfsnode3"
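The resulting host entries should end up looking something like this:
# cat /etc/hosts
1.2.3.4 gpfsnode1
1.2.3.5 gpfsnode2
1.2.3.6 gpfsnode3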
3. SSH keys¶
(On all nodes) Create SSH keys for root and place them into gpfsadmin's authorized_keys file.
All GPFS cluster member root public keys need to be in the authorized_keys file.
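A minimal sketch of generating root's key and appending it to gpfsadmin's authorized_keys on each node (root's home is / here, as on AIX; adjust paths and hostnames to suit):
# ssh-keygen -t rsa -N "" -f /.ssh/id_rsa
# cat /.ssh/id_rsa.pub >> /home/gpfsadmin/.ssh/authorized_keys
# cat /.ssh/id_rsa.pub | ssh gpfsadmin@gpfsnode2 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
# cat /.ssh/id_rsa.pub | ssh gpfsadmin@gpfsnode3 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"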
Keyscan the hosts.
/usr/bin/ssh-keyscan -4 -p 22 gpfsnode1 >> /.ssh/known_hosts
/usr/bin/ssh-keyscan -4 -p 22 gpfsnode2 >> /.ssh/known_hosts
/usr/bin/ssh-keyscan -4 -p 22 gpfsnode3 >> /.ssh/known_hosts
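Before creating the cluster it's worth checking that root can reach every node as gpfsadmin and run a NOPASSWD command through sudo (echo is in the NOPASSWD list above); a quick check along these lines:
for node in gpfsnode1 gpfsnode2 gpfsnode3
do
    ssh gpfsadmin@${node} "sudo /bin/echo sudo wrapper OK on ${node}"
done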
4. Create cluster¶
The commands from here on assume they're being run as the gpfsadmin user.
(On a single node) Create the GPFS cluster
# su - gpfsadmin
$ sudo mmcrcluster -N gpfsnode1:quorum,gpfsnode2:quorum,gpfsnode3:quorum --ccr-enable -r /usr/lpp/mmfs/bin/sshwrap -R /usr/lpp/mmfs/bin/scpwrap -C GPFSnonprod -A
Designate and accept licenses
$ sudo mmchlicense server --accept -N all
Start and list cluster
$ sudo mmstartup -a
$ sudo mmgetstate -aL
 Node number  Node name   Quorum  Nodes up  Total nodes  GPFS state   Remarks
------------------------------------------------------------------------------------
       1      gpfsnode1      2        3          3       active       quorum node
       2      gpfsnode2      2        3          3       active       quorum node
       3      gpfsnode3      2        3          3       active       quorum node
$ sudo mmlscluster
GPFS cluster information
========================
GPFS cluster name: GPFSnonprod.gpfsnode1
GPFS cluster id: 14146430395377753931
GPFS UID domain: GPFSnonprod.gpfsnode1
Remote shell command: sudo wrapper in use
Remote file copy command: sudo wrapper in use
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
--------------------------------------------------------------------
1 gpfsnode1 1.2.3.4 gpfsnode1 quorum
2 gpfsnode2 1.2.3.5 gpfsnode2 quorum
3 gpfsnode3 1.2.3.6 gpfsnode3 quorum
5. Configure cluster to start on boot¶
sudo mmchconfig autoload=yes
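To confirm the setting took effect, mmlsconfig can be given the attribute name (the value also shows up in the full mmlsconfig output in the next step):
$ sudo mmlsconfig autoload
autoload yes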
6. Create NSD stanza file¶
$ cat gpfs_nsd
%nsd:
device=hdisk1
nsd=01_gpfs
servers=gpfsnode1,gpfsnode2,gpfsnode3
usage=dataAndMetadata
%nsd:
device=hdisk2
nsd=02_gpfs
servers=gpfsnode1,gpfsnode2,gpfsnode3
usage=dataAndMetadata
%nsd:
device=hdisk3
nsd=03_gpfs
servers=gpfsnode1,gpfsnode2,gpfsnode3
usage=dataAndMetadata
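Before creating the NSDs it can be worth confirming that each node actually sees the backing disks; a rough AIX check using lspv (device numbering may differ per node):
for node in gpfsnode1 gpfsnode2 gpfsnode3
do
    ssh gpfsadmin@${node} "hostname; lspv"
done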
7. Create NSDs¶
$ sudo mmcrnsd -F /home/gpfsadmin/gpfs_nsd -v no
$ sudo mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
(free disk) 01_gpfs gpfsnode1,gpfsnode2,gpfsnode3
(free disk) 02_gpfs gpfsnode1,gpfsnode2,gpfsnode3
(free disk) 03_gpfs gpfsnode1,gpfsnode2,gpfsnode3
8. Designate a tie-breaker disk¶
$ sudo mmchconfig tiebreakerDisks="01_gpfs"
$ sudo mmlsconfig
Configuration data for cluster GPFSnonprod.gpfsnode1:
----------------------------------------------------
clusterName GPFSnonprod.gpfsnode1
clusterId 14146430395377753931
dmapiFileHandleSize 32
minReleaseLevel 5.0.2.0
ccrEnabled yes
cipherList AUTHONLY
autoload yes
tiebreakerDisks 01_gpfs
adminMode central
File systems in cluster GPFSnonprod.gpfsnode1:
---------------------------------------------
(none)
9. Create shared filesystem¶
Verify all the flags being used in the mmcrfs command below; you may want to modify them depending on the filesystem requirements. Here -T sets the mount point, -M and -R set the maximum metadata and data replicas, -E no disables exact mtime tracking, -S yes suppresses atime updates, and -v no skips checking whether the disks already belong to another filesystem.
$ sudo mmcrfs /dev/gpfs_fs01 -F /home/gpfsadmin/gpfs_nsd -T /gpfs_fs01 -v no -M 2 -R 2 -E no -S yes
GPFS: 6027-531 The following disks of gpfs_fs01 will be formatted on node gpfsnode1:
01_gpfs: size 51200 MB
02_gpfs: size 51200 MB
03_gpfs: size 51200 MB
GPFS: 6027-540 Formatting file system ...
GPFS: 6027-535 Disks up to size 776.99 GB can be added to storage pool system.
Creating Inode File
Creating Allocation Maps
Creating Log Files
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
GPFS: 6027-572 Completed creation of file system /dev/gpfs_fs01.
mmcrfs: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
$ sudo mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
gpfs_fs01 01_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs01 02_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs01 03_gpfs gpfsnode1,gpfsnode2,gpfsnode3
Mount/unmount shared filesystem¶
sudo mmmount gpfs_fs01 -N all
To unmount: sudo mmumount gpfs_fs01 -a
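From the OS side the mount can also be confirmed with df (AIX example):
$ df -g /gpfs_fs01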
$ sudo mmdf gpfs_fs01 --block-size auto
disk disk size failure holds holds free free
name group metadata data in full blocks in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 776.99 GB)
01_gpfs 50G -1 yes yes 49.36G ( 99%) 11.12M ( 0%)
02_gpfs 50G -1 yes yes 49.37G ( 99%) 10.87M ( 0%)
03_gpfs 50G -1 yes yes 49.41G ( 99%) 10.87M ( 0%)
------------- -------------------- -------------------
(pool total) 150G 148.1G ( 99%) 32.85M ( 0%)
============= ==================== ===================
(total) 150G 148.1G ( 99%) 32.85M ( 0%)
Inode Information
-----------------
Number of used inodes: 4038
Number of free inodes: 152634
Number of allocated inodes: 156672
Maximum number of inodes: 156672
$ sudo mmlsmount all -L
File system gpfs_fs01 is mounted on 3 nodes:
1.2.3.4 gpfsnode1
1.2.3.5 gpfsnode2
1.2.3.6 gpfsnode3
10. GPFS callbacks¶
These are some basic GPFS callback scripts that are triggered by certain cluster events. Copies of the scripts are further down this page.
$ sudo mmaddcallback nodeDown --command /gpfsadmin/callbacks/nodeLeave.sh --event nodeLeave --parms %eventNode --parms %quorumNodes
$ sudo mmaddcallback nodeJoin --command /gpfsadmin/callbacks/nodeJoin.sh --event nodeJoin --parms %eventNode --parms %quorumNodes
$ sudo mmaddcallback quorumLoss --command /gpfsadmin/callbacks/quorumLoss.sh --event quorumLoss --parms %eventNode --parms %quorumNodes
$ sudo mmlscallback
nodeDown
command = /gpfsadmin/callbacks/nodeLeave.sh
event = nodeLeave
parms = %eventNode %quorumNodes
nodeJoin
command = /gpfsadmin/callbacks/nodeJoin.sh
event = nodeJoin
parms = %eventNode %quorumNodes
quorumLoss
command = /gpfsadmin/callbacks/quorumLoss.sh
event = quorumLoss
parms = %eventNode %quorumNodes
Testing nodeLeave callback
$ sudo mmshutdown -N gpfsnode1
$ cat /gpfsadmin/logs/nodeLeave.log
[nodeLeave.sh 01/06/18-17:35:07] - GPFS node leave event at: Fri Jun 1 17:35:07 AEST 2018
[nodeLeave.sh 01/06/18-17:35:07] - The event occurred on node: gpfsnode1
[nodeLeave.sh 01/06/18-17:35:07] - The quorum nodes are: gpfsnode2,gpfsnode3
$ grep "GPFS node leave" /var/adm/syslog/syslog.log
Jun 1 17:35:07 gpfsnode1 user:notice gpfsadmin: GPFS node leave event - gpfsnode1
Adding another node to the cluster¶
$ sudo mmaddnode -N <node>:quorum
$ sudo mmchlicense server --accept -N <node>
$ sudo mmgetstate -aL
$ sudo mmstartup -N <node>
(AIX) If the new node's NSD-to-hdisk mappings look stale, removing GPFS's cached physical-volume map and re-running mmlspv rebuilds it:
# rm /var/mmfs/gen/nsdpvol
$ sudo mmlspv
Increasing filesystem size¶
1. Map new LUN to all GPFS nodes¶
2. Find existing disk names and node mappings¶
$ sudo mmlsnsd
Password:
File system Disk name NSD servers
---------------------------------------------------------------------------
gpfs_fs01 01_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs01 02_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs01 03_gpfs gpfsnode1,gpfsnode2,gpfsnode3
$ sudo mmlsnsd -m
Password:
Disk name NSD volume ID Device Node name Remarks
---------------------------------------------------------------------------------------
01_gpfs 0A2C103258BF668E /dev/hdisk1 gpfsnode1 server node
01_gpfs 0A2C103258BF668E /dev/hdisk1 gpfsnode2 server node
01_gpfs 0A2C103258BF668E /dev/hdisk1 gpfsnode3 server node
02_gpfs 0A2C103258BF6696 /dev/hdisk2 gpfsnode1 server node
02_gpfs 0A2C103258BF6696 /dev/hdisk2 gpfsnode2 server node
02_gpfs 0A2C103258BF6696 /dev/hdisk2 gpfsnode3 server node
03_gpfs 0A2C103258BF669E /dev/hdisk3 gpfsnode1 server node
03_gpfs 0A2C103258BF669E /dev/hdisk3 gpfsnode2 server node
03_gpfs 0A2C103258BF669E /dev/hdisk3 gpfsnode3 server node
3. Create the NSD file¶
$ cat add_nsd
%nsd:
device=hdisk4
nsd=04_gpfs
servers=gpfsnode1,gpfsnode2,gpfsnode3
usage=dataAndMetadata
4. Create the NSD¶
After running the command, wait a few minutes to allow it to propagate through the cluster nodes.
$ sudo mmcrnsd -F /home/gpfsadmin/add_nsd -v no
Password:
mmcrnsd: Processing disk hdisk4
mmcrnsd: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
5. Verify new NSD¶
$ sudo mmlsnsd
Password:
File system Disk name NSD servers
---------------------------------------------------------------------------
gpfs_fs01 01_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs01 02_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs01 03_gpfs gpfsnode1,gpfsnode2,gpfsnode3
(free disk) 04_gpfs gpfsnode1,gpfsnode2,gpfsnode3
6. Add NSD to existing filesystem¶
$ sudo mmadddisk gpfs_fs01 -F /home/gpfsadmin/add_nsd
Password:
GPFS: 6027-531 The following disks of gpfs_fs01 will be formatted on node gpfsnode1:
04_gpfs: size 51200 MB
Extending Allocation Map
Checking Allocation Map for storage pool system
GPFS: 6027-1503 Completed adding disks to file system gpfs_fs01.
mmadddisk: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
7. Verify the increased filesystem size¶
$ sudo mmdf gpfs_fs01 --block-size auto
Password:
disk disk size failure holds holds free free
name group metadata data in full blocks in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 784 GB)
01_gpfs 50G -1 yes yes 49.69G ( 99%) 2.594M ( 0%)
02_gpfs 50G -1 yes yes 49.69G ( 99%) 1.844M ( 0%)
03_gpfs 50G -1 yes yes 49.69G ( 99%) 2.812M ( 0%)
04_gpfs 50G -1 yes yes 49.94G (100%) 1.844M ( 0%)
------------- -------------------- -------------------
(pool total) 200G 199G (100%) 9.094M ( 0%)
============= ==================== ===================
(total) 200G 199G (100%) 9.094M ( 0%)
Inode Information
-----------------
Number of used inodes: 4038
Number of free inodes: 150330
Number of allocated inodes: 154368
Maximum number of inodes: 154368
Migrate GPFS NSDs¶
These steps can be used if you need to migrate from one hdisk to another (for example, when migrating from one disk subsystem to another). The steps below detail moving a single filesystem: gpfs_fs01
1. List current filesystem NSDs¶
$ sudo mmlsnsd
File system Disk name NSD servers
------------------------------------------------------------------------------
gpfs_fs01 01_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs01 02_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs01 03_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs02 04_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs02 05_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs02 06_gpfs gpfsnode1,gpfsnode2,gpfsnode3
$ sudo mmlsdisk gpfs_fs01
disk driver sector failure holds holds storage
name type size group metadata data status availability pool
------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
01_gpfs nsd 512 -1 yes yes ready up system
02_gpfs nsd 512 -1 yes yes ready up system
03_gpfs nsd 512 -1 yes yes ready up system
2. Create NSD configuration file¶
$ cat gpfs_migration_gpfs_fs01
%nsd:
device=hdisk8
nsd=08_gpfs
servers=gpfsnode1,gpfsnode2,gpfsnode3
usage=dataAndMetadata
%nsd:
device=hdisk9
nsd=09_gpfs
servers=gpfsnode1,gpfsnode2,gpfsnode3
usage=dataAndMetadata
%nsd:
device=hdisk10
nsd=10_gpfs
servers=gpfsnode1,gpfsnode2,gpfsnode3
usage=dataAndMetadata
3. Create NSDs¶
$ sudo mmcrnsd -F gpfs_migration_gpfs_fs01 -v yes
Password:
mmcrnsd: Processing disk hdisk8
mmcrnsd: Processing disk hdisk9
mmcrnsd: Processing disk hdisk10
mmcrnsd: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
$ sudo mmlsnsd
File system Disk name NSD servers
------------------------------------------------------------------------------
gpfs_fs01 01_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs01 02_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs01 03_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs02 04_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs02 05_gpfs gpfsnode1,gpfsnode2,gpfsnode3
gpfs_fs02 06_gpfs gpfsnode1,gpfsnode2,gpfsnode3
(free disk) 08_gpfs gpfsnode1,gpfsnode2,gpfsnode3
(free disk) 09_gpfs gpfsnode1,gpfsnode2,gpfsnode3
(free disk) 10_gpfs gpfsnode1,gpfsnode2,gpfsnode3
4. Add new NSDs to filesystem¶
$ sudo mmadddisk gpfs_fs01 -F gpfs_migration_gpfs_fs01 -v yes
Password:
GPFS: 6027-531 The following disks of gpfs_fs01 will be formatted on node gpfsnode1:
08_gpfs: size 51200 MB
09_gpfs: size 51200 MB
10_gpfs: size 51200 MB
Extending Allocation Map
Checking Allocation Map for storage pool system
GPFS: 6027-1503 Completed adding disks to file system gpfs_fs01.
mmadddisk: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
5. Migrate the data¶
Deleting a disk with the mmdeldisk command will migrate any data from it to the other disks that make up the filesystem.
$ sudo mmdeldisk gpfs_fs01 "01_gpfs;02_gpfs;03_gpfs"
Password:
Deleting disks ...
GPFS: 6027-589 Scanning file system metadata, phase 1 ...
100 % complete on Tue Feb 23 16:06:23 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 2 ...
100 % complete on Tue Feb 23 16:06:23 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 3 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 4 ...
100 % complete on Tue Feb 23 16:06:23 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 5 ...
100 % complete on Tue Feb 23 16:06:23 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-565 Scanning user file metadata ...
100.00 % complete on Tue Feb 23 16:06:24 2021 ( 156672 inodes with total 1841 MB data processed)
GPFS: 6027-552 Scan completed successfully.
Checking Allocation Map for storage pool system
GPFS: 6027-370 tsdeldisk completed.
mmdeldisk: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
$ sudo mmlsnsd -F
File system Disk name NSD servers
------------------------------------------------------------------------------
(free disk) 01_gpfs gpfsnode1,gpfsnode2,gpfsnode3
(free disk) 02_gpfs gpfsnode1,gpfsnode2,gpfsnode3
(free disk) 03_gpfs gpfsnode1,gpfsnode2,gpfsnode3
$ sudo mmlsdisk gpfs_fs01 -m
Disk name IO performed on node Device Availability
------------ ----------------------- ----------------- ------------
08_gpfs localhost /dev/hdisk8 up
09_gpfs localhost /dev/hdisk9 up
10_gpfs localhost /dev/hdisk10 up
6. Restripe and rebalance the filesystem¶
$ sudo mmrestripefs gpfs_fs01 -r
Password:
GPFS: 6027-589 Scanning file system metadata, phase 1 ...
100 % complete on Tue Feb 23 16:20:04 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 2 ...
100 % complete on Tue Feb 23 16:20:04 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 3 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 4 ...
100 % complete on Tue Feb 23 16:20:04 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 5 ...
100 % complete on Tue Feb 23 16:20:04 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-565 Scanning user file metadata ...
100.00 % complete on Tue Feb 23 16:20:04 2021 ( 156672 inodes with total 1824 MB data processed)
GPFS: 6027-552 Scan completed successfully.
$ sudo mmrestripefs gpfs_fs01 -b
Password:
GPFS: 6027-589 Scanning file system metadata, phase 1 ...
100 % complete on Tue Feb 23 16:20:27 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 2 ...
100 % complete on Tue Feb 23 16:20:27 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 3 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 4 ...
100 % complete on Tue Feb 23 16:20:27 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 5 ...
100 % complete on Tue Feb 23 16:20:27 2021
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-565 Scanning user file metadata ...
100.00 % complete on Tue Feb 23 16:20:27 2021 ( 156672 inodes with total 1841 MB data processed)
GPFS: 6027-552 Scan completed successfully.
7. Reconfigure the tie-breaker disk¶
If one of the disks you deleted was the tie-breaker disk for the cluster, you need to designate a new one.
$ sudo mmchconfig tiebreakerDisks=08_gpfs
Password:
mmchconfig: Command successfully completed
mmchconfig: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
8. Delete the NSDs¶
$ sudo mmdelnsd "01_gpfs;02_gpfs;03_gpfs"
Password:
mmdelnsd: Processing disk 01_gpfs
mmdelnsd: Processing disk 02_gpfs
mmdelnsd: Processing disk 03_gpfs
mmdelnsd: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
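A final check that the filesystem is now backed only by the new NSDs:
$ sudo mmlsnsd
$ sudo mmlsdisk gpfs_fs01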
Scripts¶
nodeJoin.sh¶
#!/bin/sh
#
# Name: nodeJoin.sh
#
# Description: Script called by GPFS when a nodeJoin event is triggered.
# An entry is logged via syslog and into a logfile.
# Variables
LOG=/gpfsadmin/logs/nodeJoin.log
PROG=`basename $0`
# Main
echo "[${PROG} `date +"%d/%m/%y-%T"`] - GPFS node join event at: `date`" >> ${LOG}
echo "[${PROG} `date +"%d/%m/%y-%T"`] - The event occurred on node: $1" >> ${LOG}
echo "[${PROG} `date +"%d/%m/%y-%T"`] - The quorum nodes are: $2" >> ${LOG}
logger "GPFS node join event - $1"
exit
nodeLeave.sh¶
#!/bin/sh
#
# Name: nodeLeave.sh
#
# Description: Script called by GPFS when a nodeLeave event is triggered.
# An entry is logged via syslog and into a logfile.
# Variables
LOG=/gpfsadmin/logs/nodeLeave.log
PROG=`basename $0`
# Main
echo "[${PROG} `date +"%d/%m/%y-%T"`] - GPFS node leave event at: `date`" >> ${LOG}
echo "[${PROG} `date +"%d/%m/%y-%T"`] - The event occurred on node: $1" >> ${LOG}
echo "[${PROG} `date +"%d/%m/%y-%T"`] - The quorum nodes are: $2" >> ${LOG}
logger "GPFS node leave event - $1"
exit
quorumLoss.sh¶
#!/bin/sh
#
# Name: quorumLoss.sh
#
# Description: Script called by GPFS when a quorumLoss event is triggered.
# An entry is logged via syslog and into a logfile.
# Variables
LOG=/gpfsadmin/logs/quorumLoss.log
PROG=`basename $0`
# Main
echo "[${PROG} `date +"%d/%m/%y-%T"`] - GPFS quorum loss event at: `date`" >> ${LOG}
echo "[${PROG} `date +"%d/%m/%y-%T"`] - The event occurred on node: $1" >> ${LOG}
echo "[${PROG} `date +"%d/%m/%y-%T"`] - The quorum nodes are: $2" >> ${LOG}
logger "GPFS quorum loss event - $1"
exit
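The callback scripts (and their log directory) need to exist and be executable on every node; a minimal sketch assuming the paths used above:
# mkdir -p /gpfsadmin/callbacks /gpfsadmin/logs
# chmod 755 /gpfsadmin/callbacks/*.sh
Repeat (or copy the scripts) on each cluster node.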
Miscellaneous¶
Verify network communication¶
sudo mmnetverify -v
Check for GPFS contention¶
This is OK
Waiters like the ones below are fine as long as they clear quickly; waiting fractions of a second isn't an indicator of a problem.
$ sudo mmlsnode -N waiters -L
gpfsnode1: Waiting 0.0021 sec since 14:44:43, monitored, thread 14221389 PrefetchWorkerThread: for I/O completion on disk hdisk15
gpfsnode1: Waiting 0.0030 sec since 14:51:18, monitored, thread 34799655 CreateHandlerThread: on ThCond 0x121BE1458 (MsgRecordCondvar), reason 'RPC wait' for tmMsgBRRevoke
This is an issue
The problem is when waiters hang around for a long time, like below.
$ sudo mmlsnode -N waiters -L
gpfsnode1: Waiting 3409.7817 sec since 18:11:25, monitored, thread 110428751 FullScanDeletionThread: on ThCond 0x118AA91F8 (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 1.2.3.4 <c0n16>
gpfsnode1: Waiting 3409.7524 sec since 18:11:25, ignored, thread 97059185 SGAsyncRecoveryThread: on ThCond 0x134B7B298 (MultiThreadWorkInstanceCond), reason 'waiting for helper threads'
gpfsnode1: Waiting 3409.5919 sec since 18:11:25, monitored, thread 50397527 FullScanDeletionThread: on ThCond 0x118F0DB38 (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 1.2.3.5 <c0n4>
gpfsnode1: Waiting 3409.5916 sec since 18:11:25, ignored, thread 71041345 SGAsyncRecoveryThread: on ThCond 0x12342E298 (MultiThreadWorkInstanceCond), reason 'waiting for helper threads'
gpfsnode2: Waiting 2463626.6521 sec since 00:02:43 (-28 days), ignored, thread 51708261 InodeRevokeWorkerThread: for flush mapped pages, VMM iowait
gpfsnode2: Waiting 1148942.9294 sec since 05:14:07 (-13 days), ignored, thread 119472313 InodeRevokeWorkerThread: for flush mapped pages, VMM iowait
gpfsnode2: Waiting 2350538.0764 sec since 07:27:32 (-27 days), ignored, thread 80019525 InodeRevokeWorkerThread: for flush mapped pages, VMM iowait
Change nodes used for cluster communication¶
This can only be done directly if mmlscluster shows "Repository type: server-based". If the cluster uses CCR, the affected nodes need to be made non-quorum first (and re-designated as quorum once the interfaces have been changed).
sudo mmshutdown -a
sudo mmchnode --nonquorum -N <nodes>
sudo mmchnode --daemon-interface=gpfsnode1-newif --admin-interface=gpfsnode1-newif -N gpfsnode1
sudo mmchnode --daemon-interface=gpfsnode2-newif --admin-interface=gpfsnode2-newif -N gpfsnode2
sudo mmchnode --quorum -N <nodes>
sudo mmchconfig tiebreakerDisks="01_gpfs"
sudo mmstartup -a