Shared Storage Pool node not a member of any cluster.
a customer needed to restart two old nodes cluster from VIOS Shared Storage Pool, but the second node never join the cluster. and storage pool was only active on first node.
The vty console display error : 0967-043 This node is currently not a member of any cluster.
I check console log and one line has oriented my diagnostic.
# alog -ot console
0 Fri Nov 25 15:59:08 CET 2016 0967-112 Subsystem not configured.
0 Fri Nov 25 16:01:07 CET 2016 0967-043 This node is currently not a member of any cluster.
0 Fri Nov 25 16:03:10 CET 2016 0967-043 This node is currently not a member of any cluster.
0 Fri Nov 25 16:03:17 CET 2016 JVMJ9VM090I Slow response to network query (150 secs), check your IP DNS configuration
I check /etc/resolv.conf and found that the DNS entry was not reachable. I change DNS entry with first node DNS configuration, restart second node and it join the cluster normally.
emgr secret flag for EFIXLOCKED
A customer have problem to update one AIX with EFIXLOCKED in some fileset. In normal situation fileset is locked by a emergency fix, and before apply update you must remove e-fix. With this customer no e-fix ... nothing to remove, and /usr/emgrdata was empty.
emgr command is a korn shell script that contains secret flag "-G" for unlock fileset with unlock_installp_pkg() function.
emt || exit 1
unlock_installp_pkg "$OPTARG" "0"; exit $?;; # secret flag
########################################################################
## Function: unlock_installp_pkg
## Parameters: <PKG> <EFIX NUM>
########################################################################
if no e-fix was installed, then use anything for second argument $2 like "unlock"
/etc/objrepos:bos.rte.libc:7.1.4.0::COMMITTED:F:libc Library:EFIXLOCKED
/usr/lib/objrepos:bos.rte.libc:7.1.4.0::COMMITTED:F:libc Library:EFIXLOCKED
# emgr -G bos.rte.libc unlock
Initializing emgr manual maintenance tools.
Explicit Lock: unlocking installp fileset bos.rte.libc.
Also -D flag display Debug mode with set -x.
vio_daemon consuming high memory on VIOS
This saturday I update two dual VIOS from 2.2.2.1 to combinated 2.2.3.1 + 2.2.3.50 + 2.2.3.52 Fixpack.
one VIOS was using lot of memory (8GB of computational), svmon show that vio_daemon used 12 segments of application stack (it's a joke). In fact, the customer had modified /etc/security/limits with stack, data and rss unlimited for root. Solved by Setting default value and reboot VIOS. See IBM technote.
Question
Why is vio_daemon consuming high memory on PoweVM Virtual I/O Server (VIOS)?
Cause
There is a known issue in VIOS 2.2.3.0 thru 2.2.3.3 with vio_daemon memory leak that was fixed at 2.2.3.4 with IV64508.
Answer
To check your VIOS level, as padmin, run:
$ ioslevel
If your VIOS level is 2.2.3.4 or higher, the problem may be due to having values in /etc/security/limits set to "unlimited" (-1). Particularly, the "stack" size setting, which exposes a condition where the system can be allowed to pin as much stack as desired causing vio_daemon to consume a lot of memory.
# vi /etc/security/limits ->check the default stanza
default:
fsize = -1
core = -1
cpu = -1
data = -1
rss = -1
stack = -1
nofiles = -1
In some cases, the issue with vio_daemon consuming high memory is noticed after a VIOS update to 2.2.3.X. However, a VIOS update will NOT change these settings. It is strongly recommended not to modify these default values as doing so is known to cause unpredictable results. Below is an example of the default values:
fsize = 2097151
core = 2097151
cpu = -1
data = 262144
rss = 65536
stack = 65536
nofiles = 2000
To correct the problem change the settings back to "default" values. Then reboot the VIOS at your earliest convenience.
Note 1
If the stack size was added to the root and/or padmin stanzas with unlimited setting, it should be removed prior to rebooting the VIOS.
Note 2
If there clients are not redundant via a second VIOS, a maintenance window should be schedule to bring the clients down before rebooting the VIOS.
SOURCE: IBM technote
NMON Visualizer
Pour ceux qui utilise l'outil de collecte de performance nmon développé par Nigel Griffiths et disponible pour IBM AIX/VIOS et Linux (Power, x86, x86_64, Mainframe & now ARM (Raspberry Pi)), comme vous le savez il faut se servir du fichier excel nmon analyser pour visualiser les fichiers de collecte nmon.
En complément de cet outil je vous conseille de tester NMONVisualizer, un projet IBM démarré par Hunter Presnall qui est un excelent outil pour comparer et analyser les fichiers nmon issue des collectes de performances de plusieurs systèmes ou VM AIX / Linux.
NMONVisualizer http://nmonvisualizer.github.io/nmonvisualizer/index.html
NMON
https://www.ibm.com/developerworks/aix/library/au-analyze_aix/
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power+Systems/page/nmon
Nmon analyser
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/nmon_analyser
My thanks to you all for a job very well done.
Minimum NIM master levels for VIOS clients
The NIM master level for VIOS is also for me a good point of vue to know the AIX level vs VIOS ioslevel.
https://www-304.ibm.com/webapp/set2/sas/f/flrt/viostable.html
Minimum NIM master levels for VIOS clients
If using NIM to backup, install or update a VIOS partition, the NIM master must be greater than or equal to the levels shown below.
VIOS Release | VIOS Level | Minimum NIM master level | ||||
---|---|---|---|---|---|---|
VIOS 2.2.6 | VIOS 2.2.6.10 | AIX 6100-09-10 | 7100-05-01 | 7200-02-01 | ||
VIOS 2.2.6.0 | AIX 6100-09-10 | 7100-05-00 | 7100-02-00 | |||
VIOS 2.2.5 | VIOS 2.2.5.30 | AIX 6100-09-10 | 7100-05-01 | 7200-02-01 | ||
VIOS 2.2.5.20 | AIX 6100-09-09 | 7100-04-04 | 7200-01-02 | |||
VIOS 2.2.5.10 | AIX 6100-09-08 | 7100-04-03 | 7200-01-01 | |||
VIOS 2.2.5.0 | AIX 6100-09-08 | 7100-04-03 | ||||
VIOS 2.2.4 | VIOS 2.2.4.50 | AIX 6100-09-10 | 7100-05-01 | 7200-02-01 | ||
VIOS 2.2.4.40 | AIX 6100-09-09 | 7100-04-04 | 7200-01-02 | |||
VIOS 2.2.4.30 | AIX 6100-09-08 | 7100-04-03 | 7200-01-01 | |||
VIOS 2.2.4.23 | AIX 6100-09-07 | 7100-04-02 | 7200-00-02 | |||
VIOS 2.2.4.22 | AIX 6100-09-07 | 7100-04-02 | 7200-00-02 | |||
VIOS 2.2.4.21 | AIX 6100-09-07 | 7100-04-02 | 7200-00-02 | |||
VIOS 2.2.4.20 | AIX 6100-09-07 | 7100-04-02 | 7200-00-02 | |||
VIOS 2.2.4.10 | AIX 6100-09-06 | 7100-04-01 | 7200-00-01 | |||
VIOS 2.2.4.0 | AIX 6100-09-06 | 7100-04-01 | 7200-00-01 | |||
VIOS 2.2.3 | VIOS 2.2.3.90 | AIX 6100-09-09 | 7100-04-04 | 7200-01-02 | ||
VIOS 2.2.3.80 | AIX 6100-09-08 | 7100-04-03 | 7200-01-01 | |||
VIOS 2.2.3.70 | AIX 6100-09-07 | 7100-04-02 | 7200-00-02 | |||
VIOS 2.2.3.60 | AIX 6100-09-06 | 7100-03-05 | ||||
VIOS 2.2.3.50 | AIX 6100-09-05 | 7100-03-05 | ||||
VIOS 2.2.3.4 | AIX 6100-09-04 | 7100-03-04 | ||||
VIOS 2.2.3.3 | AIX 6100-09-03 | 7100-03-03 | ||||
VIOS 2.2.3.2 | AIX 6100-09-02 | 7100-03-02 | ||||
VIOS 2.2.3.1 | AIX 6100-09-01 | 7100-03-01 | ||||
VIOS 2.2.3.0 | AIX 6100-09 | 7100-03 | ||||
VIOS 2.2.2 | VIOS 2.2.2.70 | AIX 6100-08-07 | 7100-02-07 | |||
VIOS 2.2.2.6 | AIX 6100-08-06 | 7100-02-06 | ||||
VIOS 2.2.2.5 | AIX 6100-08-05 | 7100-02-05 | ||||
VIOS 2.2.2.4 | AIX 6100-08-04 | 7100-02-04 | ||||
VIOS 2.2.2.3 | AIX 6100-08-03 | 7100-02-03 | ||||
VIOS 2.2.2.2 | AIX 6100-08-02 | 7100-02-02 | ||||
VIOS 2.2.2.1 | AIX 6100-08-01 | 7100-02-01 | ||||
VIOS 2.2.2.0 | AIX 6100-08 | 7100-02 | ||||
VIOS 2.2.1 | VIOS 2.2.1.9 | AIX 6100-07-10 | 7100-01-10 | |||
VIOS 2.2.1.8 | AIX 6100-07-09 | 7100-01-09 | ||||
VIOS 2.2.1.7 | AIX 6100-07-08 | 7100-01-07 | ||||
VIOS 2.2.1.5 | AIX 6100-07-05 | 7100-01-05 | ||||
VIOS 2.2.1.4 | AIX 6100-07-04 | 7100-01-04 | ||||
VIOS 2.2.1.3 | AIX 6100-07-02 | 7100-01-02 | ||||
VIOS 2.2.1.1 | AIX 6100-07-01 | 7100-01-01 | ||||
VIOS 2.2.1.0 | AIX 6100-07 | 7100-01 | ||||
VIOS 2.2.0 | VIOS 2.2.0.13 | AIX 6100-06-05 | 7100-00-03 | |||
VIOS 2.2.0.12 | AIX 6100-06-05 | 7100-00-03 | ||||
VIOS 2.2.0.11 | AIX 6100-06-03 | 7100-00-02 | ||||
VIOS 2.2.0.10 | AIX 6100-06-01 | 7100-00-01 | ||||
VIOS 2.2.0.0 | AIX 6100-06 | 7100-00 | ||||
VIOS 2.1.3 | VIOS 2.1.3.10 | AIX 6100-05-02 | ||||
VIOS 2.1.3.0 | AIX 6100-05 | |||||
VIOS 2.1.2 | VIOS 2.1.2.13 | AIX 6100-04-03 | ||||
VIOS 2.1.2.12 | AIX 6100-04-02 | |||||
VIOS 2.1.2.11 | AIX 6100-04-02 | |||||
VIOS 2.1.2.10 | AIX 6100-04-01 | |||||
VIOS 2.1.2.0 | AIX 6100-04 | |||||
VIOS 2.1.1 | VIOS 2.1.1.10 | AIX 6100-03-01 | ||||
VIOS 2.1.1.0 | AIX 6100-03 | |||||
VIOS 2.1.0 | VIOS 2.1.0.10 | AIX 6100-02-02 | ||||
VIOS 2.1.0.1 | AIX 6100-02-01 | |||||
VIOS 2.1.0.0 | AIX 6100-02 | |||||
VIOS 1.5.2 | VIOS 1.5.2.6 | AIX 5300-08-08 | ||||
VIOS 1.5.2.5 | AIX 5300-08-05 | |||||
VIOS 1.5.2.1 | AIX 5300-08-01 | |||||
VIOS 1.5.2.0 | AIX 5300-08 |
VIOS Adapter_reset on SEA LOAD SHARING
Technote (FAQ)
Question
How can I prevent network outage on SEA in loadsharing mode over physical adapter/LACP(8023ad link aggregation).
Answer
SEA load sharing is initiated by Backup SEA. In the VIOS levels (older than 2.2.4.0), SEA going to Backup state calls for adapter reset by default.
Some physical adapters may takes 30 sec or longer to complete adapter reset and LACP negotiation may take 30 sec for LACP negotiation. If SEA is configured with those physical adapters or LACP, network communication for the SEA in backup_sh state may be affected temporarily during a system reboot or cable pull/plug back in.
Changing value of "adapter_reset" attribute to "no" on a pair of SEA in loadsharing mode.
1. Login to padmin
2. Change to root prompt:
$ oem_setup_env
3. List the adapters:
# lsdev -Cc adapter
4. Find the Shared Ethernet Adapters
ent7 Available Shared Ethernet Adapter
5. Use the entstat command to list the components of the SEA:
# entstat -d ent7 | grep State
On SEA in primary loadsharing mode
State : PRIMARY_SH
On SEA in backup loadsharing mode
State : BACKUP_SH
6. Use the lsattr command to list attributes of the SEA
# lsattr -El ent7
adapter_reset yes
7. Change adapter_reset to "no". This change is dynamic and doesn't require reboot.
chdev -dev ent7 -attr adapter_reset=no
8. Use the lsattr command to confirm the change
# lsattr -El ent7
adapter_reset no
IBM AIX – From Strength to Strength – 2014
Un document intéressant qui résume les fonctionnalités et supports par version d'AIX, Vitual I/O Server et autres produits pour IBM POWER.
Lien permanent :
http://public.dhe.ibm.com/common/ssi/ecm/en/poo03022usen/POO03022USEN.PDF
Thank you Jay.
viosbr tool (cluster & Shared Storage Pool)
viosbr est un outil de sauvegarde de la configuration du VIOS voir : viosbr tool
Autant réparer un VIOS "classic" est possible ... par contre récupérer un cluster VIOS + SSP , le disque de repository, la base de donnée SolidDB et le contenu des Shared Storage Pool apparait être sans issue en l’absence de cette sauvegarde.
cette commande est donc très utile pour récupérer un VIOS en cluster + Shared Storage Pool. J'ai testé une restauration de cette sauvegarde "viosbr" sur une "fresh install" de VIOS à partir des DVDs et ai pu récupérer le cluster et le SSP.
VIOS cluster + Shared Storage Pool
Afficher le nom du cluster
Storage Interface Query
Cluster Name: CL007
Cluster uuid: 27e1e2f6-326a-11e3-b1d2-c8502e2e8474
Number of nodes reporting = 1
Number of nodes expected = 1
Node vios1
Node uuid = 27a92894-326a-11e3-b1d2-c8502e2e8474
Number of disk discovered = 2
hdiskpower0
state : UP
uDid : 352136006016080230F00308B27262F30E31106RAID 503DGCfcp
uUid : afee840c-03d4-a33c-3a31-1c3eb3154dd1
type : CLUSDISK
hdiskpower1
state : UP
uDid :
uUid : d432ea62-ac06-6bd3-f49a-22b3e525d452
type : REPDISK
Sauvegarde de la configuration du cluster VIOS
Backup of this node (vios1) successful
Location des sauvegardes
vios1_SSP.CL007.tar.gz
$ ls -l /home/padmin/cfgbackups
total 704
-rw-r--r-- 1 root staff 340208 Oct 17 10:36 vios1_SSP.CL007.tar.gz
Exemple de restauration from scratch
J'ai voulu tester ce que l'on pouvait récupérer en partant de zéro.
Extraire la version du VIOS à partir de la sauvegarde "viosbr" : décompresser l'archive vios1_SSP.CL007.tar.gz et éditer le fichier XML
<general>
<xml-version>2.0</xml-version>
<xml-ch-date>0</xml-ch-date>
<backUpDate>2014-04-04</backUpDate>
<backUpTime>18:05:40</backUpTime>
<backUpPrPID>7471262</backUpPrPID>
<aix-level>6.1.0.0</aix-level>
<vios-level>2.2.1.0</vios-level> ---<°)))))><)
Réinstallation du VIOS a partir des DVDs d'installation en version 2.2.1.0, installation des drivers EMC + PowerPath, détection des LUNs (Data + caa repository), reconfiguration SEA et TCP/IP.
En cluster VIOS , avant de restaurer la sauvegarde il faut supprimer la signature CAA sur le disque de repository.
NAME PVID VG STATUS
hdisk0 00c8502e6ae82716 rootvg active
hdisk1 none None
hdisk2 none None
hdisk3 none None
hdisk4 none None
hdiskpower0 00c8502e5f866570 None
hdiskpower1 00c8502e5f7cbfc5 caavg_private
$ cleandisk -r hdiskpower1
0967-112 Subsystem not configured.
This operation will scrub hdiskpower1, removing any volume groups and clearing cluster identifiers.
If another cluster is using this disk, that cluster will be destroyed.
Are you sure? (y/[n]) y
cluster_utils.c get_cluster_lock 6089 Force continue.
rmcluster: succeeded
Déposer la sauvegarde dans le home directory de padmin et restauré la en spécifiant le disque de repository CAA.
"CLUSTER restore successful.
Le disque de respository est recréé, la base de donnée SolidDB est restaurée, le file system contenant les VM est accessible.
Certains mapping disques (vhosts) ont disparus. Ils conviendra de les reconfigurer manuellement.
Exemple :
SVSA Physloc Client Partition ID
--------------- -------------------------------------------- ------------------
vhost3 U9117.570.658502E-V2-C40 0x00000000
VTD NO VIRTUAL TARGET DEVICE FOUND
Lister les disques virtuels
Pool Size(mb) Free(mb) TotalLUSize(mb) LUs Type PoolID
SPA 651776 546513 319488 27 CLPOOL 000000000A1AB479000000005257E461
$ lssp -clustername CL007 -sp SPA -bd
Lu Name Size(mb) ProvisionType Lu Udid
....
lab71_data1 10240 THIN 4766035ab69e3e8c9d7a4b9c5446a3b6
lab71_rootvg 10240 THIN f727eaee25e9afe9ef5776205b2b56d8
....
Extraire le mapping de la sauvegarde
# /usr/ios/cli/ioscli viosbr -view -file /home/padmin/vios1_SSP.CL007.tar.gz -clustername CL007 -mapping > /tmp/mapping.txt
Identifier les disques logique mappé sur le vhosts
SVSA Physloc Client Partition ID
------------------- ---------------------------------- --------------------
vhost3 U9117.570.658502E-V2-C40 0x00000008
VTD lab71_data1
Status Available
LUN 0x8300000000000000
Backing Device lab71_data1.4766035ab69e3e8c9d7a4b9c5446a3b6
Physloc
Mirrored N/A
SVSA Physloc Client Partition ID
------------------- ---------------------------------- --------------------
vhost3 U9117.570.658502E-V2-C40 0x00000008
VTD lab71_rootvg
Status Available
LUN 0x8100000000000000
Backing Device lab71_rootvg.f727eaee25e9afe9ef5776205b2b56d8
Physloc
Mirrored N/A
Mapping des disques sur le vhosts
Assigning file "lab71_rootvg" as a backing device.
VTD:lab71_rootvg
$ mkbdsp -clustername CL007 -sp SPA -bd lab71_data1 -vadapter vhost3 -tn lab71_data1
Assigning file "lab71_data1" as a backing device.
VTD:lab71_data1
$ lsmap -vadapter vhost3
SVSA Physloc Client Partition ID
--------------- -------------------------------------------- ------------------
vhost3 U9117.570.658502E-V2-C40 0x00000000
VTD lab71_data1
Status Available
LUN 0x8300000000000000
Backing device lab71_data1.4766035ab69e3e8c9d7a4b9c5446a3b6
Physloc
Mirrored N/A
VTD lab71_rootvg
Status Available
LUN 0x8100000000000000
Backing device lab71_rootvg.f727eaee25e9afe9ef5776205b2b56d8
Physloc
Mirrored N/A
AIX MPIO error log information
SC_DISK_PCM_ERR1 Subsystem Component Failure
The storage subsystem has returned an error indicating that some component (hardware or software) of the storage subsystem has failed. The detailed sense data identifies the failing component and the recovery action that is required. Failing hardware components should also be shown in the Storage Manager software, so the placement of these errors in the error log is advisory and is an aid for your technical-support representative.
SC_DISK_PCM_ERR2 Array Active Controller Switch
The active controller for one or more hdisks associated with the storage subsystem has changed. This is in response to some direct action by the AIX host (failover or autorecovery). This message is associated with either a set of failure conditions causing a failover or, after a successful failover, with the recovery of paths to the preferred controller on hdisks with the autorecovery attribute set to yes.
SC_DISK_PCM_ERR3 Array Controller Switch Failure
An attempt to switch active controllers has failed. This leaves one or more paths with no working path to a controller. The AIX MPIO PCM will retry this error several times in an attempt to find a successful path to a controller.
SC_DISK_PCM_ERR4 Array Configuration Changed
The active controller for an hdisk has changed, usually due to an action not initiated by this host. This might be another host initiating failover or recovery, for shared LUNs, a redistribute operation from the Storage Manager software, a change to the preferred path in the Storage Manager software, a controller being taken offline, or any other action that causes the active controller ownership to change.
SC_DISK_PCM_ERR5 Array Cache Battery Drained
The storage subsystem cache battery has drained. Any data remaining in the cache is dumped and is vulnerable to data loss until it is dumped. Caching is not normally allowed with drained batteries unless the administrator takes action to enable it within the Storage Manager software.
SC_DISK_PCM_ERR6 Array Cache Battery Charge Is Low
The storage subsystem cache batteries are low and need to be charged or replaced.
SC_DISK_PCM_ERR7 Cache Mirroring Disabled
Cache mirroring is disabled on the affected hdisks. Normally, any cached write data is kept within the cache of both controllers so that if either controller fails there is still a good copy of the data. This is a warning message stating that loss of a single controller will result in data loss.
SC_DISK_PCM_ERR8 Path Has Failed
The I/O path to a controller has failed or gone offline.
SC_DISK_PCM_ERR9 Path Has Recovered
The I/O path to a controller has resumed and is back online.
SC_DISK_PCM_ERR10 Array Drive Failure
A physical drive in the storage array has failed and should be replaced.
SC_DISK_PCM_ERR11 Reservation Conflict
A PCM operation has failed due to a reservation conflict. This error is not currently issued.
SC_DISK_PCM_ERR12 Snapshot™ Volume’s Repository Is Full
The snapshot volume repository is full. Write actions to the snapshot volume will fail until the repository problems are fixed.
SC_DISK_PCM_ERR13 Snapshot Op Stopped By Administrator
The administrator has halted a snapshot operation.
SC_DISK_PCM_ERR14 Snapshot repository metadata error
The storage subsystem has reported that there is a problem with snapshot metadata.
SC_DISK_PCM_ERR15 Illegal I/O - Remote Volume Mirroring
The I/O is directed to an illegal target that is part of a remote volume mirroring pair (the target volume rather than the source volume).
SC_DISK_PCM_ERR16 Snapshot Operation Not Allowed
A snapshot operation that is not allowed has been attempted.
SC_DISK_PCM_ERR17 Snapshot Volume’s Repository Is Full
The snapshot volume repository is full. Write actions to the snapshot volume will fail until the repository problems are fixed.
SC_DISK_PCM_ERR18 Write Protected
The hdisk is write-protected. This can happen if a snapshot volume repository is full.
SC_DISK_PCM_ERR19 Single Controller Restarted
The I/O to a single-controller storage subsystem is resumed.
SC_DISK_PCM_ERR20 Single Controller Restart Failure
The I/O to a single-controller storage subsystem is not resumed. The AIX MPIO PCM will continue to attempt to restart the I/O to the storage subsystem.
EMC VNX Snapview not supported with AIX MPIO
I found that some customers uses SnapView on CX or VNX Flare with AIX native MPIO driver on VIOS or AIX.
Already in 2008, EMC wrote a technical note specifying that software layered like Snapview was not supported with AIX MPIO.
Technote: 300-008-486_aix_native_mpio_clariion_1108
Today, this technote has disappeared but EMC support write a EMC primus "emc75601" specify that it is still not supported with VNX software layered.
Driver Example :
EMC.CLARiiON.fcp.MPIO.rte 5.3.0.8 C F EMC CLARiiON FCP MPIO Support
devices.common.IBM.mpio.rte 6.1.7.15 C F MPIO Disk Path Control Module
EMC primus case "emc75601"
EMC Layered software such as SnapView, MirrorView/Asynchronous, MirrorView/Synchronous,
EMC SAN Copy, etc., are not supported with hosts running AIX Native MPIO
So if it is imperative to use Snapview, then install EMC PowerPath.