unixadmin.free.fr Handy Unix Plumbing Tips and Tricks

25 Nov 2016

Shared Storage Pool node not a member of any cluster.

A customer needed to restart an old two-node VIOS Shared Storage Pool cluster, but the second node never joined the cluster and the storage pool was only active on the first node.

The vty console displayed the error: 0967-043 This node is currently not a member of any cluster.

I checked the console log, and one line pointed my diagnosis in the right direction.

$ r oem
# alog -ot console
0 Fri Nov 25 15:59:08 CET 2016 0967-112 Subsystem not configured.
0 Fri Nov 25 16:01:07 CET 2016 0967-043 This node is currently not a member of any cluster.
0 Fri Nov 25 16:03:10 CET 2016 0967-043 This node is currently not a member of any cluster.
0 Fri Nov 25 16:03:17 CET 2016 JVMJ9VM090I Slow response to network query (150 secs), check your IP DNS configuration

I checked /etc/resolv.conf and found that the configured DNS server was not reachable. I replaced the DNS entry with the first node's DNS configuration, restarted the second node, and it joined the cluster normally.
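
A quick way to confirm this kind of DNS problem on the failing node (as root; <nameserver_ip> stands for the address listed in resolv.conf):

# cat /etc/resolv.conf
# ping -c 3 <nameserver_ip>
# nslookup `hostname`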

12 Oct 2016

emgr secret flag for EFIXLOCKED

A customer had a problem updating an AIX system because some filesets were flagged EFIXLOCKED. Normally a fileset is locked by an emergency fix (e-fix), and before applying an update you must remove the e-fix. With this customer there was no e-fix ... nothing to remove, and /usr/emgrdata was empty.

The emgr command is a Korn shell script that contains a secret "-G" flag to unlock a fileset through its unlock_installp_pkg() function.

 G) # Secret flag. DO NOT USE THIS UNLESS YOU ARE SURE !
            emt || exit 1
            unlock_installp_pkg "$OPTARG" "0"; exit $?;; # secret flag

########################################################################
## Function: unlock_installp_pkg
## Parameters: <PKG> <EFIX NUM>
########################################################################

If no e-fix was installed, pass anything as the second argument ($2), for example "unlock":

#  lslpp -qlc | grep EFIXLOCKED
/etc/objrepos:bos.rte.libc:7.1.4.0::COMMITTED:F:libc Library:EFIXLOCKED
/usr/lib/objrepos:bos.rte.libc:7.1.4.0::COMMITTED:F:libc Library:EFIXLOCKED

# emgr -G bos.rte.libc unlock    
Initializing emgr manual maintenance tools.
Explicit Lock: unlocking installp fileset bos.rte.libc.
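
Once the fileset is unlocked, the same lslpp check should no longer report the fileset:

# lslpp -qlc | grep EFIXLOCKED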

The -D flag also enables debug mode (set -x).

28 Nov 2015

vio_daemon consuming high memory on VIOS

This Saturday I updated two dual VIOS from 2.2.2.1 to the combined 2.2.3.1 + 2.2.3.50 + 2.2.3.52 fixpack.
One VIOS was using a lot of memory (8 GB of computational memory); svmon showed that vio_daemon was using 12 application stack segments (it's a joke). In fact, the customer had modified /etc/security/limits to set stack, data and rss to unlimited for root. Solved by setting the values back to the defaults and rebooting the VIOS. See the IBM technote below.
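
To see how much memory vio_daemon really holds, find its PID and inspect its segments with svmon (as root, after oem_setup_env; <pid> is the PID returned by ps):

# ps -ef | grep vio_daemon
# svmon -P <pid>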

Question
Why is vio_daemon consuming high memory on PowerVM Virtual I/O Server (VIOS)?

Cause
There is a known issue in VIOS 2.2.3.0 thru 2.2.3.3 with vio_daemon memory leak that was fixed at 2.2.3.4 with IV64508.

Answer
To check your VIOS level, as padmin, run:
$ ioslevel

If your VIOS level is 2.2.3.4 or higher, the problem may be due to values in /etc/security/limits being set to "unlimited" (-1). In particular, the "stack" size setting exposes a condition where the system is allowed to pin as much stack as it wants, causing vio_daemon to consume a lot of memory.

$ oem_setup_env

# vi /etc/security/limits ->check the default stanza

default:
        fsize = -1
        core = -1
        cpu = -1
        data = -1
        rss = -1
        stack = -1
        nofiles = -1

In some cases, the issue with vio_daemon consuming high memory is noticed after a VIOS update to 2.2.3.X. However, a VIOS update will NOT change these settings. It is strongly recommended not to modify these default values as doing so is known to cause unpredictable results. Below is an example of the default values:

default:
        fsize = 2097151
        core = 2097151
        cpu = -1
        data = 262144
        rss = 65536
        stack = 65536
        nofiles = 2000

To correct the problem, change the settings back to the default values, then reboot the VIOS at your earliest convenience.
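
Instead of editing the file by hand, the default stanza can also be reset with chsec; a minimal sketch, assuming the default values shown above are the ones you want:

# chsec -f /etc/security/limits -s default -a stack=65536 -a data=262144 -a rss=65536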

Note 1
If the stack size was added to the root and/or padmin stanzas with an unlimited setting, it should be removed prior to rebooting the VIOS.

Note 2
If the clients are not redundant via a second VIOS, a maintenance window should be scheduled to bring the clients down before rebooting the VIOS.

SOURCE: IBM technote

27 Oct 2015

NMON Visualizer

For those who use the nmon performance collection tool developed by Nigel Griffiths, available for IBM AIX/VIOS and Linux (Power, x86, x86_64, mainframe and now ARM (Raspberry Pi)): as you know, you normally use the nmon analyser Excel spreadsheet to visualize the nmon collection files.
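
To produce a collection file that either tool can analyze, a typical nmon recording on AIX/VIOS looks like this (one snapshot every 60 seconds, 1440 snapshots, i.e. 24 hours; adjust to your needs):

$ nmon -f -s 60 -c 1440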

As a complement to that tool, I recommend trying NMONVisualizer, an IBM project started by Hunter Presnall; it is an excellent tool for comparing and analyzing nmon files collected from several AIX or Linux systems and VMs.

NMONVisualizer http://nmonvisualizer.github.io/nmonvisualizer/index.html

NMON
https://www.ibm.com/developerworks/aix/library/au-analyze_aix/

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power+Systems/page/nmon

http://nmon.sourceforge.net

Nmon analyser
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/nmon_analyser

My thanks to you all for a job very well done. :)

25 Aug 2015

Minimum NIM master levels for VIOS clients

The minimum NIM master level for VIOS is also, for me, a good reference point for matching an AIX level to a VIOS ioslevel.

https://www-304.ibm.com/webapp/set2/sas/f/flrt/viostable.html

Minimum NIM master levels for VIOS clients

If using NIM to back up, install or update a VIOS partition, the NIM master must be at a level greater than or equal to those shown below.

VIOS Release VIOS Level Minimum NIM master level
VIOS 2.2.6 VIOS 2.2.6.10 AIX 6100-09-10 7100-05-01 7200-02-01
VIOS 2.2.6.0 AIX 6100-09-10 7100-05-00 7200-02-00
VIOS 2.2.5 VIOS 2.2.5.30 AIX 6100-09-10 7100-05-01 7200-02-01
VIOS 2.2.5.20 AIX 6100-09-09 7100-04-04 7200-01-02
VIOS 2.2.5.10 AIX 6100-09-08 7100-04-03 7200-01-01
VIOS 2.2.5.0 AIX 6100-09-08 7100-04-03
VIOS 2.2.4 VIOS 2.2.4.50 AIX 6100-09-10 7100-05-01 7200-02-01
VIOS 2.2.4.40 AIX 6100-09-09 7100-04-04 7200-01-02
VIOS 2.2.4.30 AIX 6100-09-08 7100-04-03 7200-01-01
VIOS 2.2.4.23 AIX 6100-09-07 7100-04-02 7200-00-02
VIOS 2.2.4.22 AIX 6100-09-07 7100-04-02 7200-00-02
VIOS 2.2.4.21 AIX 6100-09-07 7100-04-02 7200-00-02
VIOS 2.2.4.20 AIX 6100-09-07 7100-04-02 7200-00-02
VIOS 2.2.4.10 AIX 6100-09-06 7100-04-01 7200-00-01
VIOS 2.2.4.0 AIX 6100-09-06 7100-04-01 7200-00-01
VIOS 2.2.3 VIOS 2.2.3.90 AIX 6100-09-09 7100-04-04 7200-01-02
VIOS 2.2.3.80 AIX 6100-09-08 7100-04-03 7200-01-01
VIOS 2.2.3.70 AIX 6100-09-07 7100-04-02 7200-00-02
VIOS 2.2.3.60 AIX 6100-09-06 7100-03-05
VIOS 2.2.3.50 AIX 6100-09-05 7100-03-05
VIOS 2.2.3.4 AIX 6100-09-04 7100-03-04
VIOS 2.2.3.3 AIX 6100-09-03 7100-03-03
VIOS 2.2.3.2 AIX 6100-09-02 7100-03-02
VIOS 2.2.3.1 AIX 6100-09-01 7100-03-01
VIOS 2.2.3.0 AIX 6100-09 7100-03
VIOS 2.2.2 VIOS 2.2.2.70 AIX 6100-08-07 7100-02-07
VIOS 2.2.2.6 AIX 6100-08-06 7100-02-06
VIOS 2.2.2.5 AIX 6100-08-05 7100-02-05
VIOS 2.2.2.4 AIX 6100-08-04 7100-02-04
VIOS 2.2.2.3 AIX 6100-08-03 7100-02-03
VIOS 2.2.2.2 AIX 6100-08-02 7100-02-02
VIOS 2.2.2.1 AIX 6100-08-01 7100-02-01
VIOS 2.2.2.0 AIX 6100-08 7100-02
VIOS 2.2.1 VIOS 2.2.1.9 AIX 6100-07-10 7100-01-10
VIOS 2.2.1.8 AIX 6100-07-09 7100-01-09
VIOS 2.2.1.7 AIX 6100-07-08 7100-01-07
VIOS 2.2.1.5 AIX 6100-07-05 7100-01-05
VIOS 2.2.1.4 AIX 6100-07-04 7100-01-04
VIOS 2.2.1.3 AIX 6100-07-02 7100-01-02
VIOS 2.2.1.1 AIX 6100-07-01 7100-01-01
VIOS 2.2.1.0 AIX 6100-07 7100-01
VIOS 2.2.0 VIOS 2.2.0.13 AIX 6100-06-05 7100-00-03
VIOS 2.2.0.12 AIX 6100-06-05 7100-00-03
VIOS 2.2.0.11 AIX 6100-06-03 7100-00-02
VIOS 2.2.0.10 AIX 6100-06-01 7100-00-01
VIOS 2.2.0.0 AIX 6100-06 7100-00
VIOS 2.1.3 VIOS 2.1.3.10 AIX 6100-05-02
VIOS 2.1.3.0 AIX 6100-05
VIOS 2.1.2 VIOS 2.1.2.13 AIX 6100-04-03
VIOS 2.1.2.12 AIX 6100-04-02
VIOS 2.1.2.11 AIX 6100-04-02
VIOS 2.1.2.10 AIX 6100-04-01
VIOS 2.1.2.0 AIX 6100-04
VIOS 2.1.1 VIOS 2.1.1.10 AIX 6100-03-01
VIOS 2.1.1.0 AIX 6100-03
VIOS 2.1.0 VIOS 2.1.0.10 AIX 6100-02-02
VIOS 2.1.0.1 AIX 6100-02-01
VIOS 2.1.0.0 AIX 6100-02
VIOS 1.5.2 VIOS 1.5.2.6 AIX 5300-08-08
VIOS 1.5.2.5 AIX 5300-08-05
VIOS 1.5.2.1 AIX 5300-08-01
VIOS 1.5.2.0 AIX 5300-08
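
Before launching a NIM operation against a VIOS client, a quick cross-check of both sides:

# oslevel -s     (on the NIM master)
$ ioslevel       (on the VIOS, as padmin)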

30 Jun 2015

VIOS Adapter_reset on SEA LOAD SHARING

Technote (FAQ)

Question
How can I prevent a network outage on a SEA in load sharing mode over a physical adapter or LACP (802.3ad link aggregation)?

Answer
SEA load sharing is initiated by the backup SEA. On VIOS levels older than 2.2.4.0, a SEA going to the backup state triggers an adapter reset by default.
Some physical adapters may take 30 seconds or longer to complete an adapter reset, and LACP negotiation may take another 30 seconds. If a SEA is configured with such physical adapters or with LACP, network communication for the SEA in the BACKUP_SH state may be affected temporarily during a system reboot or a cable pull/plug back in.

To change the "adapter_reset" attribute to "no" on a pair of SEAs in load sharing mode:

1. Log in as padmin

2. Change to root prompt:
$ oem_setup_env

3. List the adapters:
# lsdev -Cc adapter

4. Find the Shared Ethernet Adapters
ent7 Available Shared Ethernet Adapter

5. Use the entstat command to list the components of the SEA:
# entstat -d ent7 | grep State
On SEA in primary loadsharing mode
State : PRIMARY_SH
On SEA in backup loadsharing mode
State : BACKUP_SH

6. Use the lsattr command to list attributes of the SEA
# lsattr -El ent7
adapter_reset yes

7. Change adapter_reset to "no". This change is dynamic and does not require a reboot.
chdev -dev ent7 -attr adapter_reset=no

8. Use the lsattr command to confirm the change
# lsattr -El ent7
adapter_reset no
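
The same change has to be applied to both VIOS of the pair; assuming the SEA is ent7 on each side, a quick check confirms the SEA state and the new attribute value:

# entstat -d ent7 | grep State
# lsattr -El ent7 -a adapter_reset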

13 Nov 2014

IBM AIX – From Strength to Strength – 2014

An interesting document that summarizes the features and support matrix, by version, for AIX, Virtual I/O Server and other IBM POWER products.

Permanent link:
http://public.dhe.ibm.com/common/ssi/ecm/en/poo03022usen/POO03022USEN.PDF

POO03022USEN

Thank you Jay.

17 Oct 2014

viosbr tool (cluster & Shared Storage Pool)

viosbr is a tool for backing up the VIOS configuration; see: viosbr tool

Repairing a "classic" VIOS is one thing, but recovering a VIOS cluster + SSP, the repository disk, the SolidDB database and the contents of the Shared Storage Pools looks like a dead end without this backup.

This command is therefore very useful for recovering a clustered VIOS with a Shared Storage Pool. I tested restoring a "viosbr" backup onto a fresh VIOS install from the DVDs and was able to recover the cluster and the SSP.
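
Since the whole recovery depends on having a recent backup, it is worth scheduling viosbr; a minimal sketch, assuming your VIOS level supports the -frequency and -numfiles options (here, seven rotating daily backups):

$ viosbr -backup -clustername CL007 -file vios1_SSP -frequency daily -numfiles 7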

VIOS cluster + Shared Storage Pool

Display the cluster name

$ lscluster -d
Storage Interface Query

Cluster Name:  CL007
Cluster uuid:  27e1e2f6-326a-11e3-b1d2-c8502e2e8474
Number of nodes reporting = 1
Number of nodes expected = 1
Node vios1
Node uuid = 27a92894-326a-11e3-b1d2-c8502e2e8474
Number of disk discovered = 2
        hdiskpower0
          state : UP
          uDid  : 352136006016080230F00308B27262F30E31106RAID 503DGCfcp
          uUid  : afee840c-03d4-a33c-3a31-1c3eb3154dd1
          type  : CLUSDISK
        hdiskpower1
          state : UP
          uDid  :
          uUid  : d432ea62-ac06-6bd3-f49a-22b3e525d452
          type  : REPDISK

Back up the VIOS cluster configuration

$ viosbr -backup -clustername CL007 -file vios1_SSP
Backup of this node (vios1) successful

Location of the backups

$ viosbr -view -list
vios1_SSP.CL007.tar.gz

$ ls -l /home/padmin/cfgbackups
total 704
-rw-r--r--    1 root     staff        340208 Oct 17 10:36 vios1_SSP.CL007.tar.gz

Example of a restore from scratch

I wanted to test what could be recovered when starting from scratch.

Extract the VIOS version from the "viosbr" backup: uncompress the vios1_SSP.CL007.tar.gz archive and open the XML file.

<vios-backup>
    <general>
        <xml-version>2.0</xml-version>
        <xml-ch-date>0</xml-ch-date>
        <backUpDate>2014-04-04</backUpDate>
        <backUpTime>18:05:40</backUpTime>
        <backUpPrPID>7471262</backUpPrPID>
        <aix-level>6.1.0.0</aix-level>
        <vios-level>2.2.1.0</vios-level>      ---<°)))))><)

Reinstall the VIOS from the version 2.2.1.0 installation DVDs, install the EMC + PowerPath drivers, detect the LUNs (data + CAA repository), and reconfigure the SEA and TCP/IP.

On a clustered VIOS, before restoring the backup you must remove the CAA signature from the repository disk.

$ lspv
NAME             PVID                                 VG               STATUS
hdisk0           00c8502e6ae82716                     rootvg           active
hdisk1           none                                 None
hdisk2           none                                 None
hdisk3           none                                 None
hdisk4           none                                 None
hdiskpower0      00c8502e5f866570                     None
hdiskpower1      00c8502e5f7cbfc5                     caavg_private    

$ cleandisk -r hdiskpower1
0967-112 Subsystem not configured.
This operation will scrub hdiskpower1, removing any volume groups and clearing cluster identifiers.
If another cluster is using this disk, that cluster will be destroyed.
Are you sure?  (y/[n]) y
cluster_utils.c get_cluster_lock        6089    Force continue.
rmcluster: succeeded

Copy the backup into padmin's home directory and restore it, specifying the CAA repository disk.

$ viosbr -restore -clustername CL007 -file /home/padmin/vios1_SSP.CL007.tar.gz -repopvs hdiskpower1
"CLUSTER restore successful.

The repository disk is recreated, the SolidDB database is restored, and the file system containing the VMs is accessible.

Some disk mappings (vhosts) have disappeared; they will have to be reconfigured manually.

Example:

$ lsmap -vadapter vhost3
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost3          U9117.570.658502E-V2-C40                     0x00000000

VTD                   NO VIRTUAL TARGET DEVICE FOUND

List the virtual disks

$ lssp -clustername CL007
Pool             Size(mb)    Free(mb)    TotalLUSize(mb)    LUs     Type        PoolID
SPA              651776      546513      319488             27      CLPOOL      000000000A1AB479000000005257E461

$ lssp -clustername CL007 -sp SPA -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
....
lab71_data1        10240       THIN             4766035ab69e3e8c9d7a4b9c5446a3b6
lab71_rootvg       10240       THIN             f727eaee25e9afe9ef5776205b2b56d8
....

Extract the mappings from the backup

$ oem_setup_env
# /usr/ios/cli/ioscli viosbr -view -file /home/padmin/vios1_SSP.CL007.tar.gz -clustername CL007  -mapping > /tmp/mapping.txt

Identify the logical disks mapped to the vhost

# more /tmp/mapping.txt

SVSA                Physloc                            Client Partition ID
------------------- ---------------------------------- --------------------
vhost3              U9117.570.658502E-V2-C40           0x00000008

VTD                      lab71_data1
Status                   Available
LUN                      0x8300000000000000
Backing Device           lab71_data1.4766035ab69e3e8c9d7a4b9c5446a3b6
Physloc
Mirrored                 N/A

SVSA                Physloc                            Client Partition ID
------------------- ---------------------------------- --------------------
vhost3              U9117.570.658502E-V2-C40           0x00000008

VTD                      lab71_rootvg
Status                   Available
LUN                      0x8100000000000000
Backing Device           lab71_rootvg.f727eaee25e9afe9ef5776205b2b56d8
Physloc
Mirrored                 N/A

Map the disks onto the vhost

$ mkbdsp -clustername CL007 -sp SPA -bd lab71_rootvg -vadapter vhost3 -tn lab71_rootvg
Assigning file "lab71_rootvg" as a backing device.
VTD:lab71_rootvg

$ mkbdsp -clustername CL007 -sp SPA -bd lab71_data1 -vadapter vhost3 -tn lab71_data1
Assigning file "lab71_data1" as a backing device.
VTD:lab71_data1

$ lsmap -vadapter vhost3
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost3          U9117.570.658502E-V2-C40                     0x00000000

VTD                   lab71_data1
Status                Available
LUN                   0x8300000000000000
Backing device        lab71_data1.4766035ab69e3e8c9d7a4b9c5446a3b6
Physloc
Mirrored              N/A

VTD                   lab71_rootvg
Status                Available
LUN                   0x8100000000000000
Backing device        lab71_rootvg.f727eaee25e9afe9ef5776205b2b56d8
Physloc
Mirrored              N/A

9 Sep 2014

AIX MPIO error log information

SC_DISK_PCM_ERR1 Subsystem Component Failure

The storage subsystem has returned an error indicating that some component (hardware or software) of the storage subsystem has failed. The detailed sense data identifies the failing component and the recovery action that is required. Failing hardware components should also be shown in the Storage Manager software, so the placement of these errors in the error log is advisory and is an aid for your technical-support representative.

SC_DISK_PCM_ERR2 Array Active Controller Switch

The active controller for one or more hdisks associated with the storage subsystem has changed. This is in response to some direct action by the AIX host (failover or autorecovery). This message is associated with either a set of failure conditions causing a failover or, after a successful failover, with the recovery of paths to the preferred controller on hdisks with the autorecovery attribute set to yes.

SC_DISK_PCM_ERR3 Array Controller Switch Failure

An attempt to switch active controllers has failed. This leaves one or more paths with no working path to a controller. The AIX MPIO PCM will retry this error several times in an attempt to find a successful path to a controller.

SC_DISK_PCM_ERR4 Array Configuration Changed

The active controller for an hdisk has changed, usually due to an action not initiated by this host. This might be another host initiating failover or recovery, for shared LUNs, a redistribute operation from the Storage Manager software, a change to the preferred path in the Storage Manager software, a controller being taken offline, or any other action that causes the active controller ownership to change.

SC_DISK_PCM_ERR5 Array Cache Battery Drained

The storage subsystem cache battery has drained. Any data remaining in the cache is dumped and is vulnerable to data loss until it is dumped. Caching is not normally allowed with drained batteries unless the administrator takes action to enable it within the Storage Manager software.

SC_DISK_PCM_ERR6 Array Cache Battery Charge Is Low

The storage subsystem cache batteries are low and need to be charged or replaced.

SC_DISK_PCM_ERR7 Cache Mirroring Disabled

Cache mirroring is disabled on the affected hdisks. Normally, any cached write data is kept within the cache of both controllers so that if either controller fails there is still a good copy of the data. This is a warning message stating that loss of a single controller will result in data loss.

SC_DISK_PCM_ERR8 Path Has Failed

The I/O path to a controller has failed or gone offline.

SC_DISK_PCM_ERR9 Path Has Recovered

The I/O path to a controller has resumed and is back online.

SC_DISK_PCM_ERR10 Array Drive Failure

A physical drive in the storage array has failed and should be replaced.

SC_DISK_PCM_ERR11 Reservation Conflict

A PCM operation has failed due to a reservation conflict. This error is not currently issued.

SC_DISK_PCM_ERR12 Snapshot™ Volume’s Repository Is Full

The snapshot volume repository is full. Write actions to the snapshot volume will fail until the repository problems are fixed.

SC_DISK_PCM_ERR13 Snapshot Op Stopped By Administrator

The administrator has halted a snapshot operation.

SC_DISK_PCM_ERR14 Snapshot repository metadata error

The storage subsystem has reported that there is a problem with snapshot metadata.

SC_DISK_PCM_ERR15 Illegal I/O - Remote Volume Mirroring

The I/O is directed to an illegal target that is part of a remote volume mirroring pair (the target volume rather than the source volume).

SC_DISK_PCM_ERR16 Snapshot Operation Not Allowed

A snapshot operation that is not allowed has been attempted.

SC_DISK_PCM_ERR17 Snapshot Volume’s Repository Is Full

The snapshot volume repository is full. Write actions to the snapshot volume will fail until the repository problems are fixed.

SC_DISK_PCM_ERR18 Write Protected

The hdisk is write-protected. This can happen if a snapshot volume repository is full.

SC_DISK_PCM_ERR19 Single Controller Restarted

The I/O to a single-controller storage subsystem is resumed.

SC_DISK_PCM_ERR20 Single Controller Restart Failure

The I/O to a single-controller storage subsystem is not resumed. The AIX MPIO PCM will continue to attempt to restart the I/O to the storage subsystem.
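
When one of these labels shows up, the full entry (including the detailed sense data) can be pulled from the AIX error log by filtering on the label, for example:

# errpt -a -J SC_DISK_PCM_ERR8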

28 Jul 2013

EMC VNX Snapview not supported with AIX MPIO

I have found that some customers use SnapView on CX or VNX FLARE with the AIX native MPIO driver on VIOS or AIX.

As early as 2008, EMC wrote a technical note specifying that layered software such as SnapView was not supported with AIX MPIO.

Technote: 300-008-486_aix_native_mpio_clariion_1108

Today that technote has disappeared, but EMC support has written an EMC Primus case, "emc75601", specifying that it is still not supported with VNX layered software.

Driver example:

EMC.CLARiiON.aix.rte       5.3.0.8    C     F    EMC CLARiiON AIX Support
EMC.CLARiiON.fcp.MPIO.rte  5.3.0.8    C     F    EMC CLARiiON FCP MPIO Support
devices.common.IBM.mpio.rte 6.1.7.15    C     F    MPIO Disk Path Control Module
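
A listing like the one above can be produced by filtering the installed filesets with lslpp (the grep pattern is only an illustration):

# lslpp -L | grep -i -E "EMC|mpio"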

EMC primus case "emc75601"

VNX storage-system layered applications
EMC Layered software such as SnapView, MirrorView/Asynchronous, MirrorView/Synchronous,
EMC SAN Copy, etc., are not supported with hosts running AIX Native MPIO

So if using SnapView is imperative, install EMC PowerPath.