unixadmin.free.fr just another IBM blog and technotes backup

10 Feb 2016

How to check memory and cores activated via a CUoD activation code

Go to the IBM Capacity on Demand website, enter the machine type and serial number, and check the POD and MOD lines.

Ex 1: type 9117 model MMD
POD 53C1340827291F44AAF4000000040041E4 09/27/2015
AAF4 = CCIN = 4.228 GHz core
04 = 4 cores activated

MOD 2A2A7F64BEEEC606821200000032004187
8212 = Feature code = Activation of 1 GB
32 = 32 GB activated

Ex 2: type 8233
POD 80FF07034C0917FA771400000016004166 09/17/2010
7714 = Feature code = 3.0 GHz core
16 = 16 cores activated
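
The three codes above can be split mechanically. Here is a minimal Python sketch, assuming the layout inferred from these examples only: 34 characters, with the CCIN or feature code at positions 17-20 and the quantity in the next eight characters, read as decimal digits. The function name is mine, and IBM may lay out other codes differently.

```python
def decode_activation_code(code):
    """Split a 34-character POD/MOD code into the fields used in the examples."""
    code = code.strip().upper()
    if len(code) != 34:
        raise ValueError("expected 34 characters, got %d" % len(code))
    return {
        "feature": code[16:20],        # CCIN or feature code, e.g. AAF4 or 8212
        "quantity": int(code[20:28]),  # assumption: decimal digits, as in the examples
    }


if __name__ == "__main__":
    for sample in ("53C1340827291F44AAF4000000040041E4",
                   "2A2A7F64BEEEC606821200000032004187",
                   "80FF07034C0917FA771400000016004166"):
        print(sample, decode_activation_code(sample))
```

What the quantity means (cores or GB) still depends on the feature code, so look it up as described in the sources.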

Source:
Thanks to Mr Delmas.
For CCIN references, check the IBM Knowledge Center.
For feature code references, check the IBM sales manual.

14 Dec 2015

HMC Save Upgrade Data failed

If you want to upgrade an HMC and the Save Upgrade Data task fails with HSCSAVEUPGRDATA_ERROR, check whether the home directory of hscroot or of another hmcsuperadmin user is filled with Virtual I/O Server ISO images. The /mnt/upgrade filesystem is used to store the save upgrade data backup, and it is too small to contain ISO images.

Fix: remove the VIOS ISO images from the HMC and relaunch the saveupgdata command.

Filed under: HMC
28 Nov 2015

vio_daemon consuming high memory on VIOS

This Saturday I updated two dual-VIOS configurations from 2.2.2.1 to the combined 2.2.3.1 + 2.2.3.50 + 2.2.3.52 fix packs.
One VIOS was using a lot of memory (8 GB of computational memory), and svmon showed that vio_daemon was using 12 segments of application stack (it's a joke). In fact, the customer had modified /etc/security/limits to set stack, data and rss to unlimited for root. Solved by restoring the default values and rebooting the VIOS. See the IBM technote.

Question
Why is vio_daemon consuming high memory on PowerVM Virtual I/O Server (VIOS)?

Cause
There is a known vio_daemon memory leak in VIOS 2.2.3.0 through 2.2.3.3 that was fixed in 2.2.3.4 with IV64508.

Answer
To check your VIOS level, as padmin, run:
$ ioslevel

If your VIOS level is 2.2.3.4 or higher, the problem may be due to values in /etc/security/limits being set to "unlimited" (-1), particularly the "stack" size setting, which exposes a condition where the system can be allowed to pin as much stack as desired, causing vio_daemon to consume a lot of memory.

$ oem_setup_env

# vi /etc/security/limits ->check the default stanza

default:
        fsize = -1
        core = -1
        cpu = -1
        data = -1
        rss = -1
        stack = -1
        nofiles = -1

In some cases, the issue with vio_daemon consuming high memory is noticed after a VIOS update to 2.2.3.X. However, a VIOS update will NOT change these settings. It is strongly recommended not to modify these default values as doing so is known to cause unpredictable results. Below is an example of the default values:

default:
        fsize = 2097151
        core = 2097151
        cpu = -1
        data = 262144
        rss = 65536
        stack = 65536
        nofiles = 2000

To correct the problem, change the settings back to the "default" values. Then reboot the VIOS at your earliest convenience.
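
To spot the changed values without reading the file by eye, here is a minimal Python sketch. It is not part of the technote: the function name is mine, the DEFAULTS table is copied from the example above, and I assume the usual AIX stanza layout (a "name:" line followed by "attribute = value" lines, with "*" starting a comment).

```python
# Default values as listed in the technote example above.
DEFAULTS = {
    "fsize": "2097151", "core": "2097151", "cpu": "-1", "data": "262144",
    "rss": "65536", "stack": "65536", "nofiles": "2000",
}


def changed_limits(text):
    """Return {stanza: {attribute: value}} for values that differ from DEFAULTS."""
    changed = {}
    stanza = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):       # blank line or comment
            continue
        if line.endswith(":") and "=" not in line:  # start of a new stanza
            stanza = line[:-1]
            continue
        if stanza and "=" in line:
            name, value = (part.strip() for part in line.split("=", 1))
            if name in DEFAULTS and value != DEFAULTS[name]:
                changed.setdefault(stanza, {})[name] = value
    return changed


if __name__ == "__main__":
    sample = "default:\n        stack = -1\n        cpu = -1\n\nroot:\n        data = -1\n"
    print(changed_limits(sample))
```

On a real system you would feed it the content of /etc/security/limits (read as root after oem_setup_env). It reports every stanza, so entries added under root or padmin show up as well.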

Note 1
If the stack size was added to the root and/or padmin stanzas with unlimited setting, it should be removed prior to rebooting the VIOS.

Note 2
If the clients are not redundant via a second VIOS, a maintenance window should be scheduled to bring the clients down before rebooting the VIOS.

SOURCE: IBM technote

23 Nov 2015

Drive paths to library client taken offline when server option SANDISCOVERY set to ‘YES’

Technote (troubleshooting)

Problem (Abstract)

This message in the activity log of the library manager appears: ANR1772E The path from source to destination is taken offline.

Symptom

On the library client, these messages are observed in the activity log when a library sharing session is opened to the library manager:

ANR1926W A SCSI inquiry has timed out after 15 seconds.
ANR3626W A check condition occurred during a small computer system interface (SCSI) inquiry at Fibre Channel port WWN=<wwn_number> , KEY=00, ASC=00, ASCQ=00.
ANR1786W HBAAPI not able to get adapter name.
ANR8963E Unable to find path to match the serial number defined for drive <DRIVE_NAME> in library <LIBRARY_NAME>.
ANR8873E The path from source <library_client> to destination <drive> (/dev/rmtXYZ) is taken offline.

On the library manager, you can see this corresponding message showing the path to the drive being taken offline:

ANR1772E The path from source to destination <drive> is taken offline.

Cause

The SAN discovery's query of the HBA has timed out, and the path is taken offline. This can occur in SAN environments with a large number of devices.

Diagnosing the problem

Verify that there is not an underlying hardware problem causing the drive paths to go offline.

Check the value of the SANDISCOVERYTIMEOUT option on the library clients and the library manager. The default value is 15 seconds:

QUERY OPTION SANDISCOVERYTIMEOUT

Resolving the problem

If the value of the option is at or near the default value of 15 seconds, increase it to a larger value. For example:

SETOPT SANDISCOVERYTIMEOUT 300
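
To see how often the symptom shows up in a saved copy of the activity log, here is a minimal Python sketch. It is not from the technote: it only counts the message IDs quoted above, and it assumes the log has been exported as plain text, one message per line (function and variable names are mine).

```python
import re

# Message IDs quoted in the technote above.
SAN_DISCOVERY_IDS = {"ANR1926W", "ANR3626W", "ANR1786W",
                     "ANR8963E", "ANR8873E", "ANR1772E"}


def san_discovery_messages(lines):
    """Count how often each of the technote's message IDs appears."""
    counts = {}
    for line in lines:
        for msg_id in re.findall(r"\bANR\d{4}[A-Z]\b", line):
            if msg_id in SAN_DISCOVERY_IDS:
                counts[msg_id] = counts.get(msg_id, 0) + 1
    return counts


if __name__ == "__main__":
    sample = [
        "ANR1926W A SCSI inquiry has timed out after 15 seconds.",
        "ANR1786W HBAAPI not able to get adapter name.",
        "ANR1926W A SCSI inquiry has timed out after 15 seconds.",
    ]
    print(san_discovery_messages(sample))
```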

Filed under: TSM
17 Nov 2015

Why Are Tapes with PRIVATE Status Not Found in QUERY VOLUME Output?

Technote (FAQ)

Question

QUERY LIBVOLUME shows tape volumes with status of PRIVATE, but the same volumes do not show up with the command: Q VOL

Why are these tapes PRIVATE?

Answer

QUERY VOLUME will only return information about volumes that belong to stgpools, but there are other types of volumes that can have valid data on them: DB backups, exports, backupsets and remote volumes that belong to a Library Client server.

The volume history will keep a record of all volumes and you can display these other types of non-stgpool volumes with the following commands:

q volh type=dbb
q volh type=dbs
q volh type=export
q volh type=backupset
q volh type=remote

If a PRIVATE volume is not part of a stgpool and does not display in any of the above Q VOLH commands then you can set it to scratch using the command:

UPDATE LIBVOL <library_name> <vol_name> STATUS=SCRATCH

If you have a library sharing environment, it is recommended to run an AUDIT LIBRARY on the Library Client servers before changing the status of a volume to scratch on the Library Manager server.
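
The check described above is a set difference. Here is a minimal Python sketch, assuming you have already collected the volume names by hand from the QUERY LIBVOLUME, Q VOL and Q VOLH output (the function name and the volume names are hypothetical).

```python
def scratch_candidates(private_libvols, stgpool_vols, volhist_vols):
    """Return PRIVATE library volumes known to neither Q VOL nor Q VOLH."""
    known = set(stgpool_vols) | set(volhist_vols)
    return sorted(set(private_libvols) - known)


if __name__ == "__main__":
    private = ["A00001", "A00002", "A00003"]   # PRIVATE in QUERY LIBVOLUME
    stgpool = ["A00001"]                       # listed by Q VOL
    volhist = ["A00003"]                       # listed by the Q VOLH commands
    print(scratch_candidates(private, stgpool, volhist))
```

Only the volumes it returns are candidates for UPDATE LIBVOL ... STATUS=SCRATCH, and only after the library clients have been checked.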

Filed under: TSM
30 Oct 2015

Using ‘dd’ to verify Tivoli Storage Manager tape volume labels

Question
How can I use the Unix 'dd' command to verify a tape volume label?

Answer
The first step that may be necessary to verify a tape volume label is to find out the block size in use on that tape volume. This parameter is typically set on the physical tape library console interface and will vary between manufacturers so the ideal place to search is on the manufacturer website.
There is a method to manually find out the block size as follows:

On most Unix systems, when the block size given is smaller than the one used on the tape, the 'dd' command reports that the read from the tape drive (and corresponding tape volume) failed, along with an insufficient-memory message. For example on AIX:

    bash$ dd if=/dev/rmt1 of=/tmp/test.file ibs=32 count=1
    dd: 0511-051 The read failed.
    : There is not enough memory available now.
    0+0 records in.
    0+0 records out.

The 'if' parameter must reference a valid path to a drive that contains the volume you are seeking information about. This volume may be loaded using a utility such as tapeutil or directly from the physical library's console management. The drive should not be in use by Tivoli Storage Manager at the time the command is run and it is recommended to take the drive offline to Tivoli Storage Manager.

The 'ibs' parameter indicates the block size to use in bytes unless a 'k' suffix is specified, in which case the value is read as kilobytes. A value of 32 bytes, as in the example above, is a good starting value. If this command returns a memory-related error message, the value can be doubled.

    bash$ dd if=/dev/rmt1 of=/tmp/test.file ibs=64 count=1
    dd: 0511-051 The read failed.
    : There is not enough memory available now.
    0+0 records in.
    0+0 records out.

The value specified for the block size is still smaller than what is actually on the volume so another error is generated. The value must be increased (each time doubling it) until no error message is reported:

    bash$ dd if=/dev/rmt1 of=/tmp/test.file ibs=128 count=1
    0+1 records in.
    0+1 records out.

Once the correct block size has been discovered, the 'dd' command should not generate a memory error when reading from the volume.
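
The doubling procedure can be written as a small loop. Here is a minimal Python sketch, where read_ok is a hypothetical callback standing in for "run dd with this ibs and report whether it succeeded"; the start value matches the example above and the upper limit is an arbitrary safety stop of mine.

```python
def find_block_size(read_ok, start=32, limit=4 * 1024 * 1024):
    """Double the block size until read_ok(size) succeeds; return that size."""
    size = start
    while size <= limit:
        if read_ok(size):
            return size
        size *= 2
    raise RuntimeError("no block size up to %d bytes worked" % limit)


if __name__ == "__main__":
    # Simulated drive: any read of at least 80 bytes succeeds.
    print(find_block_size(lambda size: size >= 80))
```

The size it returns is the first doubled value that is large enough, not necessarily the exact block length on the tape.
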
Now that the block size is known, the data in the first block of the tape can be dumped to a file:

    dd bs=<block_size> conv=ascii if=<device> of=/tmp/<name>.out

For example:

    bash$ dd bs=128 conv=ascii skip=0 count=1 if=/dev/rmt1 of=/tmp/block1.out
    0+1 records in.
    0+1 records out.

Once the file '/tmp/block1.out' is written, it may be viewed in any text editor, or the cat command can be used:

    bash$ cat /tmp/block1.out
    VOL1200312

In this case, 'VOL1200312' is the label of the tape volume residing in the drive /dev/rmt1.
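
To pull the volume name out of the dumped block, here is a minimal Python sketch. It assumes the usual standard-label layout, where a four-character 'VOL1' identifier is followed by a six-character volume serial; that layout is my assumption and is not stated in the technote, and the function name is mine.

```python
def parse_vol1(block):
    """Split an ASCII VOL1 label block into identifier and volume serial."""
    if isinstance(block, bytes):
        text = block.decode("ascii", errors="replace")
    else:
        text = block
    if not text.startswith("VOL1"):
        raise ValueError("not a VOL1 label")
    # Assumption: characters 5 to 10 hold the volume serial.
    return {"label_id": text[:4], "volume": text[4:10].strip()}


if __name__ == "__main__":
    print(parse_vol1("VOL1200312"))
```

In practice you would pass it the content of /tmp/block1.out, read in binary mode.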

Source: IBM Technote

Filed under: TSM
27 Oct 2015

NMON Visualizer

For those who use the nmon performance collection tool developed by Nigel Griffiths and available for IBM AIX/VIOS and Linux (Power, x86, x86_64, Mainframe & now ARM (Raspberry Pi)): as you know, you need the nmon analyser Excel file to visualize nmon collection files.

In addition to this tool, I recommend trying NMONVisualizer, an IBM project started by Hunter Presnall. It is an excellent tool for comparing and analyzing the nmon files produced by performance collections on several AIX / Linux systems or VMs.

NMONVisualizer http://nmonvisualizer.github.io/nmonvisualizer/index.html

NMON
https://www.ibm.com/developerworks/aix/library/au-analyze_aix/

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power+Systems/page/nmon

http://nmon.sourceforge.net

Nmon analyser
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/nmon_analyser

My thanks to you all for a job very well done. :)

13 Oct 2015

How to add a hardware error in AIX errlog

How to generate hardware errors in the AIX errlog. Useful for testing monitoring software or event handling under PowerHA.
See the file /usr/include/sys/errids.h for the errpt LABELS.

echo "EPOW_SUS\nEMULATE\n1\ntexte1\ntexte2" | /usr/lib/ras/ras_logger
echo "SCSI_ERR1\nEMULATE\n1\ntexte1\ntexte2" | /usr/lib/ras/ras_logger
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
0502F666   1013162315 P H EMULATE        ADAPTER ERROR
74533D1A   1013162115 U H EMULATE        LOSS OF ELECTRICAL POWER
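
When testing a monitoring tool, it helps to read these lines back programmatically. Here is a minimal Python sketch that parses one errpt summary line, relying on the mmddhhmmyy timestamp format used by errpt; the function name and dictionary keys are mine.

```python
from datetime import datetime


def parse_errpt_line(line):
    """Parse one summary line of errpt output into a dict."""
    ident, stamp, err_type, err_class, resource, description = line.split(None, 5)
    return {
        "identifier": ident,
        "time": datetime.strptime(stamp, "%m%d%H%M%y"),  # mmddhhmmyy
        "type": err_type,        # e.g. P = permanent, U = unknown
        "class": err_class,      # e.g. H = hardware
        "resource": resource,
        "description": description.strip(),
    }


if __name__ == "__main__":
    print(parse_errpt_line("0502F666   1013162315 P H EMULATE        ADAPTER ERROR"))
```

The header line (IDENTIFIER TIMESTAMP ...) has to be skipped before calling it.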

Filed under: AIX, HACMP
8 Oct 2015

Virtual HMC for November 2015

The IBM Power Systems Hardware Management Console (HMC) virtual appliance can be used to manage any of the systems that are supported by the version 8 HMC, which includes Power Systems servers with IBM POWER6, POWER7, and POWER8 processors.

The Power Systems HMC virtual appliance offers these benefits:

Provides hardware, service, and basic virtualization management for your Power Systems servers
Offers the same functionality as the traditional HMC
Runs as a virtual machine on an x86 server virtualized either by VMware ESXi or Red Hat KVM

Source: IBM

Filed under: HMC
25 Sep 2015

AIX Crash always on d_map_list_tce()

Problem

AIX 6100-04-05-1015 always crashes in d_map_list_tce() with EMC Symmetrix FCP VRAID storage disks.

EMC.Symmetrix.aix.rte:5.3.0.5
EMC.Symmetrix.fcp.rte:5.3.0.5
EMCpower.base:5.3.1.1
EMCpower.encryption:5.3.1.1
EMCpower.migration_enabler:5.3.1.1
EMCpower.mpx:5.3.1.1

(3)> stack
pvthread+001B00 STACK:
[041C85A0]d_map_list_tce+000A40 (0000000000000000, 0000000000000000,F1000A0380145AF0, 0000000000001020, 0000000000000001 [??])
[04255138]04255138 ()
[0425A668]efc_start+000508 (??)
[04226D58]efc_intr+0001B8 (??)
[0024280C]i_poll_soft+00012C (??)
[00242180]i_softmod+000480 ()
[0013FB44]flih_util+000250 ()
 
(3)> dr iar
iar   : 00000000041C85A0
041C85A0      tweqi    r3,0                r3=0
 
(3)> lke 041C85A0
    ADDRESS          FILE             FILESIZE FLAGS    MODULE NAME
 
  1 F1000000A05F2500 041B0000 00030000 00080252 /usr/lib/drivers/pci/pci_busdd
 
(3)> symptom
Instruction:
PIDS/5765G6200 LVLS/610 PCSS/SPI1 MS/700 FLDS/d_map_lis VALU/e8610110 FLDS/04255138 VALU/342e6b64

Local Fix
Stop the replication between the storage and the DR site.

IBM APAR
Not yet tested with replication

Filed under: AIX