unixadmin.free.fr: just another IBM blog and technotes backup

25 Nov 2016

Shared Storage Pool node not a member of any cluster.

A customer needed to restart an old two-node VIOS Shared Storage Pool cluster, but the second node never joined the cluster, and the storage pool was only active on the first node.

The vty console displayed the error: 0967-043 This node is currently not a member of any cluster.

I checked the console log, and one line pointed my diagnosis in the right direction.

$ r oem
# alog -ot console
0 Fri Nov 25 15:59:08 CET 2016 0967-112 Subsystem not configured.
0 Fri Nov 25 16:01:07 CET 2016 0967-043 This node is currently not a member of any cluster.
0 Fri Nov 25 16:03:10 CET 2016 0967-043 This node is currently not a member of any cluster.
0 Fri Nov 25 16:03:17 CET 2016 JVMJ9VM090I Slow response to network query (150 secs), check your IP DNS configuration

I checked /etc/resolv.conf and found that the DNS server entry was not reachable. I replaced it with the first node's DNS configuration, restarted the second node, and it joined the cluster normally.
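A minimal sketch of this check, assuming a POSIX shell on the VIOS (after `oem_setup_env`) and that ICMP is allowed to the DNS servers. It lists the `nameserver` lines of a resolv.conf and pings each one; the function names are hypothetical, and a ping reply is only a rough proxy for a working DNS server.

```shell
#!/bin/sh
# Hypothetical helpers (not part of VIOS): list and probe DNS servers.

# Print the address of every "nameserver" line in a resolv.conf file
# (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Send one ping to each nameserver and report the result.
# ICMP reachability is only a rough proxy for a working DNS server.
check_nameservers() {
    for ns in $(list_nameservers "$1"); do
        if ping -c 1 "$ns" >/dev/null 2>&1; then
            echo "$ns reachable"
        else
            echo "$ns NOT reachable"
        fi
    done
}
```

Running `check_nameservers` on both nodes and comparing with `list_nameservers` on the working node shows which entries differ.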

16 Nov 2016

emgr failure with noclobber ksh option

A customer could never apply an emergency fix: emgr failed with a lot of "file already exists" errors.
The cause was that the ksh noclobber option was set in the root user's ~/.kshrc.

Checking space requirements ...
/usr/sbin/emgr[11177]: /tmp/emgrwork/10617068/.epkg.msg.buf.10617068: file already exists
/usr/ccs/lib/libc.a:
emgr: 0645-007 ATTENTION: /usr/sbin/fuser -xf returned an unexpected result.
/usr/sbin/emgr[11177]: /tmp/emgrwork/10617068/.global.warning.10617068: file already exists
emgr: 0645-007 ATTENTION: inc_global_warning() returned an unexpected result.
emgr: 0645-007 ATTENTION: isfopen() returned an unexpected result.
/usr/sbin/emgr[11177]: /tmp/emgrwork/10617068/.global.warning.10617068: file already exists
emgr: 0645-007 ATTENTION: inc_global_warning() returned an unexpected result.
/usr/sbin/emgr[11177]: /tmp/emgrwork/10617068/.epkg.msg.buf.10617068: file already exists
...
1 IV81303s1a INSTALL FAILURE

Workaround
Check whether the noclobber ksh option is active:

set -o | grep noclobber

Remove the noclobber option from the root user's ~/.kshrc or ~/.profile and open a new terminal.

To disable noclobber in the current shell:
set +o noclobber
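The sketch below reproduces the effect in plain POSIX shell (the function name is hypothetical). Once noclobber is set, a `>` redirection onto an existing file fails with a "file already exists" style error, which is presumably what happens to emgr's work files under /tmp/emgrwork, while `>|` still overwrites the file.

```shell
#!/bin/sh
# Reproduce the noclobber effect on a script that rewrites a work file with ">".
demo_noclobber() {
    f="${TMPDIR:-/tmp}/noclobber_demo.$$"
    rm -f "$f"
    : > "$f"                                  # create the file (it did not exist)
    (
        set -o noclobber                      # what the root ~/.kshrc was doing
        if { echo data > "$f"; } 2>/dev/null; then
            echo "plain > succeeded"
        else
            echo "plain > refused: file already exists"
        fi
        if echo data >| "$f"; then            # >| explicitly overrides noclobber
            echo ">| succeeded"
        fi
    )
    rm -f "$f"
}
```

Calling `demo_noclobber` should print the "refused" line followed by the `>|` line; the subshell keeps noclobber from leaking into the calling shell.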
Filed under: AIX
16 Nov 2016

HMC V7R7.9.0 SP3 MH01659 "ssl_error_no_cypher_overlap"

On HMC V7R7.9.0 SP3, don't apply e-fix MH01659; it contains a lot of bugs.

If you really need MH01659, apply e-fix MH01635 first (otherwise ASM on IBM POWER5 servers shows a blank page and then a connection timeout).
=> MH01659.readme.html

Note: This package includes fixes for HMC Version 7 Release 7.9.0 Service Pack 3. You can reference this package by APAR MB04044 and PTF MH01659. This image must be installed on top of HMC Version 7 Release 7.9.0 Service Pack 3 (MH01546) with MH01635 installed.

MH01659 Impact - Known Issues :

After installing PTF MH01659 and the Welcome page loads on the local console, clicking "Log on and Launch" results in the following error:

Problem loading page
An error occurred during a connection to 127.0.0.1.

Cannot communicate securely with peer: no common encryption algorithm(s).
(Error code: ssl_error_no_cypher_overlap)

This defect also impacts the Power Enterprise Pool GUI when launched remotely. Ensure remote access is enabled and the HMC is accessible remotely for management prior to installing this PTF. A fix is planned for a future PTF.

Circumvention: From the HMC home page, Log on by clicking on the "Serviceable Events" link rather than the "Log on and launch the Hardware Management Console web application" link. The "System Status" and "Attention LEDs" links can also be used. Note that the Power Enterprise Pools (PEP) task will not be available from the local console. CLI or remote GUI can be used to perform PEP tasks.

A vterm console window cannot be opened by the GUI on the local HMC console. You can use the mkvterm or vtmenu command on the local HMC console or use the GUI remotely to open a vterm. A fix is planned for a future PTF.

ASM for POWER5 servers will launch a blank white screen and eventually a "Connection timed out" error if PTF MH01635 is not installed prior to MH01644 or MH01659. The install order and supersedes lists have been updated to include PTF MH01635 prior to installing either MH01644 or MH01659.

Filed under: HMC
12 Oct 2016

emgr secret flag for EFIXLOCKED

A customer had a problem updating an AIX system because some filesets were flagged EFIXLOCKED. Normally a fileset is locked by an emergency fix, and you must remove the e-fix before applying the update. For this customer there was no e-fix, so nothing to remove, and /usr/emgrdata was empty.

The emgr command is a Korn shell script that contains a secret flag, "-G", which unlocks a fileset through the unlock_installp_pkg() function:

 G) # Secret flag. DO NOT USE THIS UNLESS YOU ARE SURE !
            emt || exit 1
            unlock_installp_pkg "$OPTARG" "0"; exit $?;; # secret flag

########################################################################
## Function: unlock_installp_pkg
## Parameters: <PKG> <EFIX NUM>
########################################################################

If no e-fix was installed, use anything as the second argument ($2), for example "unlock":

#  lslpp -qlc | grep EFIXLOCKED
/etc/objrepos:bos.rte.libc:7.1.4.0::COMMITTED:F:libc Library:EFIXLOCKED
/usr/lib/objrepos:bos.rte.libc:7.1.4.0::COMMITTED:F:libc Library:EFIXLOCKED

# emgr -G bos.rte.libc unlock    
Initializing emgr manual maintenance tools.
Explicit Lock: unlocking installp fileset bos.rte.libc.

The -D flag also enables debug mode (set -x).
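A hedged sketch for this situation: it reads `lslpp -qlc` output (colon-separated, with the fileset name in the second field and EFIXLOCKED in the last one, as in the output above) and only prints the `emgr -G` commands instead of running them, so each unlock can be reviewed first. The function names are hypothetical, and the flag remains a "DO NOT USE THIS UNLESS YOU ARE SURE" option.

```shell
#!/bin/sh
# Hypothetical helpers: find EFIXLOCKED filesets and print (not run) the
# secret emgr -G unlock command for each one.

# Read "lslpp -qlc" output on stdin; print each locked fileset once.
efixlocked_filesets() {
    awk -F: '$NF == "EFIXLOCKED" { print $2 }' | sort -u
}

# Dry run: print the unlock command for every locked fileset.
print_unlock_cmds() {
    efixlocked_filesets | while read -r fs; do
        echo "emgr -G $fs unlock"
    done
}
```

Usage: `lslpp -qlc | print_unlock_cmds`, after confirming with `emgr -l` that no e-fix is actually installed.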

20 Sep 2016

AIX SCSI-2 Reservation on INFINIDAT

A customer encountered a problem with the Disaster Recovery plan for an AIX rootvg booting from SAN (reserve_policy=single_path) on an Infinidat F6130.

Problem: no boot disk in the SMS menu.

Workaround: from the Infinidat storage, unmap and remap the rootvg LUN to the host.

Fix: Infinidat corrected the bug in firmware 2.2.10.12 and also added an internal script for SCSI-2 reservation.

Before the fix, Infinidat storage handled only SCSI-3 reservations, while AIX (with reserve_policy=single_path) uses a SCSI-2 reservation.

Filed under: AIX
12 Sep 2016

Oracle RAC 11gR2 need multicast

After an AIX / Oracle RAC migration to a new datacenter, the DBA could not start Oracle RAC because of network heartbeat errors.
Root cause: the network team had dropped multicast traffic between the datacenters.

cat $GRID_HOME/log/grac2/cssd/ocssd.log
2016-09-11 21:15:55.296: [    CSSD][382113536]clssnmPollingThread: node 2, orac002 (1) at 90% heartbeat fatal, removal in 2.950 seconds, seedhbimpd 1
2016-09-11 21:15:56.126: [    CSSD][388421376]clssnmvDHBValidateNCopy: node 2, orac002, has a disk HB, but no network HB, DHB has rcfg 269544449, wrtcnt, 547793, LATS 2298

Workaround: run tcpdump on the Oracle interconnect interface, grep for multicast MAC addresses, and send the multicast addresses to the network team. Things worked much better afterwards.

root@orac002:/root #  tcpdump -en -i en0 |  grep "01:00:5e"

21:43:24.678611 76:82:b2:99:ac:0b > 01:00:5e:00:00:fb, ethertype IPv4 (0x0800), length 1202: 192.168.10.217.42424 > 224.0.0.251.42424: UDP, length 1160
21:43:24.678798 76:82:b2:99:ac:0b > 01:00:5e:00:01:00, ethertype IPv4 (0x0800), length 1202: 192.168.10.217.42424 > 230.0.1.0.42424: UDP, length 1160
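The 01:00:5e prefix comes from the standard mapping of IPv4 multicast groups to Ethernet addresses (RFC 1112): the MAC is 01:00:5e followed by the low-order 23 bits of the group address. A small sketch (function name hypothetical) that computes the MAC for a given group, which is the pattern to look for in the tcpdump output:

```shell
#!/bin/sh
# Map an IPv4 multicast group (224.0.0.0/4) to its Ethernet MAC address:
# 01:00:5e + low-order 23 bits of the address (second octet masked with 0x7f).
mcast_mac() {
    echo "$1" | awk -F. '{ printf "01:00:5e:%02x:%02x:%02x\n", $2 % 128, $3, $4 }'
}
```

For example, `mcast_mac 230.0.1.0` gives 01:00:5e:00:01:00 and `mcast_mac 224.0.0.251` gives 01:00:5e:00:00:fb, the two destinations captured above.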

Oracle Doc ID 1212703.1
Bug 9974223 : GRID INFRASTRUCTURE NEEDS MULTICAST COMMUNICATION ON 230.0.1.0 ADDRESSES WORKING

Filed under: AIX, ORACLE
10 Feb 2016

How to check memory and core activated via CUoD Activation Code

Go to the IBM Capacity on Demand site, enter the machine type and serial number, and check the POD and MOD lines.

Ex 1: type 9117 model MMD
POD 53C1340827291F44AAF4000000040041E4 09/27/2015
AAF4 = CCIN = 4.228 GHz core
04 = 4 cores activated

MOD 2A2A7F64BEEEC606821200000032004187
8212 = Feature code = Activation of 1 GB
32 = 32 GB activated

Ex 2: type 8233
POD 80FF07034C0917FA771400000016004166 09/17/2010
7714 = Feature code = 3.0 GHz core
16 = 16 cores activated
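A hedged decoder for the examples above. The field positions are inferred only from the three codes in this post (characters 17 to 20 hold the CCIN or feature code, characters 25 to 28 the quantity written in decimal); the layout is not documented here, so treat the function (hypothetical name) as a reading aid, not a specification.

```shell
#!/bin/sh
# Hypothetical decoder for POD/MOD activation codes.
# ASSUMPTION: positions inferred from this post's three examples only:
#   chars 17-20 = CCIN / feature code, chars 25-28 = quantity (decimal).
decode_cod() {
    echo "$1" | awk '{ printf "code=%s qty=%d\n", substr($1, 17, 4), substr($1, 25, 4) + 0 }'
}
```

For instance, `decode_cod 2A2A7F64BEEEC606821200000032004187` prints code=8212 qty=32, matching the MOD line of Ex 1.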

Sources:
Thanks to Mr Delmas.
For CCIN references, check IBM Knowledge Center.
For feature code references, check the IBM sales manual.

14 Dec 2015

HMC Save Upgrade Data failed

If you want to upgrade the HMC and "Save Hardware Management Console (HMC) upgrade data" fails with HSCSAVEUPGRDATA_ERROR, check whether the home directories of hscroot or other hmcsuperadmin users are filled with Virtual I/O Server ISO images. The filesystem used to store the save upgrade data backup (/mnt/upgrade) is too small to contain ISO images.

Fix: remove the VIOS ISO images from the HMC and relaunch the saveupgdata command.

Filed under: HMC
28 Nov 2015

vio_daemon consuming high memory on VIOS

This Saturday I updated two dual VIOS from 2.2.2.1 to the combined 2.2.3.1 + 2.2.3.50 + 2.2.3.52 fix packs.
One VIOS was using a lot of memory (8 GB of computational memory), and svmon showed that vio_daemon used 12 segments of application stack (seriously). In fact, the customer had modified /etc/security/limits, setting stack, data and rss to unlimited for root. It was solved by restoring the default values and rebooting the VIOS. See the IBM technote.

Question
Why is vio_daemon consuming high memory on PowerVM Virtual I/O Server (VIOS)?

Cause
There is a known issue in VIOS 2.2.3.0 thru 2.2.3.3 with vio_daemon memory leak that was fixed at 2.2.3.4 with IV64508.

Answer
To check your VIOS level, as padmin, run:
$ ioslevel

If your VIOS level is 2.2.3.4 or higher, the problem may be due to having values in /etc/security/limits set to "unlimited" (-1). Particularly, the "stack" size setting, which exposes a condition where the system can be allowed to pin as much stack as desired causing vio_daemon to consume a lot of memory.

$ oem_setup_env

# vi /etc/security/limits ->check the default stanza

default:
        fsize = -1
        core = -1
        cpu = -1
        data = -1
        rss = -1
        stack = -1
        nofiles = -1

In some cases, the issue with vio_daemon consuming high memory is noticed after a VIOS update to 2.2.3.X. However, a VIOS update will NOT change these settings. It is strongly recommended not to modify these default values as doing so is known to cause unpredictable results. Below is an example of the default values:

default:
        fsize = 2097151
        core = 2097151
        cpu = -1
        data = 262144
        rss = 65536
        stack = 65536
        nofiles = 2000

To correct the problem change the settings back to "default" values. Then reboot the VIOS at your earliest convenience.

Note 1
If the stack size was added to the root and/or padmin stanzas with unlimited setting, it should be removed prior to rebooting the VIOS.
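Before rebooting, a quick way to spot the offending values is to list every attribute set to -1 in a given stanza. A minimal sketch with a hypothetical function name, run as root after `oem_setup_env`; it expects the "attr = value" layout shown above and skips cpu, since -1 is already its default value.

```shell
#!/bin/sh
# Hypothetical helper: print the attributes set to -1 (unlimited) in one
# stanza of an AIX limits file. Defaults: /etc/security/limits, stanza "default".
# Expects "attr = value" lines; cpu is skipped because -1 is its default.
unlimited_attrs() {
    awk -v want="${2:-default}:" '
        /^[A-Za-z0-9_.-]+:/ { in_stanza = ($1 == want); next }   # stanza header
        in_stanza && $2 == "=" && $3 == "-1" && $1 != "cpu" { print $1 }
    ' "${1:-/etc/security/limits}"
}
```

For example, `unlimited_attrs /etc/security/limits root` and `unlimited_attrs /etc/security/limits padmin` cover Note 1, and a default value can be restored with `chsec -f /etc/security/limits -s default -a stack=65536`.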

Note 2
If the clients are not redundant via a second VIOS, a maintenance window should be scheduled to bring the clients down before rebooting the VIOS.

SOURCE: IBM technote

23 Nov 2015

Drive paths to library client taken offline when server option SANDISCOVERY set to ‘YES’

Technote (troubleshooting)

Problem(Abstract)

This message in the activity log of the library manager appears: ANR1772E The path from source to destination is taken offline.

Symptom

On the library client, these messages are observed in the activity log when a library sharing session is opened to the library manager:

ANR1926W A SCSI inquiry has timed out after 15 seconds.
ANR3626W A check condition occurred during a small computer system interface (SCSI) inquiry at Fibre Channel port WWN=<wwn_number> , KEY=00, ASC=00, ASCQ=00.
ANR1786W HBAAPI not able to get adapter name.
ANR8963E Unable to find path to match the serial number defined for drive <DRIVE_NAME> in library <LIBRARY_NAME>.
ANR8873E The path from source <library_client> to destination <drive> (/dev/rmtXYZ) is taken offline.

On the library manager, you can see this corresponding message showing the path to the drive being taken offline:

ANR1772E The path from source to destination <drive> is taken offline.

Cause

The SAN discovery's query of the HBA has timed out, and the path is taken offline. This can occur in SAN environments with a large number of devices.
Diagnosing the problem

Verify that there is not an underlying hardware problem causing the drive paths to go offline.

Check the value of the SANDISCOVERYTIMEOUT option on the library clients and the library manager. The default value is 15 seconds:

QUERY OPTION SANDISCOVERYTIMEOUT

Resolving the problem

If the value of the option is at or near the default value of 15 seconds, increase to a greater number. For example:

SETOPT SANDISCOVERYTIMEOUT 300
Filed under: TSM