Tech Blog

These are blog entries written by the UNIX Health Check development team. Our team has extensive technical experience on both AIX and Red Hat systems, and we like to share our knowledge with our visitors.

Topics: Hardware, SAN, SDD, Storage

How-to replace a failing HBA using SDD storage

This is a procedure for replacing a failing HBA (Fibre Channel adapter) when it is used in combination with SDD storage:

  • Determine which adapter is failing (0, 1, 2, etcetera):
    # datapath query adapter
  • Check if there are dead paths for any vpaths:
    # datapath query device
  • Try to set a "degraded" adapter back to online using:
    # datapath set adapter 1 offline
    # datapath set adapter 1 online
    (that is, if adapter "1" is failing; replace "1" with the number of the failing adapter).
  • If the adapter is still in a "degraded" status, open a call with IBM. They will most likely ask you to take a snap of the system and send the snap file to IBM for analysis; based on that, they will conclude whether the adapter needs to be replaced.
  • Involve the SAN storage team if the adapter needs to be replaced. They will have to update the WWN registration when the failing adapter is replaced by a new one with a new WWN.
  • If the adapter needs to be replaced, wait for the IBM CE to be onsite with the new HBA. Note the new WWN and supply it to the SAN storage team.
  • Remove the adapter:
    # datapath remove adapter 1
    (replace the "1" with the correct adapter that is failing).
  • Check if the vpaths now all have one less path:
    # datapath query device | more
  • De-configure the adapter (this also de-configures all child devices, so you won't have to do that manually): run diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, Unconfigure a Device. Select the correct adapter, e.g. fcs1, set "Unconfigure any Child Devices" to "yes" and "KEEP definition in database" to "no", and hit ENTER.
  • Replace the adapter: Run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Replace/Remove a PCI Hot Plug Adapter. Choose the correct device (be careful, you won't see the adapter name here, but only "Unknown", because the device was unconfigured).
  • Have the IBM CE replace the adapter.
  • Close any events on the failing adapter on the HMC.
  • Validate that the notification LED is now off on the system, if not, go back into diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, and Disable the attention LED.
  • Check the adapter firmware level using:
    # lscfg -vl fcs1
    (replace this with the actual adapter name).

    And if required, update the adapter firmware microcode. Validate that the adapter still functions correctly by running:
    # errpt
    # lsdev -Cc adapter
  • Have the SAN admin update the WWN.
  • Configure the new adapter and its child devices:
    # cfgmgr -S
  • Check the adapter and the child devices:
    # lsdev -Cc adapter
    # lsdev -p fcs1
    # lsdev -p fscsi1
    (replace this with the correct adapter name).
  • Add the paths to the device:
    # addpaths
  • Check if the vpaths have all paths again:
    # datapath query device | more
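The final check above can also be done programmatically, by counting DEAD paths in the datapath output. A minimal sketch; the sample lines are illustrative stand-ins, since the real column layout depends on the SDD version (in production, feed it the output of datapath query device instead):

```shell
#!/usr/bin/ksh
# Count DEAD paths in "datapath query device" output.
# The sample lines below are illustrative stand-ins; in production,
# replace the printf with:  datapath query device
sample="   0  fscsi0/hdisk4  OPEN  NORMAL  1234  0
   1  fscsi1/hdisk4  OPEN  DEAD      0   3"

total=$(printf '%s\n' "$sample" | grep -c 'fscsi')
dead=$(printf '%s\n' "$sample" | grep -c 'DEAD')
echo "paths: $total, dead: $dead"
```

If dead is not zero after running addpaths, one or more paths did not come back and the adapter or SAN zoning should be re-checked.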

Topics: EMC, SAN, Storage, System Admin

Recovering from dead EMC paths

If you run:

# powermt display dev=all
and you notice that there are "dead" paths, then these are the commands to run to set those paths back to "alive" again, of course AFTER first ensuring that any SAN-related issues have been resolved.

To have PowerPath scan all devices and mark any dead devices as alive, if it finds that a device is in fact capable of doing I/O commands, run:
# powermt restore
To delete any dead paths, and to reconfigure them again:
# powermt reset
# powermt config
Or you could run the following command, which checks each path and asks whether any paths marked dead should be removed from the configuration:
# powermt check
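To confirm that powermt restore brought everything back, you can count any remaining dead paths from the powermt output. A hedged sketch; the sample lines below are illustrative only, as the real layout varies with the PowerPath version (in production, replace them with the output of powermt display dev=all):

```shell
#!/usr/bin/ksh
# Warn if any paths are still marked "dead" after powermt restore.
# Replace the sample text with real output:  powermt display dev=all
sample="   0 fscsi0  hdisk2  SP A1  active  alive  0  0
   1 fscsi1  hdisk2  SP B1  active  dead   0  1"

dead_count=$(printf '%s\n' "$sample" | grep -c ' dead ')
if [ "$dead_count" -gt 0 ]; then
    echo "WARNING: $dead_count dead path(s) remain"
else
    echo "All paths alive"
fi
```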

Topics: EMC, Installation, ODM, SAN, Storage

How to cleanup AIX EMC ODM definitions

From powerlink.emc.com:

  1. Before making any changes, collect host logs to document the current configuration. At a minimum, save the output of the following commands: inq, lsdev -Cc disk, lsdev -Cc adapter, lspv, and lsvg.
  2. Shutdown the application(s), unmount the file system(s), and varyoff all volume groups except for rootvg. Do not export the volume groups.
    # varyoffvg <vg_name>
    Check with lsvg -o (confirm that only rootvg is varied on)
    If PowerPath is not installed, skip all steps involving "power" device names.
  3. For CLARiiON configuration, if Navisphere Agent is running, stop it:
    # /etc/rc.agent stop
  4. Remove all paths from the PowerPath configuration:
    # powermt remove hba=all
  5. Delete all hdiskpower devices:
    # lsdev -Cc disk -Fname | grep power | xargs -n1 rmdev -dl
  6. Remove the PowerPath driver instance:
    # rmdev -dl powerpath0
  7. Delete all hdisk devices:
    For Symmetrix devices, use this command:
    # lsdev -CtSYMM* -Fname | xargs -n1 rmdev -dl
    For CLARiiON devices, use this command:
    # lsdev -CtCLAR* -Fname | xargs -n1 rmdev -dl
  8. Confirm with lsdev -Cc disk that there are no EMC hdisks or hdiskpowers.
  9. Remove all Fiber driver instances:
    # rmdev -Rdl fscsiX
    (X being driver instance number, i.e. 0,1,2, etc.)
  10. Verify through lsdev -Cc driver that there are no more fiber driver instances (fscsi).
  11. Put the adapter instances into the Defined state:
    # rmdev -l fcsX
    (X being adapter instance number, i.e. 0,1,2, etc.)
  12. Create the hdisk entries for all EMC devices:
    # emc_cfgmgr
    or
    # cfgmgr -vl fcsx
    (x being each adapter instance that was rebuilt). Skip this step if PowerPath is not installed.
  13. Configure all EMC devices into PowerPath:
    # powermt config
  14. Check the system to see if it now displays correctly:
    # powermt display
    # powermt display dev=all
    # lsdev -Cc disk
    # /etc/rc.agent start
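The repetitive rmdev steps in this procedure lend themselves to a loop. A hedged sketch that only collects and prints the commands it would run, so they can be reviewed before execution; the device names are illustrative stand-ins for the real lsdev output on your system:

```shell
#!/usr/bin/ksh
# Dry-run sketch of the bulk-removal steps: collect the rmdev
# commands instead of executing them, so they can be reviewed first.
# Device names below are illustrative stand-ins for lsdev output.
cmds=""
for disk in hdiskpower0 hdiskpower1; do    # hdiskpower devices (step 5)
    cmds="$cmds
rmdev -dl $disk"
done
for drv in fscsi0 fscsi1; do               # fiber driver instances (step 9)
    cmds="$cmds
rmdev -Rdl $drv"
done
echo "$cmds"
```

Once reviewed, the printed commands can be executed one by one, or the collection can be dropped in favor of calling rmdev directly inside the loops.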

Topics: EMC, SAN, Storage

Display the status of EMC SAN devices

An easy way to see the status of your SAN devices is by using the following command:

# powermt display
Symmetrix logical device count=6
CLARiiON logical device count=0
Hitachi logical device count=0
Invista logical device count=0
HP xp logical device count=0
Ess logical device count=0
HP HSx logical device count=0
==============================================================
- Host Bus Adapters -  --- I/O Paths ----  ------ Stats ------
### HW Path            Summary Total Dead  IO/Sec Q-IOs Errors
==============================================================
  0 fscsi0             optimal     6    0       -     0      0
  1 fscsi1             optimal     6    0       -     0      0
To get more information on the disks, use:
# powermt display dev=all

Topics: Storage, System Admin

Inodes without filenames

It will sometimes occur that a file system reports storage to be in use, while you are unable to find which file is actually using that storage. This may happen when a process has allocated disk storage and is still holding on to it, while the file itself is no longer there, for whatever reason.

A sure way to resolve such an issue is to reboot the server. That way, you can be certain the process is killed and the disk storage space is released. However, if you don't want to resort to such drastic measures, here's a little script that may help you find the process that may be responsible for an inode without a filename. Make sure you have lsof installed on your server.

#!/usr/bin/ksh

# Scan the file system given as the first argument
# for open inodes that no longer have a filename.
FILESYSTEM=$1
LSOF=/usr/sbin/lsof

# Use lsof to list all open inodes in the file system
# (lines starting with "i", e.g. "i12345").
for i in $($LSOF -Fi "$FILESYSTEM" | sed -n 's/^i//p') ; do
    # If find returns no filename for this inode,
    # it is a suspect: show the lsof output for it
    # (plus the COMMAND header line).
    if [ -z "$(find "$FILESYSTEM" -inum "$i")" ] ; then
        echo "Inode $i does not have an associated filename:"
        $LSOF "$FILESYSTEM" | grep -e "$i" -e COMMAND
    fi
done

Topics: SAN, SDD, Storage

Vpath commands

Check the relation between vpaths and hdisks:

# lsvpcfg
Check the status of the adapters according to SDD:
# datapath query adapter
Check for stale partitions:
# lsvg -o | lsvg -i | grep -i stale

Topics: PowerHA / HACMP, SAN, SDD, Storage

Reservation bit

If you wish to get rid of the SCSI disk reservation bit on SCSI, SSA and VPATH devices, there are two ways of achieving this:

Firstly, HACMP comes with binaries that do this job:

# /usr/es/sbin/cluster/utilities/cl_SCSIdiskreset /dev/vpathx
Secondly, there is a small (unofficial) IBM binary tool called "lquerypr". This command is part of the SDD driver fileset. It can also release the persistent reservation bit and clear all reservations:

First check if you have any reservations on the vpath:
# lquerypr -vh /dev/vpathx
Clear it as follows:
# lquerypr -ch /dev/vpathx
In case this doesn't work, try the following sequence of commands:
# lquerypr -ch /dev/vpathx
# lquerypr -rh /dev/vpathx
# lquerypr -ph /dev/vpathx
If you'd like to see more information about lquerypr, simply run lquerypr without any options, and it will display extensive usage information.

For SDD, you should be able to use the following command to clear the persistent reservation:
# lquerypr -V -v -c /dev/vpathXX
For SDDPCM, use:
# pcmquerypr -V -v -c /dev/hdiskXX
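Since SDD works on vpath devices and SDDPCM on plain hdisks, a tiny wrapper can select the right query tool from the device name. A sketch under the assumption that lquerypr and pcmquerypr are in the PATH; it only builds the command string, it does not execute it:

```shell
#!/usr/bin/ksh
# Pick the reservation-clearing command based on the device name:
# vpath devices -> lquerypr (SDD), hdisk devices -> pcmquerypr (SDDPCM).
# This function only builds the command string; it does not run it.
pick_query_cmd() {
    case "$1" in
        /dev/vpath*) echo "lquerypr -V -v -c $1" ;;
        /dev/hdisk*) echo "pcmquerypr -V -v -c $1" ;;
        *)           echo "unknown device type: $1" >&2; return 1 ;;
    esac
}
```

For example, pick_query_cmd /dev/vpath12 prints "lquerypr -V -v -c /dev/vpath12", which can then be reviewed and run by hand.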

Topics: Red Hat / Linux, SAN, Storage

Emulex hbanyware

If you have Emulex HBAs and the hbanyware software installed, for example on Linux, then you can use the following commands to retrieve information about the HBAs:

To run a GUI version:

# /usr/sbin/hbanyware/hbanyware
To run the command-line version:
# /usr/sbin/hbanyware/hbacmd listhbas
To get the attributes of a specific HBA:
# /usr/sbin/hbanyware/hbacmd listhbas 10:00:00:00:c9:6c:9f:d0

Topics: SAN, Storage

SAN introduction

SAN storage places the physical disks outside a computer system, connected to a Storage Area Network (SAN). In a Storage Area Network, storage is offered to many systems, including AIX systems. This is done via logical blocks of disk space (LUNs). On an AIX system, every SAN disk is seen as a separate hdisk, with the advantage that the system can easily be expanded with new SAN disks, avoiding the need to buy and install new physical hard disks.

SAN concept using IBM pSeries


Other advantages of SAN:
  • Disk storage is no longer limited to the space in the computer system itself or the amount of available disk slots.
  • After the initial investment in the SAN network and storage, the costs of storage per gigabyte are less than disk space within the computer systems.
  • Using two different SAN networks (fabrics), you can avoid disruptions in your storage, much like mirroring your data on separate disks. The two SAN fabrics should not be connected to each other.
  • Using two separate, geographically dispersed storage systems (e.g. ESS), a disruption in one computer center will not cause your computer systems to go down.
  • When you place two SAN network adapters (called Host Bus Adapters or HBAs) in every computer system, you can connect your AIX system to two different fabrics, thus increasing the availability of the storage. You'll also be able to load balance the disk storage over these two adapters. You'll need multipath I/O software (e.g. SDD or PowerPath) for this to work.
  • By using 2 HBAs, a defect in a single HBA will not cause downtime.
  • AIX systems are able to boot from SAN disks.

Topics: Red Hat / Linux, Storage, VMWare

Increasing the VMWare disk drive

If you have any VMWare images where you made the disk size a little too small, then fortunately VMWare Workstation lets you change the size of a disk with a simple command-line program. Sadly, the command only makes your drive bigger, not the actual partition. And Windows especially won't allow you to resize the partition where the Windows binaries are installed. So how can you get around that?

First, create a copy of your vmdk file to somewhere else, should the next action fail for some reason.

Then resize the disk to the required size:

# vmware-vdiskmanager -x 8GB myDisk.vmdk
You need to have plenty of disk space free to do this operation, as your vmdk file will be copied by vmware-vdiskmanager. BTW, this command may take a while, depending on the size of your vmdk file.
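The backup-and-resize sequence above can be scripted defensively. A small sketch that only prints the commands it would run; the file name and target size are placeholders, and the echo keeps it a dry run:

```shell
#!/bin/sh
# Dry-run sketch: back up the vmdk before growing it with
# vmware-vdiskmanager. DISK and NEWSIZE are placeholder values.
DISK="myDisk.vmdk"
NEWSIZE="8GB"
plan="cp $DISK $DISK.bak && vmware-vdiskmanager -x $NEWSIZE $DISK"
echo "$plan"
```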

Now get the ISO image of the System Rescue CD-ROM and set the VMWare session to boot off the ISO image. Then run QTParted: start the CD-ROM with a framebuffer (press F2 at start) and run run_qtparted as soon as Linux has started. Select the Windows drive partition with the right mouse button and choose resize. Set the new size and commit the change. Then exit from QTParted and from Linux (init 0). Remove the ISO image from the VMWare session and restart VMWare to start Windows normally. Windows will detect the disk change and force a chkdsk to run. Once Windows has started, the new disk size is present.
