This is a procedure for replacing a failing HBA (Fibre Channel adapter) when used in combination with SDD storage:
- Determine which adapter is failing (0, 1, 2, etcetera):
# datapath query adapter
- Check if there are dead paths for any vpaths:
# datapath query device
- Try to reset a "degraded" adapter by taking it offline and back online:
# datapath set adapter 1 offline
# datapath set adapter 1 online
(that is, if adapter "1" is failing; replace it with the correct adapter number).
- If the adapter is still in a "degraded" status, open a call with IBM. They will most likely require you to take a snap of the system and send the snap file to IBM for analysis; they will then conclude whether or not the adapter needs to be replaced.
- Involve the SAN storage team if the adapter needs to be replaced. They will have to update the WWN registration of the failing adapter when it is replaced with a new one that has a new WWN.
- If the adapter needs to be replaced, wait for the IBM CE to be onsite with the new HBA adapter. Note the new WWN and supply that to the SAN storage team.
- Remove the adapter:
# datapath remove adapter 1
(replace the "1" with the number of the failing adapter).
- Check if the vpaths now all have one path less:
# datapath query device | more
- De-configure the adapter (this will also de-configure all the child devices, so you won't have to do this manually), by running: diag, choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Unconfigure a Device. Select the correct adapter, e.g. fcs1, set "Unconfigure any Child Devices" to "yes", and "KEEP definition in database" to "no". Hit ENTER.
- Replace the adapter: Run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Replace/Remove a PCI Hot Plug Adapter. Choose the correct device (be careful, you won't see the adapter name here, but only "Unknown", because the device was unconfigured).
- Have the IBM CE replace the adapter.
- Close any events on the failing adapter on the HMC.
- Validate that the notification LED is now off on the system, if not, go back into diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, and Disable the attention LED.
- Check the adapter firmware level using:
# lscfg -vl fcs1
(replace this with the actual adapter name).
And if required, update the adapter firmware microcode. Validate that the adapter is still functioning correctly by running:
# errpt
# lsdev -Cc adapter
- Have the SAN admin update the WWN.
- Run:
# cfgmgr -S
- Check the adapter and the child devices:
# lsdev -Cc adapter
# lsdev -p fcs1
(replace this with the correct adapter name).
# lsdev -p fscsi1
- Add the paths to the device:
# addpaths
- Check if the vpaths have all paths again:
# datapath query device | more
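The final path check above can be scripted. The sketch below parses `datapath query device` output and reports every vpath that still has a DEAD path; the sample output fed to it is illustrative only, as the exact column layout can differ between SDD versions.

```shell
# Sketch: list vpaths that still have one or more DEAD paths, by
# parsing "datapath query device" output on stdin. The column layout
# is an assumption based on typical SDD output.
dead_vpaths() {
    awk '
        /^DEV#:/     { dev = $5 }                    # remember current vpath name
        $3 == "DEAD" { if (!seen[dev]++) print dev } # report each vpath once
    '
}

# On a real SDD host you would run:  datapath query device | dead_vpaths
# Here we feed a small assumed sample instead:
dead_vpaths <<'EOF'
DEV#:   0  DEVICE NAME: vpath0  TYPE: 2105800  POLICY: Optimized
Path#      Adapter/Hard Disk    State   Mode     Select  Errors
    0      fscsi0/hdisk2        OPEN    NORMAL   1234    0
    1      fscsi1/hdisk6        DEAD    NORMAL   0       12
DEV#:   1  DEVICE NAME: vpath1  TYPE: 2105800  POLICY: Optimized
Path#      Adapter/Hard Disk    State   Mode     Select  Errors
    0      fscsi0/hdisk3        OPEN    NORMAL   987     0
EOF
```

If this prints nothing, all paths are back; any vpath it does print still needs attention.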
Topics: EMC, SAN, Storage, System Admin
Recovering from dead EMC paths
If you run:
# powermt display dev=all
and you notice that there are "dead" paths, then these are the commands to run in order to set these paths back to "alive" again, of course AFTER ensuring that any SAN-related issues are resolved.
To have PowerPath scan all devices and mark any dead devices as alive, if it finds that a device is in fact capable of doing I/O commands, run:
# powermt restore
To delete any dead paths and to reconfigure them again:
# powermt reset
Or you could run:
# powermt config
# powermt check
From powerlink.emc.com:
- Before making any changes, collect host logs to document the current configuration. At a minimum, save the output of the following commands: inq, lsdev -Cc disk, lsdev -Cc adapter, lspv, and lsvg.
- Shut down the application(s), unmount the file system(s), and vary off all volume groups except rootvg. Do not export the volume groups.
# varyoffvg <vg_name>
Check with lsvg -o (confirm that only rootvg is varied on)
If PowerPath is not installed, skip all steps with "power" names.
- For a CLARiiON configuration, if the Navisphere Agent is running, stop it:
# /etc/rc.agent stop
- Remove paths from the PowerPath configuration:
# powermt remove hba=all
- Delete all hdiskpower devices:
# lsdev -Cc disk -Fname | grep power | xargs -n1 rmdev -dl
- Remove the PowerPath driver instance:
# rmdev -dl powerpath0
- Delete all hdisk devices:
For Symmetrix devices, use this command:
# lsdev -CtSYMM* -Fname | xargs -n1 rmdev -dl
For CLARiiON devices, use this command:
# lsdev -CtCLAR* -Fname | xargs -n1 rmdev -dl
- Confirm with lsdev -Cc disk that there are no EMC hdisks or hdiskpowers.
- Remove all fibre channel driver instances:
# rmdev -Rdl fscsiX
(X being the driver instance number, i.e. 0, 1, 2, etc.)
- Verify through lsdev -Cc driver that there are no more fibre channel driver instances (fscsi).
- Put the adapter instances into Defined state:
# rmdev -l fcsX
(X being the adapter instance number, i.e. 0, 1, 2, etc.)
- Create the hdisk entries for all EMC devices:
# emc_cfgmgr
or:
# cfgmgr -vl fcsX
(X being each adapter instance which was rebuilt).
Skip this part if PowerPath is not installed:
- Configure all EMC devices into PowerPath:
# powermt config
- Check the system to see if it now displays correctly:
# powermt display
# powermt display dev=all
# lsdev -Cc disk
# /etc/rc.agent start
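The cleanup/rebuild sequence above can be collected into a small script. The sketch below is a dry run: it only prints the commands in order, so you can review and adapt them (adapter instance numbers are assumptions) before executing anything on a real AIX host with PowerPath.

```shell
# Dry-run sketch of the EMC cleanup/rebuild sequence described above.
# It only echoes each command; change run() to execute "$@" to run
# them for real. The fscsi0/fcs0 instance numbers are examples only.
run() { echo "$@"; }

run /etc/rc.agent stop          # CLARiiON only: stop Navisphere Agent
run powermt remove hba=all      # remove paths from PowerPath
run rmdev -dl powerpath0        # remove the PowerPath driver instance
run rmdev -Rdl fscsi0           # repeat for each fscsi instance
run rmdev -l fcs0               # repeat for each fcs instance
run emc_cfgmgr                  # or: cfgmgr -vl fcsX per adapter
run powermt config              # configure EMC devices into PowerPath
run powermt display dev=all     # verify the result
run /etc/rc.agent start         # CLARiiON only: restart the agent
```

Walking through the printed list first makes it easy to drop the PowerPath-specific steps on hosts without PowerPath.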
An easy way to see the status of your SAN devices is by using the following command:
# powermt display
Symmetrix logical device count=6
CLARiiON logical device count=0
Hitachi logical device count=0
Invista logical device count=0
HP xp logical device count=0
Ess logical device count=0
HP HSx logical device count=0
==============================================================================
----- Host Bus Adapters ---------  ------ I/O Paths -----   ------ Stats ------
###  HW Path            Summary      Total      Dead    IO/Sec  Q-IOs  Errors
==============================================================================
   0 fscsi0             optimal          6         0        -       0       0
   1 fscsi1             optimal          6         0        -       0       0
To get more information on the disks, use:
# powermt display dev=all
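The "Dead" column in the adapter summary above is the quickest health indicator. Below is a minimal sketch that flags any HBA reporting dead paths; the sample lines are modeled on the output shown above, and the column positions are an assumption that may vary between PowerPath versions.

```shell
# Sketch: flag HBAs with a non-zero Dead count in "powermt display"
# adapter summary lines. Assumed field layout (per the example above):
#   ###  HW Path  Summary  Total  Dead  IO/Sec  Q-IOs  Errors
hbas_with_dead() {
    awk '$2 ~ /^fscsi/ && $5 > 0 { print $2 }'
}

# On a real PowerPath host:  powermt display | hbas_with_dead
# Illustrative sample instead:
hbas_with_dead <<'EOF'
   0 fscsi0             optimal          6         0        -       0       0
   1 fscsi1             degraded         6         2        -       0       1
EOF
```

An empty result means every adapter shows zero dead paths.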
It will sometimes occur that a file system reports storage to be in use, while you're unable to find which file exactly is using that storage. This may occur when a process has used disk storage, and is still holding on to it, without the file actually being there anymore for whatever reason.
A good way to resolve such an issue is to reboot the server. This way, you'll be sure the process is killed and the disk storage space is released. However, if you don't want to use such drastic measures, here's a little script that may help you find the process that is responsible for an inode without a filename. Make sure you have lsof installed on your server.
#!/usr/bin/ksh
# Make sure to enter a file system to scan
# as the first argument to this script.
FILESYSTEM=$1
LSOF=/usr/sbin/lsof
# A for loop over all open inodes in the
# file system, as reported by lsof.
for i in `$LSOF -Fi $FILESYSTEM | grep '^i' | sed 's/^i//'` ; do
   # Use find to look up the file name associated with this inode.
   if [ -z "`find $FILESYSTEM -xdev -inum $i`" ] ; then
      # No file name found, so this inode is a suspect:
      # show the lsof output for it.
      echo "Inode $i does not have an associated filename:"
      $LSOF $FILESYSTEM | grep -e "$i" -e COMMAND
   fi
done
Check the relation between vpaths and hdisks:
# lsvpcfg
Check the status of the adapters according to SDD:
# datapath query adapter
Check on stale partitions:
# lsvg -o | lsvg -i | grep -i stale
Topics: PowerHA / HACMP, SAN, SDD, Storage
Reservation bit
If you wish to get rid of the SCSI reservation bit on SCSI, SSA, and VPATH devices, there are two ways of achieving this:
Firstly, HACMP comes with some binaries that do this job:
# /usr/es/sbin/cluster/utilities/cl_SCSIdiskreset /dev/vpathx
Secondly, there is a little (unofficial) IBM binary tool called "lquerypr". This command is part of the SDD driver fileset. It can also release the persistent reservation bit and clear all reservations.
First check if you have any reservations on the vpath:
# lquerypr -vh /dev/vpathx
Clear it as follows:
# lquerypr -ch /dev/vpathx
In case this doesn't work, try the following sequence of commands:
# lquerypr -ch /dev/vpathx
# lquerypr -rh /dev/vpathx
# lquerypr -ph /dev/vpathx
If you'd like to see more information about lquerypr, simply run lquerypr without any options, and it will display extensive usage information.
For SDD, you should be able to use the following command to clear the persistent reservation:
# lquerypr -V -v -c /dev/vpathXX
For SDDPCM, use:
# pcmquerypr -V -v -c /dev/hdiskXX
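If many vpaths hold reservations, the per-device commands above can be looped. The sketch below is a dry run that only prints the lquerypr commands it would issue; remove the echo to execute them for real on an SDD host, and note that the device list used here is a placeholder.

```shell
# Dry-run sketch: print the lquerypr commands that would check and
# then clear the persistent reservation on each given vpath device.
# Remove the "echo" to execute for real on an SDD host.
clear_reservations() {
    for dev in "$@"; do
        echo "lquerypr -vh /dev/$dev"   # show current reservation
        echo "lquerypr -ch /dev/$dev"   # clear it
    done
}

# On AIX you might generate the device list with:
#   lsdev -Cc disk -Fname | grep vpath
clear_reservations vpath0 vpath1
```

Reviewing the printed commands first is safer than clearing reservations blindly, since clearing a reservation held by another (live) host can cause data corruption.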
Topics: Red Hat / Linux, SAN, Storage
Emulex hbanyware
If you have Emulex HBAs and the HBAnyware software installed, for example on Linux, then you can use the following commands to retrieve information about the HBAs:
To run a GUI version:
# /usr/sbin/hbanyware/hbanyware
To run the command-line version:
# /usr/sbin/hbanyware/hbacmd listhbas
To get the attributes of a specific HBA:
# /usr/sbin/hbanyware/hbacmd listhbas 10:00:00:00:c9:6c:9f:d0
SAN storage places the physical disks outside a computer system, connected via a Storage Area Network (SAN). In a Storage Area Network, storage is offered to many systems, including AIX systems, in the form of logical blocks of disk space (LUNs). On an AIX system, every SAN disk is seen as a separate hdisk, with the advantage that the AIX system can easily be expanded with new SAN disks, avoiding buying and installing new physical hard disks.

Other advantages of SAN:
- Disk storage is no longer limited to the space in the computer system itself or the amount of available disk slots.
- After the initial investment in the SAN network and storage, the costs of storage per gigabyte are less than disk space within the computer systems.
- Using two different SAN networks (fabrics), you can avoid having disruptions in your storage, the same as mirroring your data on separate disks. The two SAN fabrics should not be connected to each other.
- Using two separate, geographically dispersed storage systems (e.g. ESS), a disruption in one computer center will not cause your computer systems to go down.
- When you place two SAN adapters (called Host Bus Adapters or HBAs for Fibre Channel) in every computer system, you can connect your AIX system to two different fabrics, thus increasing the availability of the storage. Also, you'll be able to load balance the disk storage over these two host bus adapters. You'll need multipath I/O software (e.g. SDD or PowerPath) for this to work.
- By using 2 HBAs, a defect in a single HBA will not cause downtime.
- AIX systems are able to boot from SAN disks.
If you have any VMware images where you made the disk size a little too small, then fortunately in VMware Workstation you can change the size of a disk with a simple command-line program. Sadly, the command only makes your drive bigger, not the actual partition. And especially Windows won't allow you to resize the partition where the Windows binaries are installed. So how can you get around that?
First, create a copy of your vmdk file somewhere else, in case the next action fails for some reason.
Then resize the disk to the required size:
# vmware-vdiskmanager -x 8GB myDisk.vmdk
You need to have plenty of free disk space to do this operation, as your vmdk file will be copied by vmware-vdiskmanager. By the way, this command may take a while, depending on the size of your vmdk file.
Now get the ISO image of the System Rescue CD-ROM and set the VMware session to boot off the ISO image. Then, run QTParted. You can do this by starting this CD-ROM with a framebuffer (press F2 at start) and then running run_qtparted as soon as Linux has started. Select the Windows drive partition with the right mouse button and choose resize. Set the new size and commit the change. Then exit from QTParted and from Linux (init 0). Remove the ISO image from the VMware session and restart the session to boot Windows normally. Windows will detect the disk change and force a chkdsk to run. Once Windows has started, the new disk size is present.


