Tech Blog

These are blog entries written by the UNIX Health Check development team. Our team has extensive technical experience on both AIX and Red Hat systems, and we like to share our knowledge with our visitors.

Topics: Hardware, SAN, SDD, Storage

How-to replace a failing HBA using SDD storage

This is a procedure for replacing a failing HBA (Fibre Channel adapter) when it is used in combination with SDD storage:

  • Determine which adapter is failing (0, 1, 2, etcetera):
    # datapath query adapter
  • Check if there are dead paths for any vpaths:
    # datapath query device
  • Try to set a "degraded" adapter back to online using:
    # datapath set adapter 1 offline
    # datapath set adapter 1 online
    (that is, if adapter "1" is failing; replace "1" with the number of the failing adapter).
  • If the adapter is still in a "degraded" status, open a call with IBM. They will most likely ask you to take a snap of the system and send the snap file to IBM for analysis; based on that, they will conclude whether the adapter needs to be replaced.
  • Involve the SAN storage team if the adapter needs to be replaced. They will have to update the WWN registration when the failing adapter is replaced by a new one with a new WWN.
  • If the adapter needs to be replaced, wait for the IBM CE to be onsite with the new HBA. Note the new WWN and supply it to the SAN storage team.
  • Remove the adapter:
    # datapath remove adapter 1
    (replace the "1" with the correct adapter that is failing).
  • Check if the vpaths now all have one less path:
    # datapath query device | more
  • De-configure the adapter (this also de-configures all child devices, so you won't have to do that manually): run diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, Unconfigure a Device. Select the correct adapter, e.g. fcs1, set "Unconfigure any Child Devices" to "yes" and "KEEP definition in database" to "no", and hit ENTER.
  • Replace the adapter: Run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Replace/Remove a PCI Hot Plug Adapter. Choose the correct device (be careful, you won't see the adapter name here, but only "Unknown", because the device was unconfigured).
  • Have the IBM CE replace the adapter.
  • Close any events on the failing adapter on the HMC.
  • Validate that the notification LED is now off on the system, if not, go back into diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, and Disable the attention LED.
  • Check the adapter firmware level using:
    # lscfg -vl fcs1
    (replace this with the actual adapter name).

    And if required, update the adapter firmware microcode. Validate that the adapter still functions correctly by running:
    # errpt
    # lsdev -Cc adapter
  • Have the SAN admin update the WWN.
  • Configure the new adapter and its child devices:
    # cfgmgr -S
  • Check the adapter and the child devices:
    # lsdev -Cc adapter
    # lsdev -p fcs1
    # lsdev -p fscsi1
    (replace this with the correct adapter name).
  • Add the paths to the device:
    # addpaths
  • Check if the vpaths have all paths again:
    # datapath query device | more
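The final check above can also be done programmatically, by counting DEAD paths in the datapath output. A minimal sketch; the sample lines are illustrative stand-ins, since the real column layout depends on the SDD version (in production, feed it the output of datapath query device instead):

```shell
#!/usr/bin/ksh
# Count DEAD paths in "datapath query device" output.
# The sample lines below are illustrative stand-ins; in production,
# replace the printf with:  datapath query device
sample="   0  fscsi0/hdisk4  OPEN  NORMAL  1234  0
   1  fscsi1/hdisk4  OPEN  DEAD      0   3"

total=$(printf '%s\n' "$sample" | grep -c 'fscsi')
dead=$(printf '%s\n' "$sample" | grep -c 'DEAD')
echo "paths: $total, dead: $dead"
```

If dead is not zero after running addpaths, one or more paths did not come back and the adapter or SAN zoning should be re-checked.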

Topics: EMC, SAN, Storage, System Admin

Recovering from dead EMC paths

If you run:

# powermt display dev=all
and you notice that there are "dead" paths, then these are the commands to run to set those paths back to "alive" again, of course AFTER first ensuring that any SAN-related issues have been resolved.

To have PowerPath scan all devices and mark any dead devices as alive, if it finds that a device is in fact capable of doing I/O commands, run:
# powermt restore
To delete any dead paths, and to reconfigure them again:
# powermt reset
# powermt config
Or you could run the following command, which checks each path and asks whether any paths marked dead should be removed from the configuration:
# powermt check
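To confirm that powermt restore brought everything back, you can count any remaining dead paths from the powermt output. A hedged sketch; the sample lines below are illustrative only, as the real layout varies with the PowerPath version (in production, replace them with the output of powermt display dev=all):

```shell
#!/usr/bin/ksh
# Warn if any paths are still marked "dead" after powermt restore.
# Replace the sample text with real output:  powermt display dev=all
sample="   0 fscsi0  hdisk2  SP A1  active  alive  0  0
   1 fscsi1  hdisk2  SP B1  active  dead   0  1"

dead_count=$(printf '%s\n' "$sample" | grep -c ' dead ')
if [ "$dead_count" -gt 0 ]; then
    echo "WARNING: $dead_count dead path(s) remain"
else
    echo "All paths alive"
fi
```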

Topics: EMC, Installation, ODM, SAN, Storage

How to cleanup AIX EMC ODM definitions

From powerlink.emc.com:

  1. Before making any changes, collect host logs to document the current configuration. At a minimum, save the output of the following commands: inq, lsdev -Cc disk, lsdev -Cc adapter, lspv, and lsvg.
  2. Shutdown the application(s), unmount the file system(s), and varyoff all volume groups except for rootvg. Do not export the volume groups.
    # varyoffvg <vg_name>
    Check with lsvg -o (confirm that only rootvg is varied on)
    If PowerPath is not installed, skip all steps involving "power" device names.
  3. For CLARiiON configuration, if Navisphere Agent is running, stop it:
    # /etc/rc.agent stop
  4. Remove all paths from the PowerPath configuration:
    # powermt remove hba=all
  5. Delete all hdiskpower devices:
    # lsdev -Cc disk -Fname | grep power | xargs -n1 rmdev -dl
  6. Remove the PowerPath driver instance:
    # rmdev -dl powerpath0
  7. Delete all hdisk devices:
    For Symmetrix devices, use this command:
    # lsdev -CtSYMM* -Fname | xargs -n1 rmdev -dl
    For CLARiiON devices, use this command:
    # lsdev -CtCLAR* -Fname | xargs -n1 rmdev -dl
  8. Confirm with lsdev -Cc disk that there are no EMC hdisks or hdiskpowers.
  9. Remove all Fiber driver instances:
    # rmdev -Rdl fscsiX
    (X being driver instance number, i.e. 0,1,2, etc.)
  10. Verify through lsdev -Cc driver that there are no more fiber driver instances (fscsi).
  11. Put the adapter instances into the Defined state:
    # rmdev -l fcsX
    (X being adapter instance number, i.e. 0,1,2, etc.)
  12. Create the hdisk entries for all EMC devices:
    # emc_cfgmgr
    or
    # cfgmgr -vl fcsx
    (x being each adapter instance that was rebuilt). Skip this step if PowerPath is not installed.
  13. Configure all EMC devices into PowerPath:
    # powermt config
  14. Check the system to see if it now displays correctly:
    # powermt display
    # powermt display dev=all
    # lsdev -Cc disk
    # /etc/rc.agent start
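The repetitive rmdev steps in this procedure lend themselves to a loop. A hedged sketch that only collects and prints the commands it would run, so they can be reviewed before execution; the device names are illustrative stand-ins for the real lsdev output on your system:

```shell
#!/usr/bin/ksh
# Dry-run sketch of the bulk-removal steps: collect the rmdev
# commands instead of executing them, so they can be reviewed first.
# Device names below are illustrative stand-ins for lsdev output.
cmds=""
for disk in hdiskpower0 hdiskpower1; do    # hdiskpower devices (step 5)
    cmds="$cmds
rmdev -dl $disk"
done
for drv in fscsi0 fscsi1; do               # fiber driver instances (step 9)
    cmds="$cmds
rmdev -Rdl $drv"
done
echo "$cmds"
```

Once reviewed, the printed commands can be executed one by one, or the collection can be dropped in favor of calling rmdev directly inside the loops.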

Topics: EMC, SAN, Storage

Display the status of EMC SAN devices

An easy way to see the status of your SAN devices is by using the following command:

# powermt display
Symmetrix logical device count=6
CLARiiON logical device count=0
Hitachi logical device count=0
Invista logical device count=0
HP xp logical device count=0
Ess logical device count=0
HP HSx logical device count=0
==============================================================
- Host Bus Adapters -  --- I/O Paths ----  ------ Stats ------
### HW Path            Summary Total Dead  IO/Sec Q-IOs Errors
==============================================================
  0 fscsi0             optimal     6    0       -     0      0
  1 fscsi1             optimal     6    0       -     0      0
To get more information on the disks, use:
# powermt display dev=all

Topics: Storage, System Admin

Inodes without filenames

It will sometimes occur that a file system reports storage to be in use, while you are unable to find which file is actually using that storage. This may happen when a process has allocated disk storage and is still holding on to it, while the file itself is no longer there, for whatever reason.

A sure way to resolve such an issue is to reboot the server. That way, you can be certain the process is killed and the disk storage space is released. However, if you don't want to resort to such drastic measures, here's a little script that may help you find the process that may be responsible for an inode without a filename. Make sure you have lsof installed on your server.

#!/usr/bin/ksh

# Scan the file system given as the first argument
# for open inodes that no longer have a filename.
FILESYSTEM=$1
LSOF=/usr/sbin/lsof

# Use lsof to list all open inodes in the file system
# (lines starting with "i", e.g. "i12345").
for i in $($LSOF -Fi "$FILESYSTEM" | sed -n 's/^i//p') ; do
    # If find returns no filename for this inode,
    # it is a suspect: show the lsof output for it
    # (plus the COMMAND header line).
    if [ -z "$(find "$FILESYSTEM" -inum "$i")" ] ; then
        echo "Inode $i does not have an associated filename:"
        $LSOF "$FILESYSTEM" | grep -e "$i" -e COMMAND
    fi
done

Topics: SAN, SDD, Storage

Vpath commands

Check the relation between vpaths and hdisks:

# lsvpcfg
Check the status of the adapters according to SDD:
# datapath query adapter
Check for stale partitions:
# lsvg -o | lsvg -i | grep -i stale

Topics: PowerHA / HACMP, SAN, SDD, Storage

Reservation bit

If you wish to get rid of the SCSI disk reservation bit on SCSI, SSA and VPATH devices, there are two ways of achieving this:

Firstly, HACMP comes with binaries that do this job:

# /usr/es/sbin/cluster/utilities/cl_SCSIdiskreset /dev/vpathx
Secondly, there is a small (unofficial) IBM binary tool called "lquerypr". This command is part of the SDD driver fileset. It can also release the persistent reservation bit and clear all reservations:

First check if you have any reservations on the vpath:
# lquerypr -vh /dev/vpathx
Clear it as follows:
# lquerypr -ch /dev/vpathx
In case this doesn't work, try the following sequence of commands:
# lquerypr -ch /dev/vpathx
# lquerypr -rh /dev/vpathx
# lquerypr -ph /dev/vpathx
If you'd like to see more information about lquerypr, simply run lquerypr without any options, and it will display extensive usage information.

For SDD, you should be able to use the following command to clear the persistent reservation:
# lquerypr -V -v -c /dev/vpathXX
For SDDPCM, use:
# pcmquerypr -V -v -c /dev/hdiskXX
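Since SDD works on vpath devices and SDDPCM on plain hdisks, a tiny wrapper can select the right query tool from the device name. A sketch under the assumption that lquerypr and pcmquerypr are in the PATH; it only builds the command string, it does not execute it:

```shell
#!/usr/bin/ksh
# Pick the reservation-clearing command based on the device name:
# vpath devices -> lquerypr (SDD), hdisk devices -> pcmquerypr (SDDPCM).
# This function only builds the command string; it does not run it.
pick_query_cmd() {
    case "$1" in
        /dev/vpath*) echo "lquerypr -V -v -c $1" ;;
        /dev/hdisk*) echo "pcmquerypr -V -v -c $1" ;;
        *)           echo "unknown device type: $1" >&2; return 1 ;;
    esac
}
```

For example, pick_query_cmd /dev/vpath12 prints "lquerypr -V -v -c /dev/vpath12", which can then be reviewed and run by hand.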

Topics: Red Hat / Linux, SAN, Storage

Emulex hbanyware

If you have Emulex HBAs and the hbanyware software installed, for example on Linux, then you can use the following commands to retrieve information about the HBAs:

To run a GUI version:

# /usr/sbin/hbanyware/hbanyware
To run the command-line version:
# /usr/sbin/hbanyware/hbacmd listhbas
To get the attributes of a specific HBA:
# /usr/sbin/hbanyware/hbacmd listhbas 10:00:00:00:c9:6c:9f:d0

Topics: SAN, Storage

SAN introduction

SAN storage places the physical disks outside a computer system, connected to a Storage Area Network (SAN). In a Storage Area Network, storage is offered to many systems, including AIX systems. This is done via logical blocks of disk space (LUNs). On an AIX system, every SAN disk is seen as a separate hdisk, with the advantage that the system can easily be expanded with new SAN disks, avoiding the need to buy and install new physical hard disks.

SAN concept using IBM pSeries


Other advantages of SAN:
  • Disk storage is no longer limited to the space in the computer system itself or the amount of available disk slots.
  • After the initial investment in the SAN network and storage, the costs of storage per gigabyte are less than disk space within the computer systems.
  • Using two different SAN networks (fabrics), you can avoid disruptions in your storage, much like mirroring your data on separate disks. The two SAN fabrics should not be connected to each other.
  • Using two separate, geographically dispersed storage systems (e.g. ESS), a disruption in one computer center will not cause your computer systems to go down.
  • When you place two SAN network adapters (called Host Bus Adapters or HBAs) in every computer system, you can connect your AIX system to two different fabrics, thus increasing the availability of the storage. You'll also be able to load balance the disk storage over these two adapters. You'll need multipath I/O software (e.g. SDD or PowerPath) for this to work.
  • By using 2 HBAs, a defect in a single HBA will not cause downtime.
  • AIX systems are able to boot from SAN disks.

Topics: Red Hat / Linux, Storage, VMWare

Increasing the VMWare disk drive

If you have any VMWare images where you made the disk size a little too small, then fortunately VMWare Workstation lets you change the size of a disk with a simple command-line program. Sadly, the command only makes your drive bigger, not the actual partition. And Windows especially won't allow you to resize the partition where the Windows binaries are installed. So how can you get around that?

First, create a copy of your vmdk file to somewhere else, should the next action fail for some reason.

Then resize the disk to the required size:

# vmware-vdiskmanager -x 8GB myDisk.vmdk
You need to have plenty of disk space free to do this operation, as your vmdk file will be copied by vmware-vdiskmanager. BTW, this command may take a while, depending on the size of your vmdk file.
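The backup-and-resize sequence above can be scripted defensively. A small sketch that only prints the commands it would run; the file name and target size are placeholders, and the echo keeps it a dry run:

```shell
#!/bin/sh
# Dry-run sketch: back up the vmdk before growing it with
# vmware-vdiskmanager. DISK and NEWSIZE are placeholder values.
DISK="myDisk.vmdk"
NEWSIZE="8GB"
plan="cp $DISK $DISK.bak && vmware-vdiskmanager -x $NEWSIZE $DISK"
echo "$plan"
```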

Now get the ISO image of the System Rescue CD-ROM and set the VMWare session to boot off the ISO image. Then run QTParted: start the CD-ROM with a framebuffer (press F2 at start) and run run_qtparted as soon as Linux has started. Select the Windows drive partition with the right mouse button and choose resize. Set the new size and commit the change. Then exit from QTParted and from Linux (init 0). Remove the ISO image from the VMWare session and restart VMWare to start Windows normally. Windows will detect the disk change and force a chkdsk to run. Once Windows has started, the new disk size is present.
