UNIX Health Check

Tech Blog

These are blog entries written by the UNIX Health Check development team. Our team has extensive technical experience on both AIX and Red Hat systems, and we like to share our knowledge with our visitors.

Topics: AIX, System Admin

Number of active virtual processors

To quickly see how many virtual processors are active, run:

# echo vpm | kdb

For example:

# echo vpm | kdb
...
VSD Thread State.
 CPU VP_STATE   SLEEP_STATE  PROD_TIME: SECS   NSECS     CEDE_LAT

   0  ACTIVE    AWAKE        0000000000000000  00000000  00
   1  ACTIVE    AWAKE        0000000000000000  00000000  00
   2  ACTIVE    AWAKE        0000000000000000  00000000  00
   3  ACTIVE    AWAKE        0000000000000000  00000000  00
   4  DISABLED  AWAKE        00000000503536C7  261137E1  00
   5  DISABLED  SLEEPING     0000000051609EAF  036D61DC  02
   6  DISABLED  SLEEPING     0000000051609E64  036D6299  02
   7  DISABLED  SLEEPING     0000000051609E73  036D6224  02
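To get just the count rather than the whole table, the output can be filtered with awk. This is a minimal sketch: count_active_vps is a hypothetical helper name, not part of kdb, and it assumes the vpm output keeps the column layout shown above (numeric CPU number in the first column, VP_STATE in the second).

```shell
# count_active_vps: hypothetical helper. Reads "echo vpm | kdb" output on
# stdin and prints the number of rows whose VP_STATE column is ACTIVE.
# Assumes the layout shown above: numeric CPU id first, VP_STATE second.
count_active_vps() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "ACTIVE" { n++ } END { print n + 0 }'
}
```

Run it as "echo vpm | kdb | count_active_vps". For the eight rows shown in the example above, it prints 4.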

Topics: AIX, System Admin

How to read the /var/adm/ras/diag log file

There are two ways to read the diagnostics log file, located in /var/adm/ras/diag:

The first option uses the diag tool. Run:

# diag

Then hit ENTER and select "Task Selection", followed by "Display Previous Diagnostic Results" and "Display Previous Results".

The second option is to use diagrpt. Run:

# /usr/lpp/diagnostics/bin/diagrpt -s 010101

To display only the last entry, run:

# /usr/lpp/diagnostics/bin/diagrpt -o

Topics: AIX, Backup & restore, System Admin, Virtual I/O Server, Virtualization

How to make a system backup of a VIOS

To create a system backup of a Virtual I/O Server (VIOS), run the following commands (as user root):

# /usr/ios/cli/ioscli viosbr -backup -file vios_config_bkup \
-frequency daily -numfiles 10
# /usr/ios/cli/ioscli backupios -nomedialib -file /mksysb/$(hostname).mksysb -mksysb

The first command (viosbr) will create a backup of the configuration information to /home/padmin/cfgbackups. It will also schedule the command to run every day, and keep up to 10 files in /home/padmin/cfgbackups.

The second command, backupios, is the mksysb equivalent for a Virtual I/O Server. It creates the mksysb image in the /mksysb directory, and excludes any ISO repository in rootvg, as well as anything else listed in /etc/exclude.rootvg.

Topics: AIX, Backup & restore, Storage, System Admin

Using mkvgdata and restvg in DR situations

It is useful to run the following commands before you create your (at least) weekly mksysb image:

# lsvg -o | xargs -i mkvgdata {}
# tar -cvf /sysadm/vgdata.tar /tmp/vgdata

Add these commands to your mksysb script, just before the mksysb command. They run the mkvgdata command for each online volume group, which writes the data of each volume group to /tmp/vgdata. The resulting output is then archived with tar and stored in /sysadm (a directory or a file system). This way, the information about your volume groups, logical volumes, and file systems is included in your mksysb image.

To recreate the volume groups, logical volumes and file systems:

Run:
# tar -xvf /sysadm/vgdata.tar
Now edit the file /tmp/vgdata/{volume group name}/{volume group name}.data and look for the line with "VG_SOURCE_DISK_LIST=". Change the line to list the hdisks, vpaths or hdiskpowers as needed.
Run:
# restvg -r -d /tmp/vgdata/{volume group name}/{volume group name}.data

Make sure to remove the file systems with the rmfs command before running restvg, or it will not run correctly. Alternatively, run restvg once, run the exportvg command for the same volume group, and then run restvg again. There is also a "-s" flag for restvg that shrinks each file system to the minimum size needed, but depending on when the vgdata was created, you could run out of space when restoring the contents of the file system. Just something to keep in mind.
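The edit of the "VG_SOURCE_DISK_LIST=" line can also be scripted. The following is only a sketch: set_vg_source_disks is a hypothetical helper name, and it assumes that the disk list is everything following "VG_SOURCE_DISK_LIST=" on that same line. Check the result before running restvg.

```shell
# set_vg_source_disks: hypothetical helper that rewrites the
# VG_SOURCE_DISK_LIST= line of a {volume group name}.data file in place.
# Usage: set_vg_source_disks FILE DISK [DISK ...]
# Pass plain disk names such as hdisk4 (no /dev/ prefix, as a slash would
# break the sed expression below).
set_vg_source_disks() {
    file=$1; shift
    disks="$*"
    # Keep a copy of the original, then replace everything after the "=".
    cp "$file" "$file.orig" || return 1
    sed "s/^\([[:space:]]*VG_SOURCE_DISK_LIST=\).*/\1 $disks/" "$file.orig" > "$file"
}
```

For example, "set_vg_source_disks /tmp/vgdata/datavg/datavg.data hdisk4 hdisk5" would point a volume group named datavg (a made-up name) to hdisk4 and hdisk5, leaving the untouched original in datavg.data.orig.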

Topics: AIX, System Admin

Select the n'th line of a file

What if you want to get the 7th line of a text file? For example, you can get the 7th line of the /etc/hosts file by using the head and tail commands, like this:

# head -7 /etc/hosts | tail -1
# Licensed Materials - Property of IBM

An even easier way to do it is:

# sed -n 7p /etc/hosts
# Licensed Materials - Property of IBM
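On large files, "sed -n 7p" still reads the file to the end. A small variation tells sed to quit right after printing the requested line. The function name nth_line below is a hypothetical name chosen for this sketch.

```shell
# nth_line: hypothetical helper, prints line N of FILE.
# Usage: nth_line N FILE
nth_line() {
    # "p" prints line N, "q" then quits so the rest of the file is not read.
    sed -n "${1}{p;q;}" "$2"
}
```

For example, "nth_line 7 /etc/hosts" prints the same line as the two commands above. If the file has fewer than N lines, nothing is printed.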

Topics: AIX, System Admin

A quick way to remove all printer queues

Here's a quick way to remove all the printer queues from an AIX system:

/usr/lib/lpd/pio/etc/piolsvp -p | grep -v PRINTER | \
   while read queue device rest ; do
   echo $queue $device
   rmquedev -q$queue -d$device
   rmque -q$queue
done
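Because this loop removes every queue without asking, it may be worth doing a dry run first. The sketch below uses the same filtering and the same read loop as above, but only prints the commands. The function name show_queue_removals is hypothetical, and it assumes, like the loop above, that each remaining line starts with the queue name followed by the device name.

```shell
# show_queue_removals: hypothetical dry-run helper. Reads "piolsvp -p" output
# on stdin and prints the rmquedev/rmque commands instead of running them.
show_queue_removals() {
    grep -v PRINTER | while read queue device rest; do
        # Skip blank lines or lines without both a queue and a device name.
        [ -n "$queue" ] && [ -n "$device" ] || continue
        echo "rmquedev -q$queue -d$device"
        echo "rmque -q$queue"
    done
}
```

Run it as "/usr/lib/lpd/pio/etc/piolsvp -p | show_queue_removals" and review the list before running the real loop.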

Topics: AIX, System Admin, Virtualization

Using the Command-Line Interface for LPM

Once you've successfully set up live partition mobility on a couple of servers, you may want to script the live partition mobility migrations, and at that time, you'll need the commands to perform this task on the HMC.

In the example below, we're assuming you have multiple managed systems, managed through one HMC. Without multiple managed systems, there would be no other system to move an LPAR to.

First of all, to see the actual state of the LPAR that is to be migrated, you may want to start the nworms program, which is a small program that displays wriggling worms along with the serial number on your display. This allows you to see the serial number of the managed system that the LPAR is running on. Also, the worms will change color, as soon as the LPM migration has been completed.

For example, to start nworms with 5 worms and an acceptable speed on a Power7 system, run:

# ./nworms 5 50000

Next, log on through ssh to your HMC, and see what managed systems are out there:

> lssyscfg -r sys -F name
Server1-8233-E8B-SN066001R
Server2-8233-E8B-SN066002R
Server3-8233-E8B-SN066003R

It seems there are 3 managed systems in the example above.

Now list the status of the LPARs on the source system, assuming you want to migrate from Server1-8233-E8B-SN066001R, moving an LPAR to Server2-8233-E8B-SN066002R:

> lslparmigr -r lpar -m Server1-8233-E8B-SN066001R
name=vios1,lpar_id=3,migration_state=Not Migrating
name=vios2,lpar_id=2,migration_state=Not Migrating
name=lpar1,lpar_id=1,migration_state=Not Migrating

The example above shows there are 2 VIO servers and 1 LPAR on server Server1-8233-E8B-SN066001R.

Validate whether it is possible to move lpar1 to Server2-8233-E8B-SN066002R:

> migrlpar -o v -t Server2-8233-E8B-SN066002R -m Server1-8233-E8B-SN066001R --id 1
> echo $?
0

The example above shows a validation (-o v) to the target server (-t) from the source server (-m) for the LPAR with ID 1, which we know from the lslparmigr command is our LPAR lpar1. If the command returns a zero, the validation has completed successfully.

Now perform the actual migration:

> migrlpar -o m -t Server2-8233-E8B-SN066002R -m Server1-8233-E8B-SN066001R -p lpar1 &

This will take at least a couple of minutes, and likely longer, depending on the amount of memory of the LPAR.

To check the state:

> lssyscfg -r lpar -m Server1-8233-E8B-SN066001R -F name,state

Or to see the number of bytes transmitted and remaining to be transmitted, run:

> lslparmigr -r lpar -m Server1-8233-E8B-SN066001R -F name,migration_state,bytes_transmitted,bytes_remaining

Or to see the reference codes (which you can also see in the HMC GUI):

> lsrefcode -r lpar -m Server2-8233-E8B-SN066002R
lpar_name=lpar1,lpar_id=1,time_stamp=06/26/2012 15:21:24,
   refcode=C20025FF,word2=00000000
lpar_name=vios1,lpar_id=2,time_stamp=06/26/2012 15:21:47,
   refcode=,word2=03400000,fru_call_out_loc_codes=
lpar_name=vios2,lpar_id=3,time_stamp=06/26/2012 15:21:33,
   refcode=,word2=03D00000,fru_call_out_loc_codes=

After a few minutes the lslparmigr command will indicate that the migration has been completed. And now that you know the commands, it's fairly easy to script the migration of multiple LPARs.
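As a starting point for such a script, here is a minimal sketch that validates and then migrates a list of LPARs, one at a time, using the migrlpar options shown above. The names in it are assumptions: migrate_lpars and HMC_RUN are hypothetical names, and "ssh hscroot@hmc1" stands for however you reach your own HMC from the system running the script.

```shell
# HMC_RUN is a hypothetical hook: the command prefix used to reach the HMC.
# Set HMC_RUN=echo for a dry run that only prints the migrlpar commands.
HMC_RUN=${HMC_RUN:-"ssh hscroot@hmc1"}

# migrate_lpars: hypothetical helper.
# Usage: migrate_lpars SOURCE_SYSTEM TARGET_SYSTEM LPAR [LPAR ...]
migrate_lpars() {
    src=$1; dst=$2; shift 2
    rc=0
    for lpar in "$@"; do
        # Validate first (-o v); skip the LPAR when validation fails.
        if ! $HMC_RUN migrlpar -o v -m "$src" -t "$dst" -p "$lpar"; then
            echo "validation failed for $lpar, skipping" >&2
            rc=1
            continue
        fi
        # Perform the actual migration (-o m) and wait for it to finish.
        if ! $HMC_RUN migrlpar -o m -m "$src" -t "$dst" -p "$lpar"; then
            echo "migration failed for $lpar" >&2
            rc=1
        fi
    done
    # Non-zero when any validation or migration failed.
    return $rc
}
```

With HMC_RUN=echo, "migrate_lpars Server1-8233-E8B-SN066001R Server2-8233-E8B-SN066002R lpar1" prints the two migrlpar commands without contacting the HMC. The migrations run sequentially here, which keeps the sketch simple; whether to run several at once is a separate decision.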

Topics: AIX, Storage, System Admin, Virtualization

Change default value of hcheck_interval

The default value of hcheck_interval for VSCSI hdisks is set to 0, meaning that health checking is disabled. The hcheck_interval attribute of an hdisk can only be changed online if the volume group to which the hdisk belongs is not active. If the volume group is active, the value of hcheck_interval can only be altered in the ODM (in the CuAt class), as shown in the following example for hdisk0:

# chdev -l hdisk0 -a hcheck_interval=60 -P

The change will then be applied once the system is rebooted. However, it is possible to change the default value of the hcheck_interval attribute in the PdAt ODM class. As a result, you won't have to worry about its value anymore and newly discovered hdisks will automatically get the new default value, as illustrated in the example below:

# odmget -q 'attribute = hcheck_interval AND uniquetype = PCM/friend/vscsi' PdAt | \
sed 's/deflt = "0"/deflt = "60"/' | \
odmchange -o PdAt -q 'attribute = hcheck_interval AND uniquetype = PCM/friend/vscsi'
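Since odmchange writes straight into the ODM, it can help to preview the modified stanza before piping it into odmchange. The filter below is a sketch of the sed stage only. The name with_new_default is hypothetical, and unlike the sed command above, it replaces whatever the current deflt value is, not just "0".

```shell
# with_new_default: hypothetical filter. Reads odmget stanzas on stdin and
# prints them with the deflt value replaced by the value given as $1.
with_new_default() {
    sed "s/deflt = \"[^\"]*\"/deflt = \"$1\"/"
}
```

To preview the change, pipe the odmget command shown above into "with_new_default 60" and check the output, before adding the odmchange stage.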

Topics: AIX, System Admin

Resolving LED code 555

If your system hangs with LED code 555, it most likely means that one of your rootvg file systems is corrupt. The following link provides information on how to resolve it:

http://www-304.ibm.com/support/docview.wss?uid=isg3T1000217

After completing the procedure, the system may still hang with LED code 555. If that happens, boot the system from media and enter service mode again, and access the volume group. Then check what the boot disk is according to:

# lslv -m hd5

Then also check your bootlist:

# bootlist -m normal -o

If these two don't match, set the boot list to the correct disk, as indicated by the lslv command above. For example, to set it to hdisk1, run:

# bootlist -m normal hdisk1

And then, make sure you can run the bosboot commands:

# bosboot -ad /dev/hdisk1
# bosboot -ad /dev/ipldevice

Note: exchange hdisk1 in the example above with the disk that was indicated by the lslv command.

If the bosboot on the ipldevice fails, you have two options: recover the system from a mksysb image, or recreate hd5. To recreate hd5, first create a copy of your ODM:

# mount /dev/hd4 /mnt
# mount /dev/hd2 /mnt/usr
# mkdir /mnt/etc/objrepos/bak
# cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/bak
# cp /etc/objrepos/Cu* /mnt/etc/objrepos
# umount /dev/hd2
# umount /dev/hd4
# exit

Then, recreate hd5, for example, for hdisk1:

# rmlv hd5
# cd /dev
# rm ipldevice
# rm ipl_blv 
# mklv -y hd5 -t boot -ae rootvg 1 hdisk1
# ln /dev/rhd5 /dev/ipl_blv
# ln /dev/rhdisk1 /dev/ipldevice
# bosboot -ad /dev/hdisk1

If things still won't boot at this time, the only option you have left is to recover the system from a mksysb image.

Topics: AIX, System Admin

Parent process ID

It's very easy to determine the parent process ID without looking it up in the process list. Take the current Korn shell process as an example. The traditional way to find its parent process is to look at the process list:

 # ps -ef | grep ksh | grep -v grep
    root  8061040 17891578   0 22:28:32  pts/0  0:00 -ksh

In the example above, you can see that the parent process of the Korn shell process with PID 8061040 is 17891578. The same answer can be retrieved by simply looking at the PPID variable:

# echo $PPID
17891578
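The PPID variable only covers the current shell. For any other process, ps can print just the parent PID, which avoids the grep pipeline. The helper name parent_of is a hypothetical name used for this sketch.

```shell
# parent_of: hypothetical helper, prints the parent PID of the given PID.
# Usage: parent_of PID
parent_of() {
    # "-o ppid=" prints only the PPID column, without a header line.
    ps -o ppid= -p "$1" | tr -d ' '
}
```

For the current shell, "parent_of $$" prints the same number as "echo $PPID".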
