UNIX Health Check

Tech Blog

These are blog entries written by the UNIX Health Check development team. Our team has extensive technical experience on both AIX and Red Hat systems, and we like to share our knowledge with our visitors.

Topics: AIX, System Admin

Using colors in Korn Shell

Here are some color codes you can use in the Korn Shell:

## Reset to normal: \033[0m
NORM="\033[0m"

## Colors:
BLACK="\033[0;30m"
GRAY="\033[1;30m"
RED="\033[0;31m"
LRED="\033[1;31m"
GREEN="\033[0;32m"
LGREEN="\033[1;32m"
YELLOW="\033[0;33m"
LYELLOW="\033[1;33m"
BLUE="\033[0;34m"
LBLUE="\033[1;34m"
PURPLE="\033[0;35m"
PINK="\033[1;35m"
CYAN="\033[0;36m"
LCYAN="\033[1;36m"
LGRAY="\033[0;37m"
WHITE="\033[1;37m"

## Backgrounds
BLACKB="\033[0;40m"
REDB="\033[0;41m"
GREENB="\033[0;42m"
YELLOWB="\033[0;43m"
BLUEB="\033[0;44m"
PURPLEB="\033[0;45m"
CYANB="\033[0;46m"
GREYB="\033[0;47m"

## Attributes:
UNDERLINE="\033[4m"
BOLD="\033[1m"
INVERT="\033[7m"

## Cursor movements
CUR_UP="\033[1A"
CUR_DN="\033[1B"
CUR_LEFT="\033[1D"
CUR_RIGHT="\033[1C"

## Start of display (top left)
SOD="\033[1;1f"

Just copy everything above and paste it into your shell or into a script. You can then use the defined variables:

## Example - Red underlined
echo "${RED}${UNDERLINE}This is a test!${NORM}"

## Example - different colors
echo "${RED}This ${YELLOW}is ${LBLUE}a ${INVERT}test!${NORM}"

## Example - cursor movement
# echo " ${CUR_LEFT}Test"

## Create a rotating thingy
while true ; do
printf "${CUR_LEFT}/"
perl -e "use Time::HiRes qw(usleep); usleep(100000)"
printf "${CUR_LEFT}-"
perl -e "use Time::HiRes qw(usleep); usleep(100000)"
printf "${CUR_LEFT}\\\\"
perl -e "use Time::HiRes qw(usleep); usleep(100000)"
printf "${CUR_LEFT}|"
perl -e "use Time::HiRes qw(usleep); usleep(100000)"
done

Note that the perl command used above causes a sleep of 0.1 seconds (100,000 microseconds). Perl is used here because the sleep command can't be used to sleep less than 1 second.
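Whether echo expands \033 depends on the shell: the Korn shell's echo does, but the built-in echo of bash, for example, does not unless -e is given. Below is a minimal sketch of a more portable variant that relies only on a POSIX printf. The helper name color_msg is made up for this example and is not part of the definitions above.

```shell
# Store the real ESC character once, so that no later command has to
# interpret "\033" itself.
ESC=$(printf '\033')
NORM="${ESC}[0m"
RED="${ESC}[0;31m"
UNDERLINE="${ESC}[4m"

# color_msg (hypothetical helper): print the text in the given
# attribute(s), then reset the terminal to normal.
color_msg() {
  attrs=$1
  shift
  printf '%s%s%s\n' "$attrs" "$*" "$NORM"
}

color_msg "${RED}${UNDERLINE}" "This is a test!"
```

Because the variables hold the escape character itself, they can be passed to printf with %s or used in a here-document without any further escape processing.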

Topics: AIX, System Admin

FIRMWARE_EVENT

If FIRMWARE_EVENT entries appear in the AIX error log without a FRU or location code callout, they are most likely caused by an AIX memory page deconfiguration event, which results from the system firmware marking a single memory cell as unusable. The actual error is, and will continue to be, handled by ECC; however, notification of the unusable bit is also passed up to AIX. AIX in turn migrates the data and deallocates the memory page associated with this event from its memory map. This is an AIX RAS feature that became available in AIX 5.3; it provides extra memory resilience and is no cause for alarm. Since the failure represents a single bit, a hardware action is NOT warranted.

To suppress logging, enter the following command and reboot the partition to make the change effective:

# chdev -l sys0 -a log_pg_dealloc=false

Check the current status:

# lsattr -El sys0 -a log_pg_dealloc

More information about this function can be found in the "Highly Available POWER Servers for Business-Critical Applications" document which is available at the following link:

ftp://ftp.software.ibm.com/common/ssi/rep_wh/n/POW03003USEN/POW03003USEN.PDF (see pages 17-22 specifically).

Topics: AIX, Installation, System Admin

Compare_report

The compare_report command is a very useful utility to compare the software installed on two systems, for example to make sure the same software is installed on both nodes of a PowerHA cluster.

First, create the necessary reports:

# ssh node2 "lslpp -Lc" > /tmp/node2
# lslpp -Lc > /tmp/node1

Next, generate the report. There are four interesting options: -l, -h, -m and -n:

-l Generates a report of base system installed software that is at a lower level.
-h Generates a report of base system installed software that is at a higher level.
-m Generates a report of filesets not installed on the other system.
-n Generates a report of filesets not installed on the base system.

For example:

# compare_report -b /tmp/node1 -o /tmp/node2 -l
#(baselower.rpt)
#Base System Installed Software that is at a lower level
#Fileset_Name:Base_Level:Other_Level
bos.msg.en_US.net.ipsec:6.1.3.0:6.1.4.0
bos.msg.en_US.net.tcp.client:6.1.1.1:6.1.4.0
bos.msg.en_US.rte:6.1.3.0:6.1.4.0
bos.msg.en_US.txt.tfs:6.1.1.0:6.1.4.0
xlsmp.msg.en_US.rte:1.8.0.1:1.8.0.3

# compare_report -b /tmp/node1 -o /tmp/node2 -h
#(basehigher.rpt)
#Base System Installed Software that is at a higher level
#Fileset_Name:Base_Level:Other_Level
idsldap.clt64bit62.rte:6.2.0.5:6.2.0.4
idsldap.clt_max_crypto64bit62.rte:6.2.0.5:6.2.0.4
idsldap.cltbase62.adt:6.2.0.5:6.2.0.4
idsldap.cltbase62.rte:6.2.0.5:6.2.0.4
idsldap.cltjava62.rte:6.2.0.5:6.2.0.4
idsldap.msg62.en_US:6.2.0.5:6.2.0.4
idsldap.srv64bit62.rte:6.2.0.5:6.2.0.4
idsldap.srv_max_cryptobase64bit62.rte:6.2.0.5:6.2.0.4
idsldap.srvbase64bit62.rte:6.2.0.5:6.2.0.4
idsldap.srvproxy64bit62.rte:6.2.0.5:6.2.0.4
idsldap.webadmin62.rte:6.2.0.5:6.2.0.4
idsldap.webadmin_max_crypto62.rte:6.2.0.5:6.2.0.4
AIX-rpm:6.1.3.0-6:6.1.3.0-4

# compare_report -b /tmp/node1 -o /tmp/node2 -m
#(baseonly.rpt)
#Filesets not installed on the Other System
#Fileset_Name:Base_Level
Java6.sdk:6.0.0.75
Java6.source:6.0.0.75
Java6_64.samples.demo:6.0.0.75
Java6_64.samples.jnlp:6.0.0.75
Java6_64.source:6.0.0.75
WSBAA70:7.0.0.0
WSIHS70:7.0.0.0

# compare_report -b /tmp/node1 -o /tmp/node2 -n
#(otheronly.rpt)
#Filesets not installed on the Base System
#Fileset_Name:Other_Level
xlC.sup.aix50.rte:9.0.0.1
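To see what the -l and -h reports boil down to, the comparison can be approximated with awk. This is only a rough sketch, not how compare_report itself works: it assumes the colon-separated lslpp -Lc layout with the fileset name in field 2 and the level in field 3, and it only shows that the levels differ, not which one is higher. The function name level_diff is made up for this example.

```shell
# level_diff BASE OTHER: list the filesets that are present in both
# "lslpp -Lc" files but at different levels, printed as
# Fileset:Base_Level:Other_Level.
level_diff() {
  awk -F: '
    /^#/      { next }                   # skip header lines
    NR == FNR { base[$2] = $3; next }    # first file: remember levels
    ($2 in base) && base[$2] != $3 {     # second file: compare
      print $2 ":" base[$2] ":" $3
    }
  ' "$1" "$2"
}

# Usage on the files created above:
#   level_diff /tmp/node1 /tmp/node2
```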

Topics: AIX, Networking, System Admin

Using iptrace

The iptrace command can be very useful to find out what network traffic flows to and from an AIX system.

You can use any combination of these options, but you do not need to use them all:

-a Do NOT print out ARP packets.
-s [source IP] Limit trace to source/client IP address, if known.
-d [destination IP] Limit trace to destination IP, if known.
-b Capture bidirectional network traffic (send and receive packets).
-p [port] Specify the port to be traced.
-i [interface] Only trace for network traffic on a specific interface.

Example:

Run iptrace on AIX interface en1 to capture port 80 traffic to file trace.out from a single client IP to a server IP:

# iptrace -a -i en1 -s clientip -b -d serverip -p 80 trace.out

This trace captures both directions of the port 80 traffic on interface en1 between clientip and serverip, and writes it in raw format to the file trace.out.

To stop the trace:

# ps -ef | grep iptrace
# kill <PID of the iptrace process>
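Finding the process and killing it can be combined into one step. This is a sketch that assumes your ps accepts -o pid,comm and reports the command name as plain "iptrace"; the function names are hypothetical. It sends SIGTERM (signal 15) rather than -9, so that iptrace gets a chance to shut down cleanly.

```shell
# iptrace_pids: read "ps -e -o pid,comm" output on stdin and print
# the PID of every process whose command name is exactly "iptrace".
iptrace_pids() {
  awk '$2 == "iptrace" { print $1 }'
}

# stop_iptrace: send SIGTERM to all running iptrace processes.
stop_iptrace() {
  for pid in $(ps -e -o pid,comm | iptrace_pids); do
    kill -15 "$pid"
  done
}

# Usage: stop_iptrace
```

Matching on the command name column avoids the usual problem of grep finding its own process in the ps output.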

The ipreport command can be used to transform the trace file generated by iptrace into a human-readable format:

# ipreport trace.out > trace.report

Topics: AIX, Installation, System Admin

How to update the AIX-rpm virtual package

AIX-rpm is a "virtual" package which reflects what has been installed on the system by installp. It is created by the /usr/sbin/updtvpkg script when rpm.rte is installed. The script can be run again any time the administrator chooses (usually after installing something with installp that is required to satisfy a dependency of an RPM package).

Since AIX-rpm has to have some sort of version number, it simply reflects the level of bos.rte on the system where /usr/sbin/updtvpkg is being run. It's just informational - nothing should be checking the level of AIX-rpm.

AIX doesn't just automatically run /usr/sbin/updtvpkg every time that something gets installed or deinstalled because on some slower systems with lots of software installed, /usr/sbin/updtvpkg can take a LONG time.

If you want to run the command manually:

# /usr/sbin/updtvpkg

If you get an error similar to "cannot read header at 20760 for lookup" when running updtvpkg, rebuild the RPM database:

# rpm --rebuilddb

Once you have run updtvpkg, run rpm -qa to see your new AIX-rpm package.

Topics: AIX, System Admin

PRNG is not SEEDED

If you get a message "PRNG is not SEEDED" when trying to run ssh, you probably have an issue with the /dev/random and/or /dev/urandom devices on your system. These devices are created during system installation, but may sometimes be missing after an AIX upgrade.

Check the permissions on the random number generator devices; "others" must have read access to them:

# ls -l /dev/random /dev/urandom
crw-r--r-- 1 root system 39, 0 Jan 22 10:48 /dev/random
crw-r--r-- 1 root system 39, 1 Jan 22 10:48 /dev/urandom

If the permissions are not set correctly, change them as follows:

# chmod o+r /dev/random /dev/urandom
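The permission check can also be scripted. This is a minimal sketch that only inspects the mode string printed by ls -l; others_can_read and check_random_devs are hypothetical helper names.

```shell
# others_can_read MODE: succeed if an "ls -l" mode string such as
# crw-r--r-- grants read access to "others" (8th character is "r").
others_can_read() {
  case $1 in
    ???????r*) return 0 ;;
    *)         return 1 ;;
  esac
}

# check_random_devs: report the state of both random devices.
check_random_devs() {
  for dev in /dev/random /dev/urandom; do
    if [ ! -c "$dev" ]; then
      echo "$dev: missing or not a character device"
    elif others_can_read "$(ls -l "$dev" | awk '{ print $1 }')"; then
      echo "$dev: readable by others"
    else
      echo "$dev: NOT readable by others"
    fi
  done
}

check_random_devs
```

The mode string is parsed instead of using test -r, because test -r always succeeds when the script runs as root.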

Now stop and start the SSH daemon, and check whether ssh works:

# stopsrc -s sshd
# startsrc -s sshd

If users still cannot use ssh and the same message is produced, or if /dev/random and/or /dev/urandom are missing, recreate the devices:

# stopsrc -s sshd
# rm -rf /dev/random
# rm -rf /dev/urandom
# mknod /dev/random c 39 0
# mknod /dev/urandom c 39 1
# randomctl -l
# ls -ald /dev/random /dev/urandom
# startsrc -s sshd

Topics: AIX, Backup & restore, LVM, Performance, Storage, System Admin

Using lvmstat

One of the best tools to look at LVM usage is lvmstat. It reports the bytes read from and written to logical volumes. Using that information, you can determine which logical volumes are used the most.

Gathering LVM statistics is not enabled by default:

# lvmstat -v data2vg
0516-1309 lvmstat: Statistics collection is not enabled for
        this logical device. Use -e option to enable.

As the output shows, it is not enabled, so you need to enable it for each volume group before running the tool:

# lvmstat -v data2vg -e

The following command takes a snapshot of LVM information every second for 10 intervals:

# lvmstat -v data2vg 1 10

This view shows the most utilized logical volumes on your system since you started the data collection. This is very helpful when drilling down to the logical volume layer when tuning your systems.

# lvmstat -v data2vg

Logical Volume    iocnt   Kb_read  Kb_wrtn   Kbps
  appdatalv      306653  47493022   383822  103.2
  loglv00            34         0     3340    2.8
  data2lv           453    234543   234343   89.3

What are you looking at here?

iocnt: Reports back the number of read and write requests.
Kb_read: Reports back the total data (kilobytes) from your measured interval that is read.
Kb_wrtn: Reports back the amount of data (kilobytes) from your measured interval that is written.
Kbps: Reports back the amount of data transferred in kilobytes per second.
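To find the busiest logical volumes quickly, the output can be sorted on the iocnt column. This is a small sketch that assumes the five-column layout shown above; busiest_lvs is a made-up function name.

```shell
# busiest_lvs: read "lvmstat -v <vg>" output on stdin and list the
# logical volumes by descending iocnt. Only lines with five columns
# and a numeric second column are kept, which drops the header and
# any blank lines.
busiest_lvs() {
  awk 'NF == 5 && $2 ~ /^[0-9]+$/ { print $1, $2, $3, $4, $5 }' |
    sort -k2,2nr
}

# Usage on AIX (data2vg is the volume group from the example above):
#   lvmstat -v data2vg | busiest_lvs
```

To sort on Kbps instead, change the sort key to -k5,5nr.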

You can use the -d option for lvmstat to disable the collection of LVM statistics.

Topics: AIX, Backup & restore, LVM, Performance, Storage, System Admin

Spreading logical volumes over multiple disks

A common issue on AIX servers is that logical volumes are configured on a single disk only, sometimes causing high disk utilization on a small number of disks in the system and impacting the performance of the application running on the server.

If you suspect that this might be the case, first try to determine which disks on the server are saturated. Any disk that is constantly more than 60% busy should be considered. You can use commands such as iostat, sar -d, nmon and topas to determine which disks show high utilization. If they do, check which logical volumes are defined on that disk, for example on an IBM SAN disk:

# lspv -l vpath23

It is always a good idea to spread the logical volumes on a disk over multiple disks. That way, the logical volume manager spreads the disk I/O over all the disks that are part of the logical volume, utilizing the queue_depth of all disks and greatly improving performance where disk I/O is concerned.

Let's say you have a logical volume called prodlv of 128 LPs, which is sitting on one disk, vpath408. To see the allocation of the LPs of logical volume prodlv, run:

# lslv -m prodlv

Let's also assume that you have a large number of disks in the volume group in which prodlv is configured. Disk I/O usually works best if you have a large number of disks in a volume group. For example, if you need 500 GB in a volume group, it is usually a far better idea to assign 10 disks of 50 GB to the volume group than a single disk of 500 GB. That gives you the possibility of spreading the I/O over 10 disks instead of only one.

To spread the disk I/O of prodlv over 8 disks instead of just one, you can create an extra logical volume copy on these 8 disks and later, once the logical volume is synchronized, remove the original copy (the one on the single disk vpath408). Dividing 128 LPs by 8 gives 16 LPs, so you assign 16 LPs of prodlv to each of the 8 disks, for a total of 128 LPs.

First, check if the upper bound of the logical volume is set to at least 9. Check this by running:

# lslv prodlv

The upper bound determines on how many disks a logical volume can be placed. You need the one disk, vpath408, on which the logical volume is already located, plus the 8 other disks on which you're creating the new copy. Never create a copy on the same disk: if that single disk fails, both copies of your logical volume fail as well. It is usually a good idea to set the upper bound of the logical volume a lot higher, for example to 32:

# chlv -u 32 prodlv

Next, verify that you actually have 8 disks with at least 16 free physical partitions (the fourth column in the output below) in the volume group. You can do this by running:

# lsvg -p prodvg | sort -nk4 | grep -v vpath408 | tail -8
vpath188  active  959   40  00..00..00..00..40
vpath163  active  959   42  00..00..00..00..42
vpath208  active  959   96  00..00..96..00..00
vpath205  active  959  192  102..00..00..90..00
vpath194  active  959  240  00..00..00..48..192
vpath24   active  959  243  00..00..00..51..192
vpath304  active  959  340  00..89..152..99..00
vpath161  active  959  413  14..00..82..125..192

Note how in the command above the original disk, vpath408, was excluded from the list.

Each of the disks listed by the command above needs to have at least 1/8th of the size of the logical volume free before you can create a logical volume copy on it for prodlv.
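The arithmetic and the free-space check can be sketched as two small shell functions. The names lps_per_disk and disks_with_free are made up for this example, and the filter assumes the lsvg -p column layout shown above (disk name, state, total partitions, free partitions).

```shell
# lps_per_disk TOTAL_LPS DISKS: LPs needed on each disk, rounded up.
lps_per_disk() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# disks_with_free MIN: read "lsvg -p <vg>" output on stdin and print
# the active disks that have at least MIN free partitions (column 4).
disks_with_free() {
  awk -v min="$1" '$2 == "active" && $4 + 0 >= min + 0 { print $1 }'
}

need=$(lps_per_disk 128 8)
echo "LPs needed per disk: $need"

# On AIX (vpath408 is the disk that currently holds prodlv):
#   lsvg -p prodvg | grep -v vpath408 | disks_with_free "$need"
```

Rounding up matters when the number of LPs is not an exact multiple of the number of disks, for example 130 LPs over 8 disks needs 17 free partitions per disk.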

Now create the logical volume copy. The magical option you need to use is "-e x" for the logical volume commands. That will spread the logical volume over all available disks. If you want to make sure that the logical volume is spread over only 8 available disks, and not all the available disks in a volume group, make sure you specify the 8 available disks:

# mklvcopy -e x prodlv 2 vpath188 vpath163 vpath208 \
vpath205 vpath194 vpath24 vpath304 vpath161

Now check again with "lslv -m prodlv" if the new copy is correctly created:

# lslv -m prodlv | awk '{print $5}' | grep vpath | sort -dfu | \
while read pv ; do
result=`lspv -l $pv | grep prodlv`
echo "$pv $result"
done

The output should look similar to this:

vpath161 prodlv  16  16  00..00..16..00..00  N/A
vpath163 prodlv  16  16  00..00..00..00..16  N/A
vpath188 prodlv  16  16  00..00..00..00..16  N/A
vpath194 prodlv  16  16  00..00..00..16..00  N/A
vpath205 prodlv  16  16  16..00..00..00..00  N/A
vpath208 prodlv  16  16  00..00..16..00..00  N/A
vpath24  prodlv  16  16  00..00..00..16..00  N/A
vpath304 prodlv  16  16  00..16..00..00..00  N/A

Now synchronize the logical volume:

# syncvg -l prodlv

And remove the original logical volume copy:

# rmlvcopy prodlv 1 vpath408

Then check again:

# lslv -m prodlv

Now, what if you have to extend the logical volume prodlv later on with another 128 LPs, and you still want to maintain the spreading of the LPs over the 8 disks? Again, you can use the "-e x" option when running the logical volume commands:

# extendlv -e x prodlv 128 vpath188 vpath163 vpath208 \
vpath205 vpath194 vpath24 vpath304 vpath161

You can also use the "-e x" option with the mklv command to create a new logical volume from the start with the correct spreading over disks.

Topics: AIX, Performance, System Admin

Creating a CSV file from NMON data

Shown below is a script that can be used to create a simple comma separated values (CSV) file from NMON data.

If you wish to create a CSV file of the CPU usage on your system, you can grep for "CPU_ALL," in the nmon file. If you want to create a CSV file of the memory usage, grep for "MEM," in the nmon file. The script below creates a CSV file for the CPU usage.

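#!/bin/ksh

# Build /tmp/<hostname>_nmon_cpu.csv from all NMON files found.
node=`hostname`
rm -f /tmp/cpu_all.tmp /tmp/zzzz.tmp /tmp/${node}_nmon_cpu.csv
for nmon_file in /var/msgs/nmon/*nmon
do
  # Date part of the file name (not used further below).
  datestamp=`echo ${nmon_file} | cut -f2 -d"_"`
  grep "^CPU_ALL," $nmon_file > /tmp/cpu_all.tmp
  grep "^ZZZZ," $nmon_file > /tmp/zzzz.tmp
  # Skip the header line, then read one CPU sample per line.
  grep -v "CPU Total " /tmp/cpu_all.tmp | sed "s/,/ /g" | \
  while read NAME TS USER SYS WAIT IDLE rest
  do
    # Look up date and time of this snapshot (ZZZZ,Tnnnn,time,date).
    # The trailing comma keeps T0001 from also matching T00010.
    timestamp=`grep "^ZZZZ,${TS}," /tmp/zzzz.tmp | awk -F, '{print $4" "$3}'`
    TOTAL=`echo "scale=1;${USER}+${SYS}" | bc`
    echo $timestamp,$USER,$SYS,$WAIT,$IDLE,$TOTAL >> \
    /tmp/${node}_nmon_cpu.csv
  done
  rm -f /tmp/cpu_all.tmp /tmp/zzzz.tmp
done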

Note: the script assumes that you've stored the NMON output files in /var/msgs/nmon. Update the script to the folder you're using to store NMON files.
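The script above starts a grep, an awk and a bc for every CPU sample, which can get slow on large NMON files. The same join can be done with a single awk call per file. This is a sketch under the same assumptions about the NMON layout as the script above (ZZZZ,Tnnnn,time,date lines and CPU_ALL,Tnnnn,user,sys,wait,idle,... lines); the function name nmon_cpu_csv is made up.

```shell
# nmon_cpu_csv FILE: print "date time,user,sys,wait,idle,total" for
# every CPU_ALL sample in one NMON file. The file is read twice:
# the first pass collects the timestamps, the second emits the rows.
nmon_cpu_csv() {
  awk -F, '
    NR == FNR {                       # first pass: ZZZZ lines only
      if ($1 == "ZZZZ") ts[$2] = $4 " " $3
      next
    }
    $1 == "CPU_ALL" && ($2 in ts) {   # second pass: CPU samples
      printf "%s,%s,%s,%s,%s,%.1f\n", ts[$2], $3, $4, $5, $6, $3 + $4
    }
  ' "$1" "$1"
}

# Usage, with a placeholder file name:
#   nmon_cpu_csv /var/msgs/nmon/myfile.nmon >> /tmp/nmon_cpu.csv
```

The CPU_ALL header line is skipped automatically, because its second field is not a snapshot ID that occurs in a ZZZZ line.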

Topics: AIX, System Admin

Difference between major and minor numbers

A major number refers to a type of device, and a minor number specifies a particular device of that type or sometimes the operation mode of that device type.

Example:

# lsdev -Cc tape
rmt0 Available 3F-08-02 IBM 3580 Ultrium Tape Drive (FCP)
rmt1 Available 3F-08-02 IBM 3592 Tape Drive (FCP)
smc0 Available 3F-08-02 IBM 3576 Library Medium Changer (FCP)

In the list above:

rmt1 is a standalone IBM 3592 tape drive;
rmt0 is an LTO4 drive of a library;
smc0 is the medium changer (or robotic part) of above tape library.

Now look at their major and minor numbers:

# ls -l /dev/rmt* /dev/smc*
crw-rw-rwT 1 root system 38, 0 Nov 13 17:40 /dev/rmt0
crw-rw-rwT 1 root system 38,128 Nov 13 17:40 /dev/rmt1
crw-rw-rwT 1 root system 38, 1 Nov 13 17:40 /dev/rmt0.1
crw-rw-rwT 1 root system 38, 66 Nov 13 17:40 /dev/smc0

All use the IBM tape device driver (and so have the same major number, 38), but they are actually different entities (rmt0, rmt1 and smc0 have minor numbers 0, 128 and 66 respectively). Also, compare rmt0 and rmt0.1: it's the same device, but with a different mode of operation.
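To compare the numbers of many devices at once, the major and minor columns can be pulled out of the ls -l output. This is a small sketch that assumes the listing format shown above, where the major number is followed by a comma; major_minor is a made-up function name.

```shell
# major_minor: read "ls -l" output for device files on stdin and
# print "name major minor" for each character or block device.
major_minor() {
  awk '
    /^[cb]/ {
      gsub(/,/, " ")       # "38,128" and "38, 0" both become two fields
      print $NF, $5, $6
    }
  '
}

# Usage:
#   ls -l /dev/rmt* /dev/smc* | major_minor
```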
