UNIX Health Check

Tech Blog

These are blog entries written by the UNIX Health Check development team. Our team has extensive technical experience on both AIX and Red Hat systems, and we like to share our knowledge with our visitors.

Topics: AIX, System Admin

Finding files with no defined user or group

Use the following command to find any files that no longer have a valid user and/or group, which may happen when a group or user is deleted from a system:

# find / -fstype jfs $ -nouser -o -nogroup $ -ls

Topics: AIX, System Admin ↑

Cleaning file systems

It sometimes occurs that a file system runs full, while a process is active, e.g. when a process logs its output to a file. If you delete the log file of a process when it is still active, the file will be gone, but the disk space will usually not be freed. This is because the process keeps the inode of the file open as long as the process is active and still writes to the inode. After deleting the file, it's not available as file anymore, so you can't view the log file of the process anymore. The disk space will ONLY be freed once the process actually ends.

To overcome it, don't delete the log file, but copy /dev/null to it:

# cp /dev/null [logfile]

This will clear the file, free up the disk space and the process logging to the file will just continue logging as nothing ever happened.

Topics: AIX, System Admin ↑

Resizing the jfs log

In general IBM recommends that JFS log devices be set to 2MB for every 1GB of data to be protected. The default jfslog for rootvg is /dev/hd8 and is by default 1 PP large. In some cases, file system activity is too heavy or too frequent for the log device. When this occurs, the system will log errors like JFS_LOG_WAIT or JFS_LOG_WRAP.

First try to reduce the filesystem activity. If that's not possible, this is the way to extend the JFS log:

Determine which log device to increase. This can be determined by its Device Major/Minor Number in the error log:
# errpt -a
An example output follows:
```
Device Major/Minor Number
000A   0003
```
The preceding numbers are hexadecimal numbers and must be converted to decimal values. In this exmpale, hexadecial 000A 0003 equals to decimal numbers 10 and 3.
Determine which device corresponds with these Device Major/Minor Numbers:
# ls -al /dev | grep "10, 3"
If the output from the preceding command reveals that the log device the needs to be enlarged is /dev/hd8 (the default JFS log device for rootvg), then special actions are needed. See further on.
Increase the size of /dev/hd8:
extendlv hd8 1
If the jfslog device is /dev/hd8, then boot the machine into Service Mode and access the root volume group and start a shell. If the jfslog is a user created jfslog, then unmount all filesystems that use the jfslog in question (use mount to show the jfslog used for each filesystem).
Format the jfslog:
# logform [jsflog device]
For example:
logform /dev/hd8
If the jfslog device is /dev/hd8, then reboot the system:
# sync; sync; sync; reboot
If the jfslog is a user created jfslog, then mount all filesystems again after the logform completed.

Topics: AIX, System Admin ↑

Bootinfo

To find out if your machine has a 64 or 32 bit architecture:

# bootinfo -y

To find out which kernel the system is running:

# bootinfo -K

You can also check the link /unix:

# ls -ald /unix

unix_mp: 32 bits, unix_64: 64 bits

To find out from which disk your system last booted:

# bootinfo -b

To find out the size of real memory:

# bootinfo -r

To display the hardware platform type:

# bootinfo -T

Topics: AIX ↑

Web interface to upload diagnostic data to IBM

If you have a PMR open with IBM and you need to upload logs/snap/dump etcetera to IBM: You can use the below web interface to upload files. All you need is your PMR number, type (mostly AIX), and your email address. Once the files are uploaded, you get an email confirmation. You can still use the traditional ftp to testcase, should you prefer.

http://www.ecurep.ibm.com/app/upload

Topics: AIX ↑

Getting yesterdays date

Getting yesterdays date on AIX can be a litte tricky. One way of doing this, is by using gnu-date:

# date --date=yesterday +%m%d%y

But gnu-date has been known to sometimes have issues when shifting to or from daylight savings time. Another good solution to getting yesterdays date is:

# perl -MPOSIX -le 'print strftime "%m%d%y",localtime(time-(60*60*24))'

Topics: AIX, PowerHA / HACMP, System Admin ↑

Email messages from the cron daemon

Some user accounts, mostly service accounts, may create a lot of email messages, for example when a lot of commands are run by the cron daemon for a specific user. There are a couple of ways to deal with this:

1. Make sure no unnecesary emails are sent at all

To avoid receiving messages from the cron daemon; one should always redirect the output of commands in crontabs to a file or to /dev/null. Also make sure to redirect STDERR as well:

0 * * * * /path/to/command > /path/to/logfile 2>&1
1 * * * * /path/to/command > /dev/null 2>&1

2. Make sure the commands in the crontab actually exist

An entry in a crontab with a command that does not exits, will generate an email message from the cron daemon to the user, informing the user about this issue. This is something that may occur on HACMP clusters where crontab files are synchronized on all HACMP nodes. They need to be synchronize on all the nodes, just in case a resource group fails over to a standby node. However, the required file systems containing the commands may not be available on all the nodes at all time. To get around that, test if the command exists first:

0 * * * * [ -x /path/to/command ] && /path/to/command > /path/to/logfile 2>&1

3. Clean up the email messages regularly

The last way of dealing with this, is to add another cron entry to a users crontab; that cleans out the mailbox every night, for example the next command that deletes all but the last 1000 messages from a users mailbox:

0 * * * * echo d1-$(let num="$(echo f|mail|tail -1|awk '{print $2}')-1000";echo $num)|mail >/dev/null

4. Forward the email to the user

Very effective: Create a .forward file in the users home directory, to forward all email messages to the user. If the user starts receiving many, many emails, he/she will surely do somehting about it, when it gets annoying.

Topics: AIX, Networking, PowerHA / HACMP ↑

Specifying the default gateway on a specific interface

When you're using HACMP, you usually have multiple network adapters installed and thus multiple network interface to handle with. If AIX configured the default gateway on a wrong interface (like on your management interface instead of the boot interface), you might want to change this, so network traffic isn't sent over the management interface. Here's how you can do this:

First, stop HACMP or do a take-over of the resource groups to another node; this will avoid any problems with applications when you start fiddling with the network configuration.

Then open up a virtual terminal window to the host on your HMC. Otherwise you would loose the connection, as soon as you drop the current default gateway.

Now you need to determine where your current default gateway is configured. You can do this by typing:

# lsattr -El inet0
# netstat -nr

The lsattr command will show you the current default gateway route and the netstat command will show you the interface it is configured on. You can also check the ODM:

# odmget -q"attribute=route" CuAt

Now, delete the default gateway like this:

# lsattr -El inet0 | awk '$2 ~ /hopcount/ { print $2 }' | read GW
# chdev -l inet0 -a delroute=${GW}

If you would now use the route command to specifiy the default gateway on a specific interface, like this:

# route add 0 [ip address of default gateway: xxx.xxx.xxx.254] -if enX

You will have a working entry for the default gateway. But... the route command does not change anything in the ODM. As soon as your system reboots; the default gateway is gone again. Not a good idea.

A better solution is to use the chdev command:

# chdev -l inet0 -a addroute=net,-hopcount,0,,0,[ip address of default gateway]

This will set the default gateway to the first interface available.

To specify the interface use:

# chdev -l inet0 -a addroute=net,-hopcount,0,if,enX,,0,[ip address of default gateway]

Substitute the correct interface for enX in the command above.

If you previously used the route add command, and after that you use chdev to enter the default gateway, then this will fail. You have to delete it first by using route delete 0, and then give the chdev command.

Afterwards, check fi the new default gateway is properly configured:

# lsattr -El inet0
# odmget -q"attribute=route" CuAt

And ofcourse, try to ping the IP address of the default gateway and some outside address. Now reboot your system and check if the default gateway remains configured on the correct interface. And startup HACMP again!

Topics: AIX, Monitoring, System Admin ↑

"Bootpd: Received short packet" messages on console

If you're receiving messages like these on your console:

Mar 9 11:47:29 daemon:notice bootpd[192990]: received short packet
Mar 9 11:47:31 daemon:notice bootpd[192990]: received short packet
Mar 9 11:47:38 daemon:notice bootpd[192990]: hardware address not found: E41F132E3D6C

Then it means that you have the bootpd enabled on your server. There's nothing wrong with that. In fact, a NIM server for example requires you to have this enabled. However; these messages on the console can be annoying. There are systems on your network that are sending bootp requests (broadcast). Your system is listening to these requests and trying to answer. It is looking in the bootptab configuration (file /etc/bootptab) to see if their mac-addresses are defined. When they aren't, you are getting these messages.

To solve this, either disable the bootpd daemon, or change the syslog configuration. If you don't need the bootpd daemon, then edit the /etc/inetd.conf file and comment the entry for bootps. Then run:

# refresh -s inetd

If you do have a requirement for bootpd, then update the /etc/syslog.conf file and look for the entry that starts with daemon.notice:

#daemon.notice /dev/console
daemon.notice /nsr/logs/messages

By commenting the daemon.notice entry to /dev/console, and instead adding an entry that logs to a file, you can avoid seeing these messages on the console. Now all you have to do is refresh the syslogd daemon:

# refresh -s syslogd

Topics: AIX, Backup & restore, Monitoring, Red Hat / Linux, Spectrum Protect ↑

Report the end result of a TSM backup

A very easy way of getting a report from a backup is by using the POSTSchedulecmd entry in the dsm.sys file. Add the following entry to your dsm.sys file (which is usually located in /usr/tivoli/tsm/client/ba/bin or /opt/tivoli/tsm/client/ba/bin):

POSTSchedulecmd "/usr/local/bin/RunTsmReport"

This entry tells the TSM client to run script /usr/local/bin/RunTSMReport, as soon as it has completed its scheduled command. Now all you need is a script that creates a report from the dsmsched.log file, the file that is written to by the TSM scheduler:

#!/bin/bash
TSMLOG=/tmp/dsmsched.log
WRKDIR=/tmp
echo "TSM Report from `hostname`" >> ${WRKDIR}/tsmc
tail -100 ${TSMLOG} > ${WRKDIR}/tsma
grep -n "Elapsed processing time:" ${WRKDIR}/tsma > ${WRKDIR}/tsmb
CT2=`cat ${WRKDIR}/tsmb | awk -F":" '{print $1}'`
((CT3 = $CT2 - 14))
((CT5 = $CT2 + 1 ))
CT4=1
while read Line1 ; do
   if [ ${CT3} -gt ${CT4} ] ; then
      ((CT4 = ${CT4} + 1 ))
   else
      echo "${Line1}" >> ${WRKDIR}/tsmc
      ((CT4 = ${CT4} + 1 ))
      if [ ${CT4} -gt ${CT5} ] ; then
         break
      fi
   fi
done < ${WRKDIR}/tsma
mail -s "`hostname` Backup" email@address.com < ${WRKDIR}/tsmc
rm ${WRKDIR}/tsma ${WRKDIR}/tsmb ${WRKDIR}/tsmc

Number of results found for topic AIX: 231.
Displaying results: 211 - 220.

Order

No time to lose? Need to know what's wrong with
your UNIX system now? Then get started TODAY!