Topics: HP Output Server
Stack timeout
Problems may be encountered with printers connected via Axis Boxes to the network, producing errors with some of the prints, usually larger prints. The printer prints a page with the following text: "ERROR: timeout, OFFENDING COMMAND: timeout, STACK:".
The solution is quite easy. The port speed of the LPT port used on the Axis Box is probably set to "Standard". Set it to "High Speed" and timeouts don't occur anymore.
How do you create an external destination in HP Output Server? I wanted to deliver a print to a destination that executes an external command on the job. Here's the configuration that works:
I named my logical destination external, which points to queue qexternal, and the physical destination is called pdexternal.
The logical destination:
-template-version : $Date: 2002/03/01 02:35:32 $
-template-component : false
-managing-server : jqm_06
-disabled : false
-form-feed-default : false
-maximum-tries-default : 30000
-printer-name : external
-destination-type : file
-queue-supported : qexternal
-template-type : standard-template
-retention-period-default : 0
-descriptor : test external delivery pathway
-input-document-formats-supported: literal,
ps,
pdf,
text,
frame,
hp-pcl,
afp
-template : false
-template-classification : static-template
-printer-realization : 0
The queue:
-template-version : $Date: 2002/01/04 17:24:14 $
-queue-name : qexternal
-managing-server : jqm_06
-disabled : false
-template-type : standard-template
-descriptor : test external delivery pathway
-queue-warning-threshold : 50
-scheduler-assigned : Pd-FIFO
-template : false
-template-classification : static-template
-logical-printers-assigned : {external, 'available'}
-message : Enabled
-physical-printers-registered : pdexternal
-job-order :
The physical destination:
-template-version : $Date: 2002/03/12 19:59:09 $
-template-component : false
-managing-server : dsm_06a
-ghostscript-program : !{dazel-install-directory}!/bin/gs
-disabled : false
-form-feed-default : false
-deliver-program : /appl/hpos/external.sh
-printer-name : pdexternal
-queue-supported : qexternal
-template-device-name : generic
-file-overwrite-default : true
-template-type : standard-template
-character-set-default : iso-latin-1
-physical-device-type : external
-descriptor : test external delivery pathway
-medium-default : iso-a4-white
-font-directories : !{dazel-install-directory}!/lib/
FONTS/Soft_Horizons
-template : false
-character-sets-supported : iso-latin-1,
ascii
-template-classification : static-template
-file-base-directory : /tmp
-printer-realization : 1
-printer-connection-mode : external
-ps-init-directories : !{dazel-install-directory}!/lib/PS
-fifo-directory : /appl/hpos/fifo
-write-only-device : true
-deliver-arguments : !{fifo-directory}!/!{job-identifier}!
When you've created this delivery pathway, create the FIFO directory (in this case: /appl/hpos/fifo) and restart the accompanying DSM in order to activate the external destination.
The FIFO directory is used by HP Output Server to create FIFO files for every physical destination and for every job submitted to it. You can use these FIFOs to pass information back to HP Output Server.
You might have noticed the physical destination has a -deliver-program attribute. Make sure the user that runs your DSM is capable of executing this command, and the same user-ID must be able to access the FIFO directory.
The deliver-program used is shown below. It logs some information to a file and then mails the contents of the file that was submitted to the external destination.
#!/bin/ksh
# Log some details of the job for debugging purposes.
{
echo
echo "Process:"
echo $$
echo "Parameters:"
echo $0 $*
echo "Who am I?"
whoami
} >> /appl/hpos/external.out
# $1 is the FIFO created by HPOS, $2 is the spooled job file.
fifo=$1
file=$2
base=`basename $file`
# Keep a copy of the job file and mail its contents.
cp $file /appl/hpos/fifo/$base.$$
mail user@domain.com < $file
# Report back to HPOS that the job has completed.
echo "-job-state-on-printer completed" > $fifo
exit 0
This script passes some information back to the FIFO:
-job-state-on-printer completed
This informs HP Output Server that the processing of the job has completed. If you do not pass this information, HPOS will wait forever for the job to complete, even if the job is no longer active.
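The FIFO hand-off can be simulated with plain shell, outside of HPOS, to see how the status line travels back. The file paths below are examples for the simulation only, not part of any HPOS installation:

```shell
#!/bin/sh
# Simulate the hand-off: HPOS creates a FIFO per job, passes its path
# to the deliver program, and reads job attributes back from it.
fifo=/tmp/demo-fifo.$$
mkfifo "$fifo"

# Reader side (stands in for HPOS): block until a status line arrives.
cat "$fifo" > /tmp/demo-status.$$ &

# Writer side (stands in for the deliver program): report completion.
echo "-job-state-on-printer completed" > "$fifo"
wait

status_line=$(cat /tmp/demo-status.$$)
echo "$status_line"
rm -f "$fifo" /tmp/demo-status.$$
```

Both opens on the FIFO block until the other side is there, which is exactly why HPOS can rely on it to wait for the deliver program's status report.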
Now, if you submit the following command:
# pdpr -dexternal /etc/motd
The file /etc/motd will be mailed, using the script external.sh.
As HP Output Server doesn't support hot backups, it is recommended to do monthly maintenance, in order to create a backup and keep things healthy. Assuming HPOS is installed in /appl/hpos and its data is stored in /data/hpos:
- Notify your users of the downtime.
- Stop any applications that use HPOS.
- Clear out any "hanging" jobs.
- Clear out the /data/hpos/var/tmp directory.
- For every log file in /data/hpos/var/log, issue:
# cp /dev/null $logfile
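That cleanup step can be wrapped in a small loop; a sketch, using the log directory from above (truncating with cp keeps the files and their ownership intact, which matters if the daemons still hold them open):

```shell
#!/bin/sh
# Truncate every regular file in a log directory without removing it.
truncate_logs() {
    for logfile in "$1"/*; do
        [ -f "$logfile" ] || continue
        cp /dev/null "$logfile"   # empty the file, keep name/ownership
    done
}

truncate_logs /data/hpos/var/log
```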
- Delete and recreate the JQMs:
# config_server -d $jqm
JQM job databases tend to grow over time. By recreating them, you can save a lot of disk usage and I/O performance. You can do this step safely, as all information is stored in the CM. Recreate each JQM with:
# config_server -t JQM $jqm
If you use a server-startup-order, you should also set it again after you've recreated the JQM:
# config_server -u -x"-server-start-order xxx" $jqm
- Delete and recreate the DLMs:
# config_server -d $dlm
This step may be required because sometimes a DLM produces errors, which you might see in JobTracker. These errors are solved by recreating the DLMs. Again, remember to set your server-startup-order, if you use it.
# config_server -t DLM $dlm
- Stop all HPOS daemons:
# stop_server -t all
- Create a backup of the system.
- If available, download any service packs and install them now:
# . /appl/hpos/etc/setup_env.sh
# perl /appl/hpos/etc/patch.pl
If this fails, just try it again, or restore your backup.
- Also upgrade any client systems to the same service pack. If necessary, also check which Windows client versions are required.
- Start all daemons again:
# start_server -t all
- Test if everything is working.
- Reboot your server to free up memory.
- Again, test if everything is working.
- Start up all applications that use HPOS again.
- Notify your users that the system is available.
- Document any changes (e.g. Service Packs) and problems encountered.
- Store the software you've used somewhere safe.
A small script to save the HPOS configuration:
#!/bin/ksh
# Save HPOS config
. /appl/dazel/etc/setup_env.sh
rm -f /appl/hpos/dump-file
pdconfig -d -c /appl/hpos/dump-file
If a device like a printer box, jet-direct card or a Xerox Document Centre no longer pings from the HPOS server, it will result in connection timed out errors within HPOS. If this device does ping from another location within your network, you have a network issue, and it's very likely that your network switch simply needs to learn the IP address and/or MAC address of the device. You can do this by sending a packet from the device to the HPOS server. But how do you get this device to send a packet itself, when a ping is not available on it?
For an Axis Box, go to the Network Settings (which is available from the Admin window). Now fill in the IP address of your HPOS server as the Primary DNS server and click on Ok. Of course DNS resolving will fail (if your HPOS server isn't a DNS server), but almost immediately after clicking Ok, you will notice that the Axis Box pings again from the HPOS server. Once it does, change the primary DNS server in your Axis Box back to its original value.
The same can be done with other devices, like the Xerox Document Centre. Go to the TCP/IP properties (usually under Properties -> Connectivity -> Protocols). Set the IP Address Resolution to STATIC (if you're using DHCP) and update the primary DNS server to the IP address of the HPOS server. Then click on Reboot on the Status window, and you'll notice that it pings again from the HPOS server. Again, change everything back to the original values once the Document Centre pings.

HACMP provides events, which can be used to accurately monitor the cluster status, for example via the Tivoli Enterprise Console. Each change in the cluster status is the result of an HACMP event. Each HACMP event has an accompanying notify method that can be used to handle the kind of notification we want.
Interesting Cluster Events to monitor are:
- node_up
- node_down
- network_up
- network_down
- join_standby
- fail_standby
- swap_adapter
- config_too_long
- event_error
# smitty hacmp
Cluster Configuration
Cluster Resources
Cluster Events
Change/Show Cluster Events
You can also query the ODM:
# odmget HACMPevent
If you wish to enable MAC Address take-over on an HACMP cluster, you need a virtual MAC address. You can do a couple of things to make sure you have a unique MAC Address on your network:
- Use the MAC address of an old system that you know has been destroyed.
- Buy a new network card, use its MAC address, then destroy the card.
- Use a DEADBEEF address (0xdeadbeef1234): this is a non-existent hardware vendor. You might run into problems if someone else also makes up a deadbeef address, so use this option with caution.
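If you go the deadbeef route, you can at least randomize the last two bytes to reduce the chance of colliding with someone else's made-up address; a sketch (this gives no guarantee of uniqueness, so still verify the address is unused on your network):

```shell
#!/bin/sh
# Build a MAC in the non-existent 0xdeadbeef "vendor" range with a
# random two-byte suffix. Check it is unused before configuring it.
suffix=$(od -An -N2 -tx1 /dev/urandom | tr -d ' \n')
mac="deadbeef${suffix}"
echo "$mac"
```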
- clstat - show cluster state and substate; needs clinfo.
- cldump - SNMP-based tool to show cluster state.
- cldisp - similar to cldump, perl script to show cluster state.
- cltopinfo - list the local view of the cluster topology.
- clshowsrv -a - list the local view of the cluster subsystems.
- clfindres (-s) - locate the resource groups and display status.
- clRGinfo -v - locate the resource groups and display status.
- clcycle - rotate some of the log files.
- cl_ping - a cluster ping program with more arguments.
- clrsh - cluster rsh program that takes cluster node names as arguments.
- clgetactivenodes - which nodes are active?
- get_local_nodename - what is the name of the local node?
- clconfig - check the HACMP ODM.
- clRGmove - online/offline or move resource groups.
- cldare - sync/fix the cluster.
- cllsgrp - list the resource groups.
- clsnapshotinfo - create a large snapshot of the HACMP configuration.
- cllscf - list the network configuration of an HACMP cluster.
- clshowres - show the resource group configuration.
- cllsif - show network interface information.
- cllsres - show short resource group information.
- lssrc -ls clstrmgrES - list the cluster manager state.
- lssrc -ls topsvcs - show heartbeat information.
- cllsnode - list a node-centric overview of the HACMP configuration.
When you're using HACMP, you usually have multiple network adapters installed, and thus multiple network interfaces to deal with. If AIX configured the default gateway on the wrong interface (like on your management interface instead of the boot interface), you might want to change this, so network traffic isn't sent over the management interface. Here's how you can do this:
First, stop HACMP or do a take-over of the resource groups to another node; this will avoid any problems with applications when you start fiddling with the network configuration.
Then open up a virtual terminal window to the host on your HMC. Otherwise you would lose the connection as soon as you drop the current default gateway.
Now you need to determine where your current default gateway is configured. You can do this by typing:
# lsattr -El inet0
# netstat -nr
The lsattr command will show you the current default gateway route, and the netstat command will show you the interface it is configured on. You can also check the ODM:
# odmget -q"attribute=route" CuAt
Now, delete the default gateway like this:
# lsattr -El inet0 | awk '$2 ~ /hopcount/ { print $2 }' | read GW
# chdev -l inet0 -a delroute=${GW}
If you would now use the route command to specify the default gateway on a specific interface, like this:
# route add 0 [ip address of default gateway: xxx.xxx.xxx.254] -if enX
you will have a working entry for the default gateway. But... the route command does not change anything in the ODM. As soon as your system reboots, the default gateway is gone again. Not a good idea.
A better solution is to use the chdev command:
# chdev -l inet0 -a addroute=net,-hopcount,0,,0,[ip address of default gateway]
This will set the default gateway to the first interface available.
To specify the interface use:
# chdev -l inet0 -a addroute=net,-hopcount,0,if,enX,,0,[ip address of default gateway]
Substitute the correct interface for enX in the command above.
If you previously used the route add command and after that use chdev to enter the default gateway, this will fail. You have to delete the route first using route delete 0, and then issue the chdev command.
Afterwards, check if the new default gateway is properly configured:
# lsattr -El inet0
# odmget -q"attribute=route" CuAt
And of course, try to ping the IP address of the default gateway and some outside address. Now reboot your system and check if the default gateway remains configured on the correct interface. And start up HACMP again!
Topics: LVM, PowerHA / HACMP, System Admin
VGDA out of sync
With HACMP, you can run into the following error during a verification/synchronization:
WARNING: The LVM time stamp for shared volume group: testvg is inconsistent
with the time stamp in the VGDA for the following nodes: host01
To correct the above condition, run verification & synchronization with
"Automatically correct errors found during verification?" set to either 'Yes'
or 'Interactive'. The cluster must be down for the corrective action to run.
This can happen when you've added additional space to a logical volume/file system from the command line instead of using the smitty hacmp menu. But you certainly don't want to take down the entire HACMP cluster to resolve this message.
First of all, you don't have to. The cluster will fail over nicely anyway, even with these VGDAs out of sync. But it is still an annoying warning that you would like to get rid of.
Have a look at your shared logical volumes. By using the lsattr command, you can see if they are actually in sync or not:
host01 # lsattr -Z: -l testlv -a label -a copies -a size -a type -a strictness -Fvalue
/test:1:809:jfs2:y:
host02 # lsattr -Z: -l testlv -a label -a copies -a size -a type -a strictness -Fvalue
/test:1:806:jfs2:y:
Well, there you have it. Host01 reports testlv having a size of 809 LPs, while host02 says it's 806. Not good. You will run into this when you've used the extendlv and chfs commands to increase the size of a shared file system. You should have used the smitty menu.
The good thing is, HACMP will sync the VGDAs if you do some kind of logical volume operation through the smitty hacmp menu. So, either increase the size of a shared logical volume through the smitty menu by just one LP (and of course also increase the size of the corresponding file system), or create an additional shared logical volume of just one LP through smitty, and remove it again afterwards.
When you've done that, simply re-run the verification/synchronization, and you'll notice that the warning message is gone. Make sure you run the lsattr command again on your shared logical volumes on all the nodes in your cluster to confirm.
HACMP automatically runs a verification every night, usually around mid-night. With a very simple command you can check the status of this verification run:
# tail -10 /var/hacmp/log/clutils.log 2>/dev/null | grep detected | tail -1
If this shows a return code of 0, the cluster verification ran without any errors. Anything else, you'll have to investigate. You can use this command on all your HACMP clusters, allowing you to verify your HACMP cluster status every day.
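That one-liner can be turned into a small function that classifies the result, so a central script can run it against each cluster. The exact wording of the "detected" line differs per HACMP/PowerHA version, so matching a zero error count on " 0 " below is an assumption:

```shell
#!/bin/sh
# Report the most recent cluster verification result from clutils.log.
check_verification() {
    line=$(tail -10 "$1" 2>/dev/null | grep detected | tail -1)
    case "$line" in
        "")      echo "no verification result found" ;;
        *" 0 "*) echo "verification OK" ;;        # zero error count
        *)       echo "verification FAILED: $line" ;;
    esac
}

check_verification /var/hacmp/log/clutils.log
```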
With the following smitty menu you can change the time when the auto-verification runs and if it should produce debug output or not:
# smitty clautover.dialog
You can check with:
# odmget HACMPcluster
# odmget HACMPtimersvc
Be aware that if you change the runtime of the auto-verification, you have to synchronize the cluster afterwards to update the other nodes in the cluster.