Tech Blog

These are blog entries written by the UNIX Health Check development team. Our team has extensive technical experience on both AIX and Red Hat systems, and we like to share our knowledge with our visitors.

Topics: AIX, System Admin

Why AIX Memory Typically Runs Near 100% Utilization

Memory utilization on AIX systems typically runs around 100%. This is often a source of concern. However, high memory utilization in AIX does not imply the system is out of memory. By design, AIX leaves files it has accessed in memory. This significantly improves performance when AIX reaccesses these files because they can be reread directly from memory, not disk. When AIX needs memory, it discards cached file pages using a "least recently used" algorithm. Discarding unmodified file pages generates no I/O and has almost no performance impact under normal circumstances.

Sustained paging activity is the best indication of low memory. Paging activity can be monitored using the "vmstat" command. If the "page-in" (pi) and "page-out" (po) columns show non-zero values over sustained periods of time, then the system is short on memory. (All systems will show occasional paging, which is not a concern.)
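As a rough way to apply this rule, the following sketch counts how many vmstat samples show paging. It assumes that the column header line contains the names pi and po; the function name count_paging_samples is made up for this illustration.

```shell
# Hypothetical helper: reads vmstat output on stdin and reports how many
# samples have a non-zero page-in (pi) or page-out (po) value.
count_paging_samples() {
    awk '
        # Until the column header is seen, look for the pi and po columns.
        !found {
            for (i = 1; i <= NF; i++) {
                if ($i == "pi") pi = i
                if ($i == "po") po = i
            }
            if (pi && po) found = 1
            next
        }
        # Data lines start with a number (the r column).
        $1 ~ /^[0-9]+$/ {
            total++
            if ($pi > 0 || $po > 0) paging++
        }
        END { printf "%d of %d samples show paging\n", paging, total }
    '
}

# Example: 60 samples at 5-second intervals
# vmstat 5 60 | count_paging_samples
```

A handful of paging samples is normal; a result where most samples show paging points to a memory shortage.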

Memory requirements for applications can be empirically determined using the AIX "rmss" command. The "rmss" command is a test tool that dynamically reduces usable memory. The onset of paging indicates an application's minimum memory requirement.

Finally, the "svmon" command can be used to list how much memory is used by each process. The interpretation of the svmon output requires some expertise. See the AIX documentation for details.

To test the performance gain of leaving a file in memory, a 40MB file was read twice. The first read was from disk, the second was from memory. The first read took 10.0 seconds. The second read took 1.3 seconds: a roughly 7.7x improvement.
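A test like this can be repeated with a small helper such as the one below. This is only a sketch: time_read is a hypothetical name, it relies on "date +%s" being available, and it measures whole seconds only, so it needs a reasonably large file to show a difference.

```shell
# Hypothetical helper: prints the number of whole seconds needed to read a file.
time_read() {
    start=$(date +%s)
    cat "$1" > /dev/null
    end=$(date +%s)
    echo $((end - start))
}

# Run it twice on a large file that has not been accessed recently:
# time_read /path/to/large_file    # first read, from disk
# time_read /path/to/large_file    # second read, from memory
```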

Topics: AIX, Storage, System Admin

Using NFS

The Network File System (NFS) is one of a category of filesystems known as distributed filesystems. It allows users to access files resident on remote systems without even knowing that a network is involved and thus allows filesystems to be shared among computers. These remote systems could be located in the same room or could be miles away.

In order to access such files, two things must happen. First, the remote system must make the files available to other systems on the network. Second, these files must be mounted on the local system to be able to access them. The mounting process makes the remote files appear as if they are resident on the local system. The system that makes its files available to others on the network is called a server, and the system that uses a remote file is called a client.

NFS Server

NFS consists of a number of components including a mounting protocol, a file locking protocol, an export file and daemons (mountd, nfsd, biod, rpc.lockd, rpc.statd) that coordinate basic file services.

Systems using NFS make the files available to other systems on the network by "exporting" their directories to the network. An NFS server exports its directories by putting the names of these directories in the /etc/exports file and executing the exportfs command. In its simplest form, /etc/exports consists of lines of the form:

pathname -option,option ...
Where pathname is the name of the file or directory to which network access is to be allowed; if pathname is a directory, then all of the files and directories below it within the same filesystem are also exported, but not any filesystems mounted within it. The next fields in the entry consist of various options that specify the type of access to be given and to whom. For example, a typical /etc/exports file may look like this:
/cyclop/users    -access=homer:bart,root=homer
/usr/share/man   -access=marge:maggie:lisa
/usr/mail
This export file permits the filesystem /cyclop/users to be mounted by homer and bart, and allows root access to it from homer. In addition, it allows /usr/share/man to be mounted by marge, maggie and lisa. The filesystem /usr/mail can be mounted by any system on the network. Filesystems listed in the export file without a specific set of hosts are mountable by all machines. This can be a sizable security hole.
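To spot such entries, a quick check along the following lines can help. It is a minimal sketch that assumes the simple format shown above, and treats every entry without an access= list as mountable by all hosts; list_open_exports is a hypothetical name.

```shell
# Hypothetical helper: reads an exports file on stdin and prints every
# exported path that has no access= list.
list_open_exports() {
    awk '
        /^[ \t]*(#|$)/ { next }          # skip comments and blank lines
        $0 !~ /access=/ { print $1 }     # no host list: open to all hosts
    '
}

# Example:
# list_open_exports < /etc/exports
```

For the example file above, this prints only /usr/mail.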

When used with the -a option, the exportfs command reads the /etc/exports file and exports all the directories listed to the network. This is usually done at system startup time.
# exportfs -va
If the contents of /etc/exports change, you must tell mountd to reread it. This can be done by re-executing the exportfs command after the export file is changed.

The exact attributes that can be specified in the /etc/exports file vary from system to system. The most common attributes are:
  • -access=list : Colon-separated list of hostnames and netgroups that can mount the filesystem.
  • -ro : Export read-only; no clients may write on the filesystem.
  • -rw=list : List enumerates the hosts allowed to mount for writing; all others must mount read-only.
  • -root=list : Lists hosts permitted to access the filesystem as root. Without this option, root access from a client is equivalent to access by the user nobody (usually UID -1).
  • -anon=uid : Specifies the UID that should be used for requests coming from an unknown user. Defaults to nobody.
  • -hostname : Allow hostname to mount the filesystem.
For example:
/cyclop/users -rw=moe,anon=-1
/usr/inorganic -ro
This allows moe to mount /cyclop/users for reading and writing, and maps anonymous users (users from other hosts that do not exist on the local system and the root user from any remote system) to the UID -1. This corresponds to the nobody account, and it tells NFS not to allow such users access to anything.

NFS Clients

After the files, directories and/or filesystems have been exported, an NFS client must explicitly mount them before it can use them. Mount requests are handled on the server by the mountd daemon (sometimes called rpc.mountd). The server examines each mount request to be sure the client has proper authorization.

The following syntax is used for the mount command. Note that the name of the server is followed by a colon and the directory to be mounted:
# mount server1:/usr/src /src
Here, the directory structure /usr/src resident on the remote system server1 is mounted on the /src directory on the local system.

When the remote filesystem is no longer needed, it is unmounted with the umount command:
# umount server1:/usr/src
The mount command can be used to establish temporary network mounts, but mounts that are part of a system's permanent configuration should be either listed in /etc/filesystems (for AIX) or handled by an automatic mounting service such as automount or amd.

NFS Commands
  • lsnfsexp : Displays the characteristics of directories that are exported with NFS.
    # lsnfsexp
    /software -ro
    
  • mknfsexp -d path -t ro : Exports a read-only directory to NFS clients and adds it to /etc/exports.
    # mknfsexp -d /software -t ro
    /software ro
    Exported /software
    # lsnfsexp
    /software -ro
    
  • rmnfsexp -d path : Unexports a directory from NFS clients and removes it from /etc/exports.
    # rmnfsexp -d /software
    
  • lsnfsmnt : Displays the characteristics of NFS mountable file systems.
  • showmount -e : List exported filesystems.
    # showmount -e
    export list for server:
    /software (everyone)
    
  • showmount -a : List hosts that have remotely mounted local filesystems.
    # showmount  -a
    server2:/sourcefiles
    server3:/datafiles
    
Start/Stop/Status NFS daemons

In the following discussion, reference to daemon implies any one of the SRC-controlled daemons (such as nfsd or biod).

The NFS daemons can be automatically started at system (re)start by including the /etc/rc.nfs script in the /etc/inittab file.

They can also be started manually by executing the following command:
# startsrc -s Daemon
# startsrc -g nfs
Where the -s option starts an individual daemon and -g starts all daemons in the nfs group.

These daemons can be stopped one at a time or all at once by executing the following command:
# stopsrc -s Daemon
# stopsrc -g nfs
You can get the current status of these daemons by executing the following commands:
# lssrc -s [Daemon]
# lssrc -a
If the /etc/exports file does not exist, the nfsd and the rpc.mountd daemons will not start. You can get around this by creating an empty /etc/exports file. This will allow the nfsd and the rpc.mountd daemons to start, although no filesystems will be exported.
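The workaround can be sketched as a small function that creates the file only when it is missing, so an existing export list is never overwritten. The function name ensure_exports_file and its optional path argument are made up for this illustration.

```shell
# Hypothetical helper: create an empty exports file if none exists yet.
# The path defaults to /etc/exports; an existing file is left untouched.
ensure_exports_file() {
    exports_file=${1:-/etc/exports}
    if [ ! -e "$exports_file" ]; then
        : > "$exports_file"
        echo "created empty $exports_file"
    fi
}

# Example, followed by starting the NFS daemons:
# ensure_exports_file
# startsrc -g nfs
```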

Topics: AIX, Storage, System Admin

Working with disks

Over time, devices are added to and removed from a system. AIX learns about hardware changes when the root user executes the cfgmgr command. Without any arguments, it scans all buses for attached devices. Information acquired by cfgmgr is stored in the ODM (Object Data Manager). Cfgmgr only discovers new devices; removing devices is done with rmdev or odmdelete. Cfgmgr can be run in quiet mode (cfgmgr) or verbose mode (cfgmgr -v), and it can be directed to scan all buses or only selected ones.

The basic command to learn about disks is lspv. Executed without any parameters, it will generate a listing of all disks recorded in the ODM, for example:

# lspv
hdisk0     00c609e0a5ec1460         rootvg     active
hdisk1     00c609e037478aad         rootvg     active
hdisk4     00c03c8a14fa936b         abc_vg     active
hdisk2     00c03b1a32e50767         None
hdisk3     00c03b1a32ee4222         None
hdisk5     00c03b1a35cdcdf0         None
Each row describes one disk. The first column shows its name, followed by the PVID and the volume group it belongs to. "None" in the volume group column indicates that the disk does not belong to any volume group. "active" in the last column indicates that the volume group is varied on. The existence of a PVID indicates that there may be data on the disk; such a disk may belong to a volume group that is varied off.
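To pick out the disks that are not assigned to a volume group, the lspv listing can be filtered on its third column. The following is a small sketch; list_unassigned_disks is a hypothetical name.

```shell
# Hypothetical helper: reads lspv output on stdin and prints the names of
# disks whose volume group column is "None".
list_unassigned_disks() {
    awk '$3 == "None" { print $1 }'
}

# Example:
# lspv | list_unassigned_disks
```

For the listing above, this prints hdisk2, hdisk3 and hdisk5, one per line.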

Executing lspv with a disk name generates information only about this device:
# lspv hdisk4
PHYSICAL VOLUME:   hdisk4                 VOLUME GROUP:    abc_vg
PV IDENTIFIER:     00c03c8a14fa936b       VG IDENTIFIER:   00c03b1a000
PV STATE:          active
STALE PARTITIONS:  0                      ALLOCATABLE:     yes
PP SIZE:           16 megabyte(s)         LOGICAL VOLUMES: 2
TOTAL PPs:         639 (10224 megabytes)  VG DESCRIPTORS:  2
FREE PPs:          599 (9584 megabytes)   HOT SPARE:       no
USED PPs:          40 (640 megabytes)     MAX REQUEST:     256 kb
FREE DISTRIBUTION: 128..88..127..128..128
USED DISTRIBUTION: 00..40..00..00..00
In the case of hdisk4, we can determine its size, the number of logical volumes (two), the number of physical partitions in need of synchronization (STALE PARTITIONS) and the number of VGDAs. Executing lspv against a disk without a volume group membership does nothing useful:
# lspv hdisk2
0516-304: Unable to find device id hdisk2 in the Device 
configuration database
How do you establish the capacity of a disk that does not belong to a volume group? The next command provides this in megabytes:
# bootinfo -s hdisk2
10240
The same (and much more) information can be retrieved by executing lsattr -El hdisk#:
# lsattr -El hdisk0
PCM             PCM/scsiscsd      Path Control Module   False
algorithm       fail_over         Algorithm             True
dist_err_pcnt   0                 Distributed Error %   True
dist_tw_width   50                Sample Time           True
hcheck_interval 0                 Health Check Interval True
hcheck_mode     nonactive         Health Check Mode     True
max_transfer    0x40000           Maximum TRANSFER Size True
pvid            00c609e0a5ec1460  Volume identifier     False
queue_depth     3                 Queue DEPTH           False
reserve_policy  single_path       Reserve Policy        True
size_in_mb      73400             Size in Megabytes     False
unique_id       26080084C1AF0FHU  Unique identifier     False
The last command can be limited to show only the size if executed as shown:
# lsattr -El hdisk0 -a size_in_mb
size_in_mb 73400 Size in Megabytes False
A disk can get a PVID in one of two ways: by virtue of membership in a volume group (when running the extendvg or mkvg commands) or as the result of executing the chdev command. The lqueryvg command helps to establish whether there is data on the disk.
# lqueryvg -Atp hdisk2
0516-320 lqueryvg: hdisk2 is not assigned to a volume group.
Max LVs:        256
PP Size:        26
Free PPs:       1117
LV count:       0
PV count:       3
Total VGDAs:    3
Conc Allowed:   0
MAX PPs per PV  1016
MAX PVs:        32
Quorum (disk):  1
Quorum (dd):    1
Auto Varyon ?:  1
Conc Autovaryo  0
Varied on Conc  0
Physical:       00c03b1a32e50767   1   0
                00c03b1a32ee4222   1   0
                00c03b1a9db2f183   1   0
Total PPs:      1117
LTG size:       128
HOT SPARE:      0
AUTO SYNC:      0
VG PERMISSION:  0
SNAPSHOT VG:    0
IS_PRIMARY VG:  0
PSNFSTPP:       4352
VARYON MODE:    ???????
VG Type:        0
Max PPs:        32512
This disk belongs to a volume group that had three disks:
PV count: 3
Their PVIDs are:
Physical:       00c03b1a32e50767   1   0
                00c03b1a32ee4222   1   0
                00c03b1a9db2f183   1   0
At this time, it does not have any logical volumes:
LV count: 0
It is easy to notice that a disk belongs to a volume group. Logical volume names are the best proof of this. To display data stored on a disk you can use the command lquerypv.

A PVID can be assigned to or removed from a disk if it does not belong to a volume group, by executing the command chdev.
# chdev -l hdisk2 -a pv=clear
hdisk2 changed
# lspv | grep hdisk2
hdisk2          none         None
Now, let's give the disk a new PVID:
# chdev -l hdisk2 -a pv=yes
hdisk2 changed
# lspv | grep hdisk2
hdisk2          00c03b1af578bfea    None
At times, it is necessary to restrict access to a disk or to its capacity. You can use the chpv command for this purpose. To prevent I/O access to a disk:
# chpv -v r hdisk2
To allow I/O:
# chpv -v a hdisk2
To prohibit allocation of the disk's free PPs:
# chpv -a n hdisk2
To allow allocation of the disk's free PPs:
# chpv -a y hdisk2
AIX was created years ago, when disks were very expensive. I/O optimization, that is, deciding which data is read and written faster than other data, was determined by the data's position on the disk. Between I/O operations, disk heads are parked in the middle. Accordingly, the fastest I/O takes place in the middle. With this in mind, a disk is divided into five bands called: outer edge, outer middle, center, inner middle and inner edge. This method of assigning physical partitions (logical volumes) as a function of a band on a disk is called the intra-physical volume allocation policy. This policy and the policy defining the spread of a logical volume across disks (the inter-physical volume allocation policy) become important when creating logical volumes.

The disk topology, that is, the range of physical partitions in each band, is shown by the commands lsvg -p vg_name and lspv hdisk#. Note the last two lines of the lspv output:
FREE DISTRIBUTION:  128..88..127..128..128
USED DISTRIBUTION:  00..40..00..00..00
The row labeled FREE DISTRIBUTION shows the number of free PPs in each band. The row labeled USED DISTRIBUTION shows the number of used PPs in each band. As you can see, some bands of this disk have no data. Today, this policy has lost its meaning, as even the slowest disks are much faster than their predecessors. In the case of RAID or SAN disks, this policy has no meaning at all. For those who still use individual SCSI or SSA disks, it is good to remember that the data closer to the outer edge is read/written the slowest.
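The per-band counts add up to the totals in the lspv output for hdisk4: the five FREE DISTRIBUTION values sum to the FREE PPs value, and the five USED DISTRIBUTION values sum to the USED PPs value. A small sketch to check this (sum_distribution is a hypothetical name):

```shell
# Hypothetical helper: adds up the five per-band values of a
# distribution string such as 128..88..127..128..128.
sum_distribution() {
    echo "$1" | tr -s '.' ' ' | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }'
}

sum_distribution 128..88..127..128..128    # prints 599 (FREE PPs)
sum_distribution 00..40..00..00..00        # prints 40  (USED PPs)
```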

To learn which logical volumes are located on a given disk, you can execute the command lspv -l hdisk#. The reverse relation is established by executing lslv -M lv_name.

It is always a good idea to know what adapter and what bus any disk is attached to. Otherwise, if one of the disks breaks, how will you know which disk needs to be removed and replaced? AIX has many commands that can help you. It is customary to start from the adapter, to identify all adapters known to the kernel:
# lsdev -Cc adapter | grep -i scsi
scsi0   Available 1S-08    Wide/Ultra-3 SCSI I/O Controller
scsi1   Available 1S-09    Wide/Ultra-3 SCSI I/O Controller
scsi2   Available 1c-08    Wide/Fast-20 SCSI I/O Controller
The last command produced information about the SCSI adapters present during the last execution of the cfgmgr command. This output also allows you to establish in which drawer each adapter is located. The listing tells us that there are three SCSI adapters. The second column shows the device state (Available: ready to be used; Defined: device needs further configuration). The next column shows its location (drawer/bus). The last column contains a short description. Executing lsdev against a disk from rootvg produces:
# lsdev -Cc disk -l hdisk0
hdisk0 Available 1S-08-00-8,0 16 Bit LVD SCSI Disk Drive
From both outputs we can determine which SCSI adapter controls this disk: scsi0. We also see that the disk has SCSI ID 8,0. How do you determine the type, model, capacity, part number, etc.?
# lscfg -vl hdisk0
  hdisk0  U0.1-P2/Z1-A8  16 Bit LVD SCSI Disk Drive (36400 MB)

        Manufacturer................IBM
        Machine Type and Model......IC35L036UCDY10-0
        FRU Number..................00P3831
        ROS Level and ID............53323847
        Serial Number...............E3WP58EC
        EC Level....................H32224
        Part Number.................08K0293
        Device Specific.(Z0)........000003029F00013A
        Device Specific.(Z1)........07N4972
        Device Specific.(Z2)........0068
        Device Specific.(Z3)........04050
        Device Specific.(Z4)........0001
        Device Specific.(Z5)........22
        Device Specific.(Z6)........
You can get more details by executing command: lsattr -El hdisk0.
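The matching of a disk to its adapter, done by eye above, can be sketched as a prefix comparison of location codes. This assumes, as in the example listings, that a disk's location code starts with the location code of its adapter; adapter_for_disk is a hypothetical name.

```shell
# Hypothetical helper: reads "lsdev -Cc adapter" output on stdin and prints
# the adapter whose location code is a prefix of the given disk location.
adapter_for_disk() {
    awk -v loc="$1" 'index(loc, $3 "-") == 1 { print $1 }'
}

# Example, using the location code of hdisk0 shown above:
# lsdev -Cc adapter | adapter_for_disk "1S-08-00-8,0"
```

With the adapter listing shown earlier, this prints scsi0.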

Topics: AIX, Security, System Admin

How to show the timestamp in your shell history in AIX 5.3

The environment variable EXTENDED_HISTORY in AIX will timestamp your shell history. In ksh, you set it as follows:

# export EXTENDED_HISTORY=ON
A good practice is to set this variable in /etc/environment.

To view your history:
# history
888 ? :: cd aix_auth/
889 ? :: vi server
890 ? :: ldapsearch
891 ? :: fc -lt
892 ? :: fc -l
NOTE: Commands that were entered before this environment variable was set will show a question mark in the timestamp field.

If you use the fc command, you will have to use the "-t" option to see the timestamp:
# fc -t

Topics: AIX, EMC, PowerHA / HACMP, SAN, Storage, System Admin

Missing disk method in HACMP configuration

This issue occurs when trying to bring up a resource group. For example, the hacmp.out log file contains the following:

cl_disk_available[187] cl_fscsilunreset fscsi0 hdiskpower1 false
cl_fscsilunreset[124]: openx(/dev/hdiskpower1, O_RDWR, 0, SC_NO_RESERVE): Device busy
cl_fscsilunreset[400]: ioctl SCIOLSTART id=0X11000 lun=0X1000000000000 : Invalid argument
To resolve this, you will have to make sure that the SCSI reset disk method is configured in HACMP. For example, when using EMC storage:

Make sure emcpowerreset is present as /usr/lpp/EMC/Symmetrix/bin/emcpowerreset.
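This check can be scripted, for example on every cluster node. The following is a minimal sketch; check_reset_method is a hypothetical name, and testing for the execute bit is an extra assumption on top of the file simply being present.

```shell
# Hypothetical helper: verify that the reserve-breaking method exists and
# is executable. The path defaults to the EMC location named above.
check_reset_method() {
    method=${1:-/usr/lpp/EMC/Symmetrix/bin/emcpowerreset}
    if [ -x "$method" ]; then
        echo "OK: $method is present and executable"
    else
        echo "MISSING: $method" >&2
        return 1
    fi
}

# Example:
# check_reset_method
```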

Then add a new custom disk method:
  • Enter into the SMIT fastpath for HACMP "smitty hacmp".
  • Select Extended Configuration.
  • Select Extended Resource Configuration.
  • Select HACMP Extended Resources Configuration.
  • Select Configure Custom Disk Methods.
  • Select Add Custom Disk Methods.
      Change/Show Custom Disk Methods

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                 [Entry Fields]
* Disk Type (PdDvLn field from CuDv)             disk/pseudo/power
* New Disk Type                                  [disk/pseudo/power]
* Method to identify ghost disks                 [SCSI3]
* Method to determine if a reserve is held       [SCSI_TUR]
* Method to break reserve [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset]
  Break reserves in parallel                     true
* Method to make the disk available              [MKDEV]

Topics: AIX, System Admin

How to run background jobs

There are a couple of options for running background jobs:

Option one:

Start the job as normal, then press CTRL-Z. The shell reports that the job is stopped. Type "bg" to let it continue in the background, and type "fg" if you want it to run in the foreground again. You can repeat CTRL-Z, bg and fg as often as you like. The process will be killed once you log out. You can avoid this by starting it with: nohup command.

Option two:

Use the at command:

# echo "command" | at now
This will start it in the background and it will keep on running even if you log out.

Option three:

Run it with an ampersand: command &

This will run it in the background. But the process will be killed if you log out. You can avoid the process being killed by running: nohup command &.

Option four:

Schedule it one time in the crontab.

With all options, make sure you redirect any output and errors to a file, like:
# command > command.out 2>&1
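Combining nohup, the ampersand and output redirection gives a job that survives a logout and keeps its output. The following is a minimal sketch; run_detached is a hypothetical name, and redirecting standard input from /dev/null is an addition not mentioned above.

```shell
# Hypothetical helper: run a command in the background, immune to hangups,
# with all output and errors collected in a log file.
# Usage: run_detached logfile command [arguments ...]
run_detached() {
    log=$1
    shift
    nohup "$@" < /dev/null > "$log" 2>&1 &
    echo "started PID $! (output in $log)"
}

# Example:
# run_detached /tmp/backup.out tar -cf /tmp/home.tar /home
```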

Topics: AIX, System Admin

The creation date of a UNIX file

UNIX doesn't store a file creation timestamp in the inode information. The timestamps recorded are the last access timestamp, the last modified timestamp and the last changed timestamp (which is the last change to the inode information). When a file is brand new, the last modified timestamp will be the creation timestamp of the file, but that piece of information is lost as soon as the file is modified in any way.

To view these timestamps, use the istat command, for example for the /etc/rc.tcpip file:

# ls -li /etc/rc.tcpip
 8247 -rwxrwxr-- 1 root  system   6607 Jan 06 06:25 /etc/rc.tcpip
Now you know the inode number: 8247.
# istat /etc/rc.tcpip
Inode 8247 on device 10/4       File
Protection: rwxrwxr--
Owner: 0(root)          Group: 0(system)
Link count:   1         Length 6607 bytes

Last updated:   Wed Jan  6 06:25:49 PST 2010
Last modified:  Wed Jan  6 06:25:49 PST 2010
Last accessed:  Tue May  4 14:00:37 PDT 2010
The same type of information can be found using the fsdb command. Start the fsdb command with the file system where the file is located; in the example below the root file system. Then type the number of the inode, followed by "i":
# fsdb /
File System:                                /
File System Size:                      2097152  (512 byte blocks)
Disk Map Size:                              20  (4K blocks)
Inode Map Size:                             38  (4K blocks)
Fragment Size:                            4096  (bytes)
Allocation Group Size:                    2048  (fragments)
Inodes per Allocation Group:              4096
Total Inodes:                           524288
Total Fragments:                        262144

8247i
i#:   8247  md: f---rwxrwxr--  ln:    1  uid:    0  gid:    0
szh:        0  szl:     6607  (actual size:     6607)
a0: 0x1203      a1: 0x1204      a2: 0x00        a3: 0x00
a4: 0x00        a5: 0x00        a6: 0x00        a7: 0x00
at: Tue May 04 14:00:37 2010
mt: Wed Jan 06 06:25:49 2010
ct: Wed Jan 06 06:25:49 2010
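As a lighter alternative to istat and fsdb, the ls command can show the same three timestamps: by default it lists the last modified time, with -u the last access time, and with -c the last inode change time. A small sketch (show_times is a hypothetical name):

```shell
# Hypothetical helper: show the three timestamps of a file using ls.
show_times() {
    echo "modified: $(ls -l  "$1")"
    echo "accessed: $(ls -lu "$1")"
    echo "changed:  $(ls -lc "$1")"
}

# Example:
# show_times /etc/rc.tcpip
```

Note that ls shows less precision than istat; for older files it typically prints the year instead of the time of day.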

Topics: AIX, System Admin

AIX Multiple page size support

To list the supported page sizes on a system:

# pagesize -a
4096
65536
16777216
17179869184
# pagesize -af
4K
64K
16M
16G
To learn more about multiple page size support in AIX, see the related whitepaper.
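The two listings above show the same sizes, in bytes and in binary units (1K = 1024 bytes). The conversion can be sketched as follows; human_pagesize is a hypothetical name, and the function only handles sizes that are whole multiples of 1K.

```shell
# Hypothetical helper: convert a page size in bytes to K, M or G.
human_pagesize() {
    awk -v b="$1" 'BEGIN {
        split("K M G", unit, " ")
        n = b / 1024
        i = 1
        while (n >= 1024 && i < 3) { n /= 1024; i++ }
        printf "%d%s\n", n, unit[i]
    }'
}

human_pagesize 4096           # prints 4K
human_pagesize 16777216       # prints 16M
human_pagesize 17179869184    # prints 16G
```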

Topics: AIX, System Admin

UNKNOWN_ user in /etc/security/failedlogin

An "UNKNOWN_" entry appears when somebody tried to log on with a user ID that is not known to the system. It would be possible to show the user ID they attempted to use, but this is not done because a common mistake is to enter the password instead of the user ID. If this were recorded, it would be a security risk.

Topics: AIX, System Admin

Calculating with UNIX timestamps

Starting with AIX 5.3, you can use the following command to get the number of seconds since the UNIX EPOCH (January 1st, 1970):

# date +"%s"
On older AIX versions, or other UNIX operating systems, you may want to use the following command to get the same answer:
# perl -MPOSIX -le 'print time'
Getting this UNIX timestamp can be very useful when doing calculations with time stamps. If you need to convert a UNIX timestamp back to something readable:
now=`perl -MPOSIX -le 'print time'`
# 3 months ago =
# 30 days * 3 months * 24 hours * 60 minutes * 60 seconds =
# 7776000 seconds.
let threemonthsago="${now}-7776000"
perl -MPOSIX -le "print scalar(localtime($threemonthsago))"
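The same arithmetic works in the other direction: the difference between two timestamps, divided by the 86400 seconds in a day, gives the number of days between them. A minimal sketch, in which days_between is a hypothetical name:

```shell
# Hypothetical helper: whole days between two UNIX timestamps.
# Usage: days_between older_timestamp newer_timestamp
days_between() {
    echo $(( ($2 - $1) / 86400 ))
}

# With the variables from the example above, this prints 90:
# days_between "$threemonthsago" "$now"
```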
