Topics: AIX, Storage, System Admin
Erasing disks
During a system decommission process, it is advisable to format or at least erase all drives. There are two ways to accomplish that:
If you have time:
AIX allows disks to be erased via the Format media service aid in the AIX diagnostic package. To erase a hard disk, run the following command:
# diag -T format
This will start the Format media service aid in a menu-driven interface. If prompted, choose your terminal. You will then be presented with a resource selection list. Choose the hdisk devices you want to erase from this list and commit your changes according to the instructions on the screen.
Once you have committed your selection, choose Erase Disk from the menu. You will then be asked to confirm your selection. Choose Yes. You will be asked if you want to Read data from drive or Write patterns to drive. Choose Write patterns to drive. You will then have the opportunity to modify the disk erasure options. After you specify the options you prefer, choose Commit Your Changes. The disk is now erased. Please note that it can take a long time for this process to complete.
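Sometimes the diag menus are impractical, for example when many disks need to be erased. In that case a full overwrite can also be scripted. The following is a minimal sketch and is not a replacement for the pattern options that diag offers. It writes a single pass of zeroes over every block of each disk with dd, using bootinfo -s to get the disk size in MB. The SKIP_VG setting and the function names are hypothetical. The script only defines functions, so nothing is erased until you call wipe_all yourself.

```shell
#!/bin/ksh
# Sketch: zero the WHOLE disk, one pass, via the raw device.
# Assumes AIX lspv and bootinfo; SKIP_VG and the function names are
# hypothetical. Nothing runs until you call wipe_all yourself.

SKIP_VG=rootvg    # disks in this VG are skipped so the running system stays usable

# Read lspv output on stdin and print the disks whose volume group
# (third column) is not $SKIP_VG.
disks_to_wipe() {
    awk -v skip="$SKIP_VG" '$3 != skip { print $1 }'
}

# Overwrite every block of one disk with zeroes.
wipe_disk() {
    disk=$1
    size_mb=$(bootinfo -s "$disk")      # disk size in MB
    dd if=/dev/zero of=/dev/r"$disk" bs=1024k count="$size_mb"
    echo "$disk wiped ($size_mb MB)"
}

wipe_all() {
    for disk in $(lspv | disks_to_wipe); do
        wipe_disk "$disk"
    done
}
```

With bs=1024k and count set to the size in MB, dd writes exactly the size that bootinfo reports. Expect this to take a long time on large disks, just like the diag erase.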
If you want to do it quick-and-dirty:
For each disk, use the dd command to overwrite the data on the disk. For example:
for disk in $(lspv | awk '{print $1}') ; do
    # overwrite the first 10 KB of the raw disk device with zeroes
    dd if=/dev/zero of=/dev/r${disk} bs=1024 count=10
    echo $disk wiped
done
This does the trick, as it reads zeroes from /dev/zero and writes 10 blocks of 1024 zeroes to each disk. That overwrites anything at the start of the disk, rendering the disk unusable. Keep in mind that only the first 10 KB of each disk is overwritten. The rest of the data stays on the disk, so this is not a secure erase.
When removing a device on AIX, you may run into a message saying that a child device is not in a correct state. For example:
# rmdev -dl fcs3
Method error (/usr/lib/methods/ucfgcommo):
0514-029 Cannot perform the requested function because a child device of the specified device is not in a correct state.
To determine what the child devices are, use the -p option of the lsdev command. From the man page of the lsdev command:
-p Parent
Specifies the device logical name from the Customized Devices
object class for the parent of devices to be displayed. The
-p Parent flag can be used to show the child devices of the
given Parent. The Parent argument to the -p flag may contain
the same wildcard characters that can be used with the odmget
command. This flag cannot be used with the -P flag.
For example:
# lsdev -p fcs3
fcnet3 Defined   07-01-01 Fibre Channel Network Protocol Device
fscsi3 Available 07-01-02 FC SCSI I/O Controller Protocol Device
To remove the device and all child devices, use the -R option. From the man page for the rmdev command:
-R
Unconfigures the device and its children.
When used with the -d or -S flags, the
children are undefined or stopped, respectively.
The command to remove adapter fcs3 and all its child devices is:
# rmdev -Rdl fcs3
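rmdev -R also takes the children down, so it can help to see everything below the adapter before running it. lsdev -p only shows one level, and disks hang off fscsi3 rather than off fcs3 itself. Below is a small recursive sketch. show_tree is a hypothetical helper name, and the sketch assumes lsdev -p accepts the -F format flag with the name and status columns.

```shell
#!/bin/ksh
# Sketch: print every device below a parent, indenting each generation.
# show_tree is a hypothetical helper name.

# The body runs in a subshell "( ... )", so the variables stay local to
# each level of the recursion.
show_tree() (
    indent=$2
    lsdev -p "$1" -F "name status" | while read -r name status; do
        echo "${indent}${name} ${status}"
        show_tree "$name" "${indent}  "
    done
)

# Example: show_tree fcs3
```

Any device in the tree that is still Available (for example a disk that is in use) is a likely candidate for the "not in a correct state" error.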
Topics: AIX, Security, System Admin
mkpasswd
An interesting open source project is Expect. It's a tool that can be used to automate interactive applications.
The RPM for Expect can be downloaded from http://www.perzl.org/aix/index.php?n=Main.Expect, and the home page for Expect is http://www.nist.gov/el/msid/expect.cfm.
A very interesting tool that is part of the Expect RPM is "mkpasswd". It is a little Tcl script that uses Expect to work with the passwd program to generate a random password and set it immediately. A somewhat adjusted version of "mkpasswd" can be downloaded here. The adjusted version of mkpasswd generates a random password for a user with a length of 8 characters (the default maximum password length on AIX) when you run, for example:
# /usr/local/bin/mkpasswd username
sXRk1wd3
To see the interactive work performed by Expect for mkpasswd, use the -v option:
# /usr/local/bin/mkpasswd -v username
spawn /bin/passwd username
Changing password for "username"
username's New password:
Enter the new password again:
password for username is s8qh1qWZ
By using mkpasswd, you'll never have to come up with a random password yourself again. It also prevents Unix system admins from assigning easily guessable passwords, such as "changeme" or "abc1234", to accounts.
Now, what if you want to let "other" (non-root) users run this utility, while preventing them from resetting the password of user root?
Let's say you want user pete to be able to reset other users' passwords. Add the following entries to the /etc/sudoers file by running visudo:
# visudo
Cmnd_Alias MKPASSWD = /usr/local/bin/mkpasswd, \
! /usr/local/bin/mkpasswd root
pete ALL=(ALL) NOPASSWD:MKPASSWD
This will allow pete to run the /usr/local/bin/mkpasswd utility, which he can use to reset passwords.
First, to check what he can run, use the "sudo -l" command:
# su - pete
$ sudo -l
User pete may run the following commands on this host:
    (ALL) NOPASSWD: /usr/local/bin/mkpasswd, !/usr/local/bin/mkpasswd root
Then, an attempt, using pete's account, to reset another user's password (which is successful):
$ sudo /usr/local/bin/mkpasswd mark
oe09'ySMj
Then another attempt, to reset the root password (which fails):
$ sudo /usr/local/bin/mkpasswd root
Sorry, user pete is not allowed to execute '/usr/local/bin/mkpasswd root' as root.
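Keep in mind that sudo compares command-line arguments literally. The negated entry only matches the exact argument list "root", so a call such as sudo /usr/local/bin/mkpasswd -v root is not caught by it. A sturdier approach is to let sudo run a small wrapper that validates the target account itself. The sketch below assumes a hypothetical wrapper path, /usr/local/bin/mkpasswd-user, and a hypothetical minimum UID of 200 for accounts that may be reset. Adjust both to your own standards. The sudoers entry would then simply allow /usr/local/bin/mkpasswd-user instead of mkpasswd.

```shell
#!/bin/ksh
# Hypothetical wrapper /usr/local/bin/mkpasswd-user, to be allowed in
# sudoers instead of mkpasswd itself. MIN_UID is an assumed threshold.

MIN_UID=200

# Return 0 only if $1 is an account whose password may be reset.
allowed_target() {
    case "$1" in
        ''|-*|*[!A-Za-z0-9_.-]*) return 1 ;;   # empty, option-like or odd names
    esac
    [ "$1" = root ] && return 1
    uid=$(id -u "$1" 2>/dev/null) || return 1  # unknown user
    [ "$uid" -ge "$MIN_UID" ]
}

main() {
    if [ $# -ne 1 ] || ! allowed_target "$1"; then
        echo "usage: mkpasswd-user username (root and system accounts refused)" >&2
        exit 1
    fi
    exec /usr/local/bin/mkpasswd "$1"
}

# When installed as the wrapper script, end it with:  main "$@"
```

The wrapper accepts exactly one argument, so options such as -v cannot be slipped in front of a protected account name.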
Since the files involved in the following procedure are flat ASCII files and their format has not changed from V4 to V5, the users can be migrated between systems running the same or different versions of AIX (for example, from V4 to V5).
Files that can be copied over:
- /etc/group
- /etc/passwd
- /etc/security/group
- /etc/security/limits
- /etc/security/passwd
- /etc/security/.ids
- /etc/security/environ
- /etc/security/.profile
When you copy the /etc/passwd and /etc/group files, make sure they contain at least a minimum set of essential user and group definitions, such as the entry for root:
root:!:0:0::/:/usr/bin/ksh
Listed specifically as users are the following:
root, daemon, bin, sys, adm, uucp, guest, nobody, lpd
Listed specifically as groups are the following:
system, staff, bin, sys, adm, uucp, mail, security, cron, printq, audit, ecs, nobody, usr
If the bos.compat.links fileset is installed, you can copy the /etc/security/mkuser.defaults file over. If it is not installed, the file is located at /usr/lib/security/mkuser.default instead. If you copy over mkuser.defaults, changes must be made to the stanzas: replace group with pgrp, and program with shell. A proper stanza should look like the following:
user:
pgrp = staff
groups = staff
shell = /usr/bin/ksh
home = /home/$USER
The following files may also be copied over, as long as the AIX version in the new machine is the same:
- /etc/security/login.cfg
- /etc/security/user
Once the files are moved over, execute the following:
# usrck -t ALL
# pwdck -t ALL
# grpck -t ALL
This will clear up any discrepancies (such as uucp not having an entry in /etc/security/passwd). Ideally, run these commands on the source system before copying over the files, as well as on the new system after porting the files.
NOTE: You may find user ID conflicts when migrating users from older versions of AIX to newer versions. AIX has added new user IDs in different release cycles. These are reserved IDs and should not be deleted. If your old user IDs conflict with the newer AIX system user IDs, it is advised that you assign new user IDs to the conflicting old accounts.
From: http://www-01.ibm.com/support/docview.wss?uid=isg3T1000231
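To spot such conflicts before copying the files, you can compare the UIDs in the old and new /etc/passwd. Below is a minimal sketch. It assumes you have copies of both files on one system. uid_conflicts and the file names in the example are hypothetical.

```shell
#!/bin/ksh
# Sketch: report UIDs that belong to different user names on the two systems.
# Usage: uid_conflicts OLD_PASSWD NEW_PASSWD   (hypothetical helper)
# Output lines: uid old_name new_name

uid_conflicts() {
    awk -F: '
        NR == FNR { name[$3] = $1; next }             # 1st file: old system
        ($3 in name) && name[$3] != $1 { print $3, name[$3], $1 }
    ' "$1" "$2"
}

# Example: uid_conflicts /tmp/passwd.old /etc/passwd
```

Every line printed is an old account that needs a new UID before its entries are merged into the new system.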
This error can occur if the fibre channel adapter is extremely busy. The AIX FC adapter driver is trying to map an I/O buffer for DMA access, so the FC adapter can read or write into the buffer. The DMA mapping is done by making a request to the PCI bus device driver.
The PCI bus device driver is saying that it can't satisfy the request right now. There was simply too much I/O at that moment, and the adapter couldn't handle it all. When the FC adapter is configured, we tell the PCI bus driver how much resource to set aside for us, and it may have gone over the limit. It is therefore recommended to increase the max_xfer_size on the fibre channel devices.
It depends on the type of fibre channel adapter, but usually the possible sizes are:
0x100000, 0x200000, 0x400000, 0x800000, 0x1000000
To view the current setting, type the following command:
# lsattr -El fcsX -a max_xfer_size
Replace the X with the fibre channel adapter number.
You should get output similar to the following:
max_xfer_size 0x100000 Maximum Transfer Size True
The value can be changed as follows. Because -P only records the change in the ODM, the server needs to be rebooted for it to take effect:
# chdev -l fcsX -a max_xfer_size=0x1000000 -P
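To check all adapters at once, the lsattr query above can be wrapped in a loop. Below is a small sketch that also converts the hex value into megabytes. xfer_size_mb and report_fc_xfer_sizes are hypothetical helper names, and the sketch assumes the adapters are named fcs*.

```shell
#!/bin/ksh
# Sketch: show max_xfer_size of every fcs adapter, in hex and in MB.
# Function names are hypothetical.

# $1 is a hex value such as 0x100000; shell arithmetic accepts 0x constants.
xfer_size_mb() {
    echo $(( $1 / 1048576 ))
}

report_fc_xfer_sizes() {
    for fcs in $(lsdev -Cc adapter -F name | grep '^fcs'); do
        size=$(lsattr -El "$fcs" -a max_xfer_size -F value)
        echo "$fcs max_xfer_size=$size ($(xfer_size_mb "$size") MB)"
    done
}
```

For reference, the possible sizes listed above correspond to 1, 2, 4, 8 and 16 MB.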
The following is a description of how you can set up a private network between two VIO clients on one hardware frame.
Servers to set up connection: server1 and server2
Purpose: To be used for Oracle interconnect (for use by Oracle RAC/CRS)
IP Addresses assigned by network team:
192.168.254.141 (server1priv)
192.168.254.142 (server2priv)
Subnetmask: 255.255.255.0
VLAN to be set up: PVID 4. This number is basically chosen at random. It could have been 23 or 67 or any other number, as long as it is not yet in use. Proper documentation of your VIO setup and the defined networks is therefore important.
Steps to set this up:
- Log in to HMC GUI as hscroot.
- Change the default profile of server1, and add a new virtual Ethernet adapter. Set the port virtual Ethernet to 4 (PVID 4). Select "This adapter is required for virtual server activation". Configuration -> Manage Profiles -> Select "Default" -> Actions -> Edit -> Select "Virtual Adapters" tab -> Actions -> Create Virtual Adapter -> Ethernet adapter -> Set "Port Virtual Ethernet" to 4 -> Select "This adapter is required for virtual server activation." -> Click Ok -> Click Ok -> Click Close.
- Do the same for server2.
- Now do the same for both VIO clients, but this time using "Dynamic Logical Partitioning". This way, we don't have to restart the nodes (so far we have only updated the default profiles of both servers), and still get the virtual adapter.
- Run cfgmgr on both nodes, and see that you now have an extra Ethernet adapter, in my case ent1.
- Run "lscfg -vl ent1", and note the adapter ID (in my case C5) on both nodes. This should match the adapter IDs as seen on the HMC.
- Now configure the IP address on this interface on both nodes.
- Add the entries for server1priv and server2priv in /etc/hosts on both nodes.
- Run a ping: ping server2priv (from server1) and vice versa.
- Done!
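The IP, /etc/hosts and ping steps can also be done from the command line. Below is a minimal sketch. It assumes the new adapter shows up as ent1 (interface en1), as in this example. Running chdev on the en1 interface stores the address in the ODM, so it survives a reboot. The function names are hypothetical. On server2, pass its own address and server1priv as the peer.

```shell
#!/bin/ksh
# Sketch for the private interconnect, run as root on each node.
# Assumes the new adapter is ent1 (interface en1); function names are hypothetical.

IFACE=en1
NETMASK=255.255.255.0

# The /etc/hosts entries assigned by the network team.
private_hosts_entries() {
    printf '%s\t%s\n' 192.168.254.141 server1priv
    printf '%s\t%s\n' 192.168.254.142 server2priv
}

# Usage on server1: configure_private_net 192.168.254.141 server2priv
configure_private_net() {
    myip=$1
    peer=$2
    # Set the address in the ODM and bring the interface up.
    chdev -l "$IFACE" -a netaddr="$myip" -a netmask="$NETMASK" -a state=up
    # Append each host entry unless its name already ends a line in /etc/hosts.
    private_hosts_entries | while read -r ip name; do
        grep -q "[[:space:]]$name\$" /etc/hosts ||
            printf '%s\t%s\n' "$ip" "$name" >> /etc/hosts
    done
    ping -c 3 "$peer"
}
```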
To remove the private network again:
- On each node, deconfigure the en1 interface:
# ifconfig en1 detach
- Remove the devices on each node with rmdev:
# rmdev -dl en1
# rmdev -dl ent1
- Remove the virtual adapter with ID 5 from the default profile in the HMC GUI for server1 and server2.
- DLPAR the adapter with ID 5 out of server1 and server2.
- Run cfgmgr on both nodes to confirm the adapter does not re-appear. Check with:
# lsdev -Cc adapter
- Done!
If clstat is not working, you may get the following error when running clstat:
# clstat
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information.
Additional information for verifying the SNMP configuration on AIX 6
can be found in /usr/es/sbin/cluster/README5.5.0.UPDATE
To resolve this, first of all, go ahead and read the README that is referred to. You'll find that you have to enable an entry in /etc/snmpdv3.conf:
Commands clstat or cldump will not start if the internet MIB tree is not enabled in snmpdv3.conf file. This behavior is usually seen in AIX 6.1 onwards where this internet MIB entry was intentionally disabled as a security issue. This internet MIB entry is required to view/resolve risc6000clsmuxpd (1.3.6.1.4.1.2.3.1.2.1.5) MIB sub tree which is used by clstat or cldump functionality.
There are two ways to enable this MIB sub tree (risc6000clsmuxpd). They are:
1) Enable the main internet MIB entry by adding this line in /etc/snmpdv3.conf file:
VACM_VIEW defaultView internet - included -
But doing so is not recommended, as it unlocks the entire MIB tree.
2) Enable only the MIB sub tree for risc6000clsmuxpd without enabling the main MIB tree by adding this line in /etc/snmpdv3.conf file:
VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -
Note: After enabling the MIB entry above, the snmp daemon must be restarted with the following commands:
# stopsrc -s snmpd
# startsrc -s snmpd
After snmp is restarted, leave the daemon running for about two minutes before attempting to start clstat or cldump.
Sometimes, even after doing this, clstat or cldump still don't work. Make sure that a COMMUNITY entry is present in /etc/snmpdv3.conf:
COMMUNITY public public noAuthNoPriv 0.0.0.0 0.0.0.0 -
The next thing may sound silly, but edit the /etc/snmpdv3.conf file and take out the comments. Change this:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password # gated
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password # HACMP/ES for AIX ...
To:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
Then, recycle the daemons on all cluster nodes. This can be done while the cluster is up and running:
# stopsrc -s hostmibd
# stopsrc -s snmpmibd
# stopsrc -s aixmibd
# stopsrc -s snmpd
# sleep 4
# chssys -s hostmibd -a "-c public"
# chssys -s aixmibd -a "-c public"
# chssys -s snmpmibd -a "-c public"
# sleep 4
# startsrc -s snmpd
# startsrc -s aixmibd
# startsrc -s snmpmibd
# startsrc -s hostmibd
# sleep 120
# stopsrc -s clinfoES
# startsrc -s clinfoES
# sleep 120
Now, to verify that it works, run either clstat or cldump, or the following command:
# snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster
Still not working at this point? Then run an Extended Verification and Synchronization:
# smitty cm_ver_and_sync.select
After that, clstat, cldump and snmpinfo should work.
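The snmpdv3.conf change for option 2 can be scripted, so it is done the same way on every cluster node. Below is a minimal sketch. It assumes you only enable the risc6000clsmuxpd sub tree. ensure_line appends the VACM_VIEW line only when an identical line is not there yet (a line with different spacing is not recognized), after keeping a dated backup copy. The function names are hypothetical.

```shell
#!/bin/ksh
# Sketch: enable only the risc6000clsmuxpd MIB sub tree and restart snmpd.
# Function names are hypothetical; run as root on each cluster node.

SNMPD_CONF=/etc/snmpdv3.conf
CLSMUX_VIEW='VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -'

# Append line $2 to file $1 unless an identical line already exists.
ensure_line() {
    grep -qxF "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

enable_clsmuxpd_view() {
    cp -p "$SNMPD_CONF" "$SNMPD_CONF.$(date +%Y%m%d%H%M%S)"   # backup copy
    ensure_line "$SNMPD_CONF" "$CLSMUX_VIEW"
    stopsrc -s snmpd
    sleep 4
    startsrc -s snmpd
    # Per the README: give snmpd about two minutes before running clstat.
}
```

Because ensure_line is idempotent, running enable_clsmuxpd_view twice does not add a duplicate line.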
Topics: AIX, System Admin
Too many open files
To determine if the number of open files is growing over a period of time, issue lsof to report the open files against a PID on a periodic basis. For example:
# lsof -p (PID of process) -r (interval) > lsof.out
Note: The interval is in seconds, for example 1800 for 30 minutes.
This output does not give the actual file names to which the handles are open. It provides only the name of the file system (directory) in which they are contained. The lsof command indicates if the open file is associated with an open socket or a file. When it references a file, it identifies the file system and the inode, not the file name.
Run the following command to determine the file name:
# df -kP filesystem_from_lsof | awk '{print $6}' | tail -1
Now note the filesystem name. Then run:
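# find filesystem_name -xdev -inum inode_from_lsof -print
This will show the actual file name. The -xdev flag keeps find from descending into other file systems, where the same inode number may belong to a different file.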
To increase the limit, change or add the nofiles=XXXXX parameter in the /etc/security/limits file, or run:
# chuser nofiles=XXXXX user_id
You can also use svmon:
# svmon -P java_pid -m | grep pers
This lists open files in the format filesystem_device:inode. Use the same procedure as above to find the actual file name.
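The df and find steps can be combined into one helper. Below is a minimal sketch. The function names are hypothetical. It uses -xdev so that find stays on the file system that the inode number belongs to.

```shell
#!/bin/ksh
# Sketch: turn a file system (or any path on it) plus an inode number
# into the actual file name. Function names are hypothetical.

# Print the mount point of the file system that contains path $1.
mount_point_of() {
    df -kP "$1" | awk 'NR > 1 { print $6 }' | tail -1
}

# Print the file(s) with inode number $2 below directory $1,
# without crossing into other file systems.
file_for_inode() {
    find "$1" -xdev -inum "$2" -print
}

# Usage: resolve_open_file /path/reported/by/lsof 12345
resolve_open_file() {
    file_for_inode "$(mount_point_of "$1")" "$2"
}
```

On a large file system the find can take a while, since it has to walk the whole tree below the mount point.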
If you try to establish a dsh session with a remote node, you may sometimes get an error message like this:
# dsh -n server date
server.domain.com: Host key verification failed.
dsh: 2617-009 server.domain.com remote shell had exit code 255
Connecting with ssh works well with key authentication:
# ssh server
The difference between the two connections is that dsh uses the FQDN, and the FQDN needs to be added to the known_hosts file for SSH. Therefore you must first make an ssh connection to the host using its FQDN:
# ssh server.domain.com date
The authenticity of host server.domain.com can't be established.
RSA key fingerprint is 1b:b1:89:c0:63:d5:f1:f1:41:fa:38:14:d8:60:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added server.domain.com (RSA) to the list of known hosts.
Tue Sep 6 11:56:34 EDT 2011
Now try to use dsh again, and you'll see it will work:
# dsh -n server date
server.domain.com: Tue Sep 6 11:56:38 EDT 2011
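With many nodes, answering the prompt for every FQDN gets tedious. The sketch below uses OpenSSH's ssh-keyscan to collect the host keys non-interactively. Note that this trusts whatever key the host presents at that moment, exactly like answering yes at the prompt, so only do it on a network you trust. add_known_host is a hypothetical name, and the RSA key type is chosen to match the example above.

```shell
#!/bin/ksh
# Sketch: add the host key of an FQDN to known_hosts without prompting.
# Assumes OpenSSH's ssh-keygen and ssh-keyscan; add_known_host is hypothetical.

# Usage: add_known_host fqdn [known_hosts_file]
add_known_host() {
    fqdn=$1
    known=${2:-$HOME/.ssh/known_hosts}
    # ssh-keygen -F prints the matching entry if the host is already known.
    if ssh-keygen -F "$fqdn" -f "$known" 2>/dev/null | grep -q .; then
        return 0
    fi
    ssh-keyscan -t rsa "$fqdn" 2>/dev/null >> "$known"
}

# Example: for h in server1.domain.com server2.domain.com; do add_known_host "$h"; done
```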
Sometimes, you just need that one single file from a mksysb image backup. It's really not that difficult to accomplish this.
First of all, go to the directory that contains the mksysb image file:
# cd /sysadm/iosbackup
In this example, we're using the mksysb image of a Virtual I/O server, created using iosbackup. This is basically the same as a mksysb image from a regular AIX system. The image file for this mksysb backup is called vio1.mksysb.
First, try to locate the file you're looking for. For example, if you're looking for file nimbck.ksh:
# restore -T -q -l -f vio1.mksysb | grep nimbck.ksh
New volume on vio1.mksysb:
Cluster size is 51200 bytes (100 blocks).
The volume number is 1.
The backup date is: Thu Jun 9 23:00:28 MST 2011
Files are backed up by name.
The user is padmin.
-rwxr-xr-x- 10 staff May 23 08:37 1801 ./home/padmin/nimbck.ksh
Here you can see the original file was located in /home/padmin.
Now recover that one single file:
# restore -x -q -f vio1.mksysb ./home/padmin/nimbck.ksh
x ./home/padmin/nimbck.ksh
Note that it is important to add the dot before the file name that needs to be recovered; otherwise it won't work. Your file is now restored to ./home/padmin/nimbck.ksh, which is relative to the directory you're currently in:
# cd ./home/padmin
# ls -als nimbck.ksh
4 -rwxr-xr-x 1 10 staff 1801 May 23 08:37 nimbck.ksh
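When you don't know the exact path, the listing and the restore can be combined. Below is a minimal sketch. find_in_listing and restore_by_name are hypothetical helper names. The sketch matches on the exact file name (the last path component) and does not handle file names containing spaces. As above, files are restored relative to the current directory.

```shell
#!/bin/ksh
# Sketch: restore every file with a given name from a mksysb image.
# Helper names are hypothetical; run from the directory to restore into.

# Read "restore -T" output on stdin; print paths whose last component is $1.
find_in_listing() {
    awk -v n="$1" '$NF ~ /^\.\// { k = split($NF, p, "/"); if (p[k] == n) print $NF }'
}

# Usage: restore_by_name vio1.mksysb nimbck.ksh
restore_by_name() {
    image=$1
    paths=$(restore -T -q -l -f "$image" 2>/dev/null | find_in_listing "$2")
    if [ -z "$paths" ]; then
        echo "$2 not found in $image" >&2
        return 1
    fi
    restore -x -q -f "$image" $paths    # paths keep their leading "./"
}
```

Matching on the whole last component avoids pulling in files such as nimbck.ksh.bak, which a plain grep for the name would also match.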


