[torqueusers] Torque 1.1.0p3
scott at supercluster.org
Mon Oct 18 12:19:16 MDT 2004
Torque Users,
Torque 1.1.0 patch 3 is now available. It contains two major improvements and
many minor enhancements.
#1 momctl
The momctl command allows diagnosis and management of the mom daemon and uses
the following syntax:
USAGE: momctl <ARGS>
[ -c [JOB] ] // CLEAR STALE JOB
[ -d DIAGLEVEL ] // DIAGNOSE (0 - 3)
[ -f HOSTFILE ] // HOSTFILE
[ -h [HOST] ] // HOST
[ -p [PORT] ] // PORT
[ -q [ATTR] ] // QUERY STATE
[ -r [FILE] ] // RECONFIG
[ -s ] // SHUTDOWN
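As a hedged illustration of the option summary above, the Python sketch below assembles a momctl command line from a subset of the documented flags (-d, -h, -p, -s). The helper name `momctl_argv` is hypothetical and not part of Torque. It only builds the argument list and does not run anything. Its one check enforces the 0 - 3 diagnostic range given in the usage text.

```python
import shlex
from typing import List, Optional


def momctl_argv(diag_level: Optional[int] = None,
                host: Optional[str] = None,
                port: Optional[int] = None,
                shutdown: bool = False) -> List[str]:
    """Build (but do not execute) a momctl argument list.

    Hypothetical helper covering only the -d, -h, -p and -s options
    from the usage summary; the other options are omitted for brevity.
    """
    argv = ["momctl"]
    if diag_level is not None:
        # The usage text documents diagnostic levels 0 through 3.
        if not 0 <= diag_level <= 3:
            raise ValueError("DIAGLEVEL must be between 0 and 3")
        argv += ["-d", str(diag_level)]
    if host is not None:
        argv += ["-h", host]
    if port is not None:
        argv += ["-p", str(port)]
    if shutdown:
        argv.append("-s")
    return argv


if __name__ == "__main__":
    # Prints the command line for a level-3 diagnosis of the mom on host "hana":
    # momctl -d 3 -h hana
    print(shlex.join(momctl_argv(diag_level=3, host="hana")))
```

On a node where momctl is installed, the returned list could be passed to `subprocess.run`. Building a list rather than a single string avoids shell quoting problems with unusual host names.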
If diagnostics are requested, the following information is reported:
-----
Host: hana/hana.icluster.org Server: 10.10.10.106
HomeDirectory: /usr/spool/PBS/mom_priv
MOM active: 240 seconds
Last Msg From Server: 240 seconds (CLUSTER_ADDRS)
Last Msg To Server: 6 seconds
Init Msgs Received: 0 hellos/1 cluster-addrs
Init Msgs Sent: 1 hellos
LOGLEVEL: 0 (use SIGUSR1/SIGUSR2 to adjust)
Trusted Client List:
10.10.10.110,10.10.10.121,10.10.10.119,10.10.10.117,10.10.10.106,127.0.0.1
JobList: NONE
-----
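Sites that collect this report from many nodes may want it in structured form. The Python sketch below is written under an assumption drawn only from the sample above, not from a documented format. It treats every field as a `Name: value` pair, allows several pairs on one line (as in the Host/Server line), and lets a bare `Name:` line take the following line as its value (as in Trusted Client List). The function name `parse_momctl_diag` is hypothetical, and real output may differ between versions or diagnostic levels.

```python
import re
from typing import Dict

# Assumed layout (from the sample report only): a field name made of letters and
# spaces, then ": ", then a value that runs until the next such name or end of line.
FIELD = re.compile(r"([A-Za-z][A-Za-z ]*?):\s+(\S.*?)(?=\s+[A-Za-z][A-Za-z ]*:\s|$)")


def parse_momctl_diag(report: str) -> Dict[str, str]:
    """Parse a momctl diagnostic report into a {field name: value} dict (sketch)."""
    fields: Dict[str, str] = {}
    lines = [ln.strip() for ln in report.splitlines()]
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line or set(line) == {"-"}:
            # Skip blank lines and the "-----" separators.
            i += 1
            continue
        if line.endswith(":") and i + 1 < len(lines):
            # A bare "Name:" line takes the next line as its value.
            fields[line[:-1].strip()] = lines[i + 1]
            i += 2
            continue
        # One or more "Name: value" pairs on a single line.
        for m in FIELD.finditer(line):
            fields[m.group(1).strip()] = m.group(2).strip()
        i += 1
    return fields
```

For the sample report above, `parse_momctl_diag(report)["JobList"]` would return `"NONE"`, and the Host line yields two separate entries, `Host` and `Server`.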
In addition, a number of internal checks are performed and any failures are
reported. If there is additional information that sites would find valuable in
diagnosing local failures, please let us know.
#2 Modification of the pbs_mom to pbs_server daemon initialization sequence
Using momctl, we were finally able to isolate and correct an issue that has
been plaguing a number of sites. The way moms initialized communication with
pbs_server resulted in 'unexpected eof' and 'cannot connect' messages. This
change should not only eliminate these failures but also shorten pbs recycle
times, allowing all nodes to come online faster.
Please test patch 3 and let us know of any issues you see. Thanks for all of
the contributed patches.
Supercluster Development Group