[Mauiusers] Bug in "-l file=XXX" options?
T. Daniel Crawford
crawdad at exchange.vt.edu
Fri Jul 1 10:11:17 MDT 2005
We recently installed Torque (1.2.0p4) and Maui (3.2.6p13) on our research
group's clusters of Athlons, Xeons, and Opterons (all running FC2 or FC3).
The system has worked *great* so far, except for the apparent failure of the
file=XXX option. Specifically, if a user gives, e.g., "-l file=140000mb" to
qsub, then Maui appears to select the correct subset of nodes, i.e., the
task will go only to a machine with sufficient scratch space as reported to
the pbs_server by the node's pbs_mom. However, the job immediately dies, and
pbs_mom logs the following:
07/01/2005 12:04:34;0001; pbs_mom;Job;TMomFinalizeJob3;job not started,
Failure job exec failure, after files staged, no retry
07/01/2005 12:04:34;0001; pbs_mom;Job;456.sirius.<censored>;ALERT: job
failed phase 3 start, server will retry
07/01/2005 12:04:34;0008; pbs_mom;Req;send_sisters;sending ABORT to
However, if I only request "-l file=10mb", the job runs fine. (But "-l
file=100mb" also fails.)
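
To make the three cases concrete, here is a minimal sketch that only prints
the corresponding qsub command lines rather than submitting anything. The
script name job.sh and the helper qsub_cmd are placeholders for illustration;
only the "-l file=..." values come from the tests described above.

```shell
#!/bin/sh
# Build (but do not run) the qsub command line for a given scratch request.
# "job.sh" is a placeholder job script name, not one from the actual cluster.
qsub_cmd() {
    printf 'qsub -l file=%s %s\n' "$1" "$2"
}

# 10mb ran fine; 100mb and 140000mb both failed as described above.
for size in 10mb 100mb 140000mb; do
    qsub_cmd "$size" job.sh
done
```

Removing the printf wrapper and calling qsub directly would submit the three
jobs for real.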
Many of our calculations require large amounts of scratch disk space. I'd
prefer to rely on the MINRESOURCE policy alone because of its dynamic
flexibility, but this bug has forced me to define partitions of nodes, which
doesn't always provide the most balanced load across the cluster.
Any help the Maui/Torque gurus can provide would be greatly appreciated!
T. Daniel Crawford
Department of Chemistry, Virginia Tech
crawdad at vt.edu
www.chem.vt.edu/faculty/crawford.php
Voice: 540-231-7760 FAX: 540-231-3255
PGP Public Key at: http://www.chem.vt.edu/chem-dept/crawford/publickey.txt