[torqueusers] Bug in Torque 2.3.7 with Maui 3.2.6p21: Truncated node resources

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Jun 4 07:22:38 MDT 2009

We're upgrading our cluster from CentOS4 to CentOS5 and would like to
upgrade Torque/Maui as well.  We're having a problem with Torque 2.3.7
and Maui 3.2.6p21 that's a real show-stopper: Node resources from
Torque get truncated by Maui so that only the first resource is used.
Obviously, this makes our test cluster rather useless at this time.
We do not know whether the bug is a Torque or a Maui issue.

Some examples from the Maui logfile illustrate the problem when
we submit jobs that request 2 nodes with ppn=4 on each node.

1) Job submitted with "qsub -l nodes=2:ppn=4" is allocated only 1 node:
06/04 14:18:39 MRMJobStart(37,Msg,SC)
06/04 14:18:39 MPBSJobStart(37,0,Msg,SC)
06/04 14:18:39 MPBSJobModify(37,Resource_List,Resource,m038:ppn=4)
06/04 14:18:39 MPBSJobModify(37,Resource_List,Resource,2:ppn=4)

2) Job submitted with "qsub -l nodes=m040:ppn=4+m035:ppn=4" is
allocated only node m040:
06/04 14:21:47 MRMJobStart(41,Msg,SC)
06/04 14:21:47 MPBSJobStart(41,0,Msg,SC)
06/04 14:21:47 MPBSJobModify(41,Resource_List,Resource,m035:ppn=4)
06/04 14:21:47 MPBSJobModify(41,Resource_List,Resource,m040:ppn=4+m035:ppn=4)

3) Job submitted with "qsub -l nodes=m035:ppn=4+m040:ppn=4" is
allocated only node m035:
06/04 14:23:39 MRMJobStart(42,Msg,SC)
06/04 14:23:39 MPBSJobStart(42,0,Msg,SC)
06/04 14:23:39 MPBSJobModify(42,Resource_List,Resource,m040:ppn=4)
06/04 14:23:39 MPBSJobModify(42,Resource_List,Resource,m035:ppn=4+m040:ppn=4)
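
For anyone who wants to check their own Maui log for the same pattern, here is a rough Python sketch that pulls the Resource_List values out of the MPBSJobModify lines, grouped by job id. It is not part of Torque or Maui; the function name modify_values and the regular expression are my own illustration, and the pattern assumes the log lines look exactly like the ones quoted above.

```python
import re

# Assumed line format, taken from the log excerpts above, e.g.:
#   06/04 14:21:47 MPBSJobModify(41,Resource_List,Resource,m035:ppn=4)
MODIFY_RE = re.compile(
    r"MPBSJobModify\((\d+),Resource_List,Resource,([^)]*)\)"
)


def modify_values(log_lines):
    """Return {job_id: [value, ...]} for each MPBSJobModify line, in log order."""
    result = {}
    for line in log_lines:
        match = MODIFY_RE.search(line)
        if match:
            job_id, value = match.group(1), match.group(2)
            result.setdefault(job_id, []).append(value)
    return result


# Log excerpt for job 41, copied from example 2 above.
log = [
    "06/04 14:21:47 MRMJobStart(41,Msg,SC)",
    "06/04 14:21:47 MPBSJobStart(41,0,Msg,SC)",
    "06/04 14:21:47 MPBSJobModify(41,Resource_List,Resource,m035:ppn=4)",
    "06/04 14:21:47 MPBSJobModify(41,Resource_List,Resource,m040:ppn=4+m035:ppn=4)",
]

for job_id, values in modify_values(log).items():
    print(job_id, values)
```

A job whose values differ in the number of "+"-separated elements, as job 41 does here, shows the pattern described in this post.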

My guess is that Torque hands a Resource_List to Maui, but Maui
ignores all elements in the list but the first one.
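
To make the size of the mismatch concrete, here is a small Python sketch that splits a nodes specification into its "+"-separated elements and counts the processors each form asks for. This is only an illustration of the arithmetic, not how Torque or Maui parse the string; the helper names are hypothetical, and I assume ppn defaults to 1 when it is not given.

```python
def split_nodes_spec(spec):
    """Split a nodes value such as 'm040:ppn=4+m035:ppn=4' into (name_or_count, ppn) pairs."""
    elements = []
    for part in spec.split("+"):
        fields = part.split(":")
        ppn = 1  # assumption: ppn is 1 when not specified
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        elements.append((fields[0], ppn))
    return elements


def total_processors(spec):
    """Count processors requested; a purely numeric first field is a node count."""
    total = 0
    for name_or_count, ppn in split_nodes_spec(spec):
        count = int(name_or_count) if name_or_count.isdigit() else 1
        total += count * ppn
    return total


# Values taken from the MPBSJobModify lines in the examples above.
for spec in ("2:ppn=4", "m038:ppn=4", "m040:ppn=4+m035:ppn=4", "m035:ppn=4"):
    print(spec, split_nodes_spec(spec), total_processors(spec))
```

For job 37, the request "2:ppn=4" amounts to 8 processors, while the single-node value "m038:ppn=4" covers only 4, which matches the job being allocated just one node.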

We are running the latest Torque snapshot, 2.3.7-snap.200906020815.
We abandoned Torque 2.3.6 because pbs_server crashed in early tests.
We use Maui 3.2.6p21, and we see the same problem with the latest
snapshot, maui-3.2.6p21-snap.1243977349.

We would appreciate recommendations on how to proceed with this bug.
Our old cluster runs Torque 2.1.11 with Maui 3.2.6p21 and works really,
really great!  So is it premature to upgrade to the Torque 2.3 series?


Ole Holm Nielsen
Department of Physics, Technical University of Denmark

