[torqueusers] Problem with nodes allocation

Roger Williams R.Williams at gns.cri.nz
Tue Jul 1 17:30:33 MDT 2008


In recent months on this list there have been a couple of reports of 
people getting only one node allocated regardless of the number of nodes 
requested (with #PBS -l nodes=x). There was a report from Michael Marti in 
March and a tantalising mail from Nicola Guida in June.

Can any of the Torque developers say if the cause of this problem was 
definitively identified and, if so, has it been fixed?

I see the same thing. I have a new(ish) Altix XE cluster (SLES 10) and 
Torque 2.2.0. The cluster has 1 head node and 20 compute nodes with 
pbs_server and pbs_sched on the head node and pbs_mom on the compute nodes 
(and not on the head node). The cluster has dual internal gigabit Ethernet 
networks (the first for NFS, the second for Torque and MPI). Connectivity 
between nodes (passwordless ssh and rsh) works in all combinations.

Simple scripts with, e.g.,

  #PBS -l nodes=5

produce a $PBS_NODEFILE with just the first of the 20 nodes.
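
For anyone wanting to reproduce this, a minimal sketch of a test job follows. 
It assumes the usual Torque layout of $PBS_NODEFILE (one hostname per line, 
repeated once per allocated processor); the job name "nodecheck" and the 
helper count_nodes are made up for illustration.

```shell
#!/bin/sh
#PBS -l nodes=5
#PBS -N nodecheck

# Count the distinct hostnames in a node file.
# Assumes one hostname per line, possibly repeated (once per processor).
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Report only when running under Torque, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "distinct nodes allocated: $(count_nodes "$PBS_NODEFILE")"
    # Per-host line counts, to see how processors are spread over nodes.
    sort "$PBS_NODEFILE" | uniq -c
fi
```

With the problem described here, the count would be 1 rather than the 5 
requested.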

However, if I specify nodes explicitly by name, e.g.,

  #PBS -l nodes=cl1n010-gige+cl1n011-gige

then the nodes list is correct.
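
Since explicit names work, one possible stopgap is to build the name list 
by hand. The sketch below only joins hostnames with "+" in the form shown 
above; node_spec is a hypothetical helper and job.sh is a placeholder 
script name, not something from my setup.

```shell
#!/bin/sh

# Join the hostnames given as arguments into a node spec: a+b+c
# Runs in a subshell so the IFS change does not leak to the caller.
node_spec() (
    IFS=+
    printf '%s\n' "$*"
)

# Example with the two hostnames used above. A job could then be
# submitted with something like:
#   qsub -l nodes="$(node_spec cl1n010-gige cl1n011-gige)" job.sh
node_spec cl1n010-gige cl1n011-gige
```

This does nothing about the underlying problem; it just avoids typing long 
node lists.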

In the first example, I also see the same thing that Michael reported with 
the output of 'qstat -n'. That is, the NDS column has the correct count 
(matching nodes=) but only the first node is listed. I also concur with 
his observation that "The ppn part of the node request however seems to work ok".

Can anyone say what is going on?

-- 
Roger Williams, GNS Science, New Zealand : www.gns.cri.nz : xyzzy


