[torqueusers] Problem with nodes allocation

Glen Beane glen.beane at gmail.com
Thu Jul 3 05:05:18 MDT 2008


On Thu, Jul 3, 2008 at 6:56 AM, Glen Beane <glen.beane at gmail.com> wrote:

>
>
> On Thu, Jul 3, 2008 at 4:50 AM, Roger Williams <R.Williams at gns.cri.nz>
> wrote:
>
>> A reader sent me this mail reply, for which I'm grateful. Not sure if it
>> made it to the list ...
>>
>> >I find the problem here is the confusion between "nodes" and "cpus". I
>> >understand that nodes=5 means you want 5 CPUs not 5 nodes! I always
>> >recommend using ":ppn=X", i.e. -l nodes=5:ppn=2 to get 5 x 2-CPU nodes.
>> >
>> >The $PBS_NODEFILE lists the node name of each CPU assigned, which
>> >would mean that in your case the $PBS_NODEFILE would have the first node
>> >listed five times.
>>
>> No. I see only *one* line in $PBS_NODEFILE in the case that I (and the
>> others) have experienced. That is the symptom.
>>
>> I have also tried the -l nodes=x:ppn=y syntax (I have 8 cores in each
>> node). In those trials I get y lines of the first node only.
>>
>> Again, can any of the Torque developers say if the cause of this problem
>> was definitively identified and, if so, has it been fixed?
>>
>
>
> I seem to remember this bug being introduced in 2.2.0 or thereabouts and
> fixed in a later release.
> I would stick with TORQUE 2.1.10 or 2.3.1


OK, I found reports of this in some 2.1.x versions of TORQUE as well, and I
can't find any indication that it has actually been fixed yet.

What scheduler are you using?  It sounds like a bug that could be in
pbs_sched.  Would it be possible for you to try Maui?  Can you send me your
pbs_server config?
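
To make the symptom easy to report, something like the following could be run
from inside a job script. It is only a rough sketch: count_slots is a helper
name made up for this example (it is not part of TORQUE), and it assumes the
node file has one host name per line, one line per assigned slot. The server
configuration itself can be dumped with qmgr, as noted in the comment.

```shell
#!/bin/sh
# Hypothetical helper (not a TORQUE command): print each host in a
# PBS node file followed by the number of slots (lines) it was given.
# Typical use inside a job script:  count_slots "$PBS_NODEFILE"
count_slots() {
    # sort groups identical host names so uniq -c can count them;
    # awk swaps the columns to print "host count".
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# The pbs_server configuration can be printed on the server host with:
#   qmgr -c 'print server'
```

For a request such as -l nodes=5:ppn=2 one would expect five distinct hosts
with a count of 2 each; a single host with a count of 2 would match the
behaviour described above.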