[torqueusers] Re: Associating a queue with specific nodes
Anne Hammond
hammond at txcorp.com
Fri Jan 5 17:12:02 MST 2007
Thanks marc,
How do I associate a node with a queue in the nodes file
(/var/spool/torque/server_priv/nodes)?
PBS is still not allocating the nodes correctly.
If, in our submission job, we have:
#PBS -q s3opt24
#PBS -l nodes=5:ppn=2
(where nodes 23-46 are associated with queue s3opt24 via torque
qmgr), the job runs on nodes 16-20:
2306.storage2.cl.txc swsides s3opt24 a005135032 -- 5 -- -- -- R --
node20/1+node20/0+node19/1+node19/0+node18/1+node18/0+node17/1+node17/0+node16/1+node16/0
This is not what is supposed to happen.
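A placement like this can be checked mechanically. The following is only a minimal shell sketch: the function names exec_hosts and hosts_outside_range are hypothetical, and it assumes host names of the form node<N> and the "host/cpu+host/cpu" layout of the exec host list shown above.

```shell
#!/bin/sh
# List the distinct hosts in an exec host string such as
# "node20/1+node20/0+node19/1" (one host per line, sorted).
exec_hosts() {
    printf '%s\n' "$1" | tr '+' '\n' | cut -d/ -f1 | sort -u
}

# Print every host whose numeric suffix lies outside [lo, hi].
# Assumption: hosts are named "node<N>", as in this thread.
hosts_outside_range() {
    lo=$2
    hi=$3
    exec_hosts "$1" | while read -r host; do
        n=${host#node}
        if [ "$n" -lt "$lo" ] || [ "$n" -gt "$hi" ]; then
            printf '%s\n' "$host"
        fi
    done
}

# The exec host list from job 2306, checked against nodes 23-46.
hosts_outside_range "node20/1+node20/0+node19/1+node19/0+node18/1+node18/0+node17/1+node17/0+node16/1+node16/0" 23 46
```

For the job above this lists node16 through node20, i.e. every host the job received lies outside the range associated with s3opt24.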
If we use this submission file:
#PBS -l nodes=node23:ppn=2+node24:ppn=2+node25:ppn=2+node26:ppn=2+node27:ppn=2
the job runs correctly on nodes 23-27.
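Writing the explicit list by hand gets tedious for larger ranges. As a stopgap, the node specification could be generated; this is a minimal sketch, where node_spec is a hypothetical helper and the node<N> naming is assumed from our cluster.

```shell
#!/bin/sh
# Build "node<first>:ppn=<ppn>+...+node<last>:ppn=<ppn>".
# node_spec is a hypothetical helper, not a TORQUE command.
node_spec() {
    first=$1
    last=$2
    ppn=$3
    spec=""
    i=$first
    while [ "$i" -le "$last" ]; do
        if [ -n "$spec" ]; then
            spec="${spec}+"          # separate entries with "+"
        fi
        spec="${spec}node${i}:ppn=${ppn}"
        i=$((i + 1))
    done
    printf '%s\n' "$spec"
}

# Reproduce the directive used in the submission file above.
echo "#PBS -l nodes=$(node_spec 23 27 2)"
```

The same string could also be passed on the command line, for example qsub -l nodes="$(node_spec 23 27 2)" job.sh, where job.sh stands in for the actual submission script. This only works around the problem; it does not make the queue itself restrict the nodes.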
Can anything be done to correct this?
Anne M. Hammond - Systems / Network Administration - Tech-X Corp
hammond_at_txcorp.com 720-974-1840
On Fri, 5 Jan 2007, marc wrote:
> Hello everyone,
> If you have associated a queue with specific nodes in the "nodes" file, then
> you should be able to use them specifically as well. However, it is not
> enough to use "#PBS -q name_of_the_queue", because this only applies the
> characteristics of the queue (runtime, disk, mem, etc.). If you instead
> define "#PBS -l nodes=name_of_the_queue", the destination nodes will be
> restricted to those associated with this queue in the nodes file.
> I am not sure whether each queue can carry a default definition of which
> nodes to use. I am not talking about the nodes file, where the queue defined
> for each node is just another attribute of that node. I am talking about a
> definition within the queue attributes.
> Hope it helps
> Marc
> On Thu, 04 Jan 2007 19:05:21 -0700 (MST), Anne Hammond wrote
>> Associating queue with specific nodes is definitely not
>> working as documented.
>>
>> Queue s3opt24 has nodes 23-46 allocated to it. If a job is
>> submitted to the queue requesting 5 nodes, these
>> are the nodes allocated to the job:
>>
>> exec_host = node24/1+node24/0+node23/1+node23/0+node21/1+node21/0+node20/1
>> +node20/0+node19/1+node19/0
>>
>> This is the output of the job:
>>
>> Warning: no access to tty (Bad file descriptor).
>> Thus no job control in this shell.
>> The master node of this job is node24.cl.xxxcorp.com
>> The working directory is
>> /home/research/swsides/z3d100x100_ab_da000_db000_filter_2
>> This job runs on the following nodes:
>> The nodefile is /var/spool/torque/aux//2272.storage2.cl.xxxcorp.com
>> node24 node24 node23 node23 node21 node21 node20 node20 node19 node19
>> This job has allocated 10 nodes
>> --------------------------------------
>> Running torque epilogue script
>>
>> Cleaning node node19
>> Cleaning node node19
>> Cleaning node node20
>> Cleaning node node20
>> Cleaning node node21
>> Cleaning node node21
>> Cleaning node node23
>> Cleaning node node23
>> Cleaning node node24
>> Cleaning node node24
>> Done
>>
>> Thank you for pointers.
>>
>> Anne M. Hammond - Systems / Network Administration - Tech-X Corp
>> hammond_at_txcorp.com 720-974-1840
>>
>> On Thu, 4 Jan 2007, Anne Hammond wrote:
>>
>>> On closer inspection of /var/spool/torque/server_logs/20070103,
>>> I see that the job was assigned to node39, not node22:
>>>
>>> 01/03/2007 23:42:47;0100;PBS_Server;Req;;Type JobObituary request received
>>> from pbs_mom at node39.cl.xxxcorp.com, sock=11
>>>
>>> My apologies for an earlier post in error. We will test with a
>>> different known working script and report whether the job is
>>> submitted to the correct node list.
>>>
>>> Anne
>>>
>>> Anne M. Hammond - Systems / Network Administration - Tech-X Corp
>>> hammond_at_txcorp.com 720-974-1840
>>>
>>>
>
>
> ------------------------------------------------------
> Marc Noguera i Julian
> Research Support Technician
> Despatx C7-149. Edifici Cn.
> Campus UAB. Bellaterra
> 08193. Barcelona
> email: marc_at_klingon.uab.es
> web: http://klingon.uab.es/marc
> Tlf/Phone: 00 34 935812173
> -------------------------------------------------------