[torqueusers] torque does not kill jobs when wall_time or cpu_time reached

David Singleton David.Singleton at anu.edu.au
Mon Jun 7 17:34:54 MDT 2010


On 06/08/2010 01:50 AM, Ken Nielson wrote:
> On 06/07/2010 09:29 AM, Glen Beane wrote:
>> On Mon, Jun 7, 2010 at 11:21 AM, Ken Nielson
>> <knielson at adaptivecomputing.com>   wrote:
>>
>>> On 06/07/2010 09:10 AM, Glen Beane wrote:
>>>
>>>> On Mon, Jun 7, 2010 at 11:02 AM, Ken Nielson
>>>> <knielson at adaptivecomputing.com>     wrote:
>>>>
>>>>
>>>>> On 06/04/2010 08:14 PM, Glen Beane wrote:
>>>>>
>>>>>
>>>>>> On Fri, Jun 4, 2010 at 5:37 PM, David Singleton
>>>>>> <David.Singleton at anu.edu.au>       wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> If procs is going to mean processors/cpus then I would suggest there needs
>>>>>>> to be a lot of code added to align nodes and procs - they are specifying
>>>>>>> the same thing.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Moab treats them the same if you do not specify ppn with your nodes
>>>>>> request, however TORQUE is pretty much unaware of what -l procs=X
>>>>>> means - it just passes the info along to Moab. I would like to see
>>>>>> procs become a real torque resource that means give me X total
>>>>>> processors on anywhere from 1 to X nodes.
>>>>>>
>>>>>>
>>>>>>
>>>>> Currently Moab interprets procs to mean give me all the processors on X
>>>>> nodes.
>>>>>
>>>>>
>>>> that doesn't seem correct.  I use procs all the time and I do not get
>>>> this behavior from Moab (I've tried it with 5.3 and 5.4).  The
>>>> behavior I expect and see is for Moab to give me X total processors
>>>> spread across any number of nodes (the processors could all be on the
>>>> same node, or they could be spread across many nodes depending on what
>>>> is free at the time the job is scheduled to run).
>>>>
>>>>
>>> Glen
>>>
>>> Try doing a qsub -l procs=1 <job.sh>. Then do a qstat -f and see what
>>> the exec_host is set to.
>>>
>>> I am running Moab 5.4.
>>>
>>>
>> you must have some TORQUE defaults set, like ncpus that are
>> interfering with procs.  Since -l procs does not set ncpus, your
>> default is getting applied.
>>
>> gbeane@wulfgar:~> echo "sleep 60" | qsub -l procs=1,walltime=00:01:00
>> 69760.wulfgar.jax.org
>> qstat -f 69760
>> ...
>> exec_host = cs-short-2/0
>> ...
>>
> Glen,
>
> You are right. I set those on my last set of problems with syntax.
> Ironically they did not affect those resources.
>
> Ken


I rest my case.


We treat ncpus the way Moab appears to treat procs.  But the server also
aligns ncpus and nodes requests, e.g.

vayu2:~ > qsub -lncpus=4 -h
w
194363.vu-pbs
vayu2:~ > qstat -f 194363
Job Id: 194363.vu-pbs
     ...
     Resource_List.ncpus = 4
     Resource_List.neednodes = 4:ppn=1
     Resource_List.nodect = 4
     Resource_List.nodes = 4:ppn=1
     ...

vayu2:~ > qsub -lnodes=1:ppn=4 -h
w
194365.vu-pbs
vayu2:~ > qstat -f 194365
Job Id: 194365.vu-pbs
     ...
     Resource_List.ncpus = 4
     Resource_List.neednodes = 1:ppn=4
     Resource_List.nodect = 1
     Resource_List.nodes = 1:ppn=4
     ...

Because the two requests describe the same thing, any resource limits or
defaults really apply to both ncpus (procs) and nodes.
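
To make the alignment concrete, here is a rough sketch of the rule the two
qstat listings above suggest.  It is not the actual server code: the function
name align_request and the simplified parsing of the nodes spec are
assumptions for illustration only, and a request that gives neither or
conflicting values is simply rejected here.

```python
def align_request(ncpus=None, nodes=None):
    """Hypothetical helper: keep ncpus, nodes and nodect consistent.

    Mirrors the qstat output shown above:
      ncpus=4        -> nodes=4:ppn=1, nodect=4
      nodes=1:ppn=4  -> ncpus=4,       nodect=1
    """
    if nodes is None:
        if ncpus is None:
            raise ValueError("need ncpus or nodes")
        # Assumption: a bare ncpus=N becomes N single-processor chunks.
        nodes = f"{ncpus}:ppn=1"

    nodect = 0
    total_cpus = 0
    # A nodes spec is a '+'-separated list of chunks such as "2:ppn=4".
    for chunk in nodes.split("+"):
        parts = chunk.split(":")
        # Simplification: a non-numeric first field (hostname or property)
        # is counted as one node.
        count = int(parts[0]) if parts[0].isdigit() else 1
        ppn = 1
        for part in parts[1:]:
            if part.startswith("ppn="):
                ppn = int(part[len("ppn="):])
        nodect += count
        total_cpus += count * ppn

    if ncpus is not None and ncpus != total_cpus:
        raise ValueError(
            f"ncpus={ncpus} conflicts with nodes={nodes} ({total_cpus} cpus)"
        )

    return {
        "ncpus": total_cpus,
        "neednodes": nodes,
        "nodect": nodect,
        "nodes": nodes,
    }


if __name__ == "__main__":
    print(align_request(ncpus=4))
    print(align_request(nodes="1:ppn=4"))
```

With everything derived in one place like this, a limit or default on ncpus
cannot drift out of step with the nodes request, which is the point of
aligning them.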

David



