[torqueusers] torque does not kill jobs when wall_time or cpu_time reached
David Singleton
David.Singleton at anu.edu.au
Mon Jun 7 17:34:54 MDT 2010
On 06/08/2010 01:50 AM, Ken Nielson wrote:
> On 06/07/2010 09:29 AM, Glen Beane wrote:
>> On Mon, Jun 7, 2010 at 11:21 AM, Ken Nielson
>> <knielson at adaptivecomputing.com> wrote:
>>
>>> On 06/07/2010 09:10 AM, Glen Beane wrote:
>>>
>>>> On Mon, Jun 7, 2010 at 11:02 AM, Ken Nielson
>>>> <knielson at adaptivecomputing.com> wrote:
>>>>
>>>>
>>>>> On 06/04/2010 08:14 PM, Glen Beane wrote:
>>>>>
>>>>>
>>>>>> On Fri, Jun 4, 2010 at 5:37 PM, David Singleton
>>>>>> <David.Singleton at anu.edu.au> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> If procs is going to mean processors/cpus then I would suggest there needs
>>>>>>> to be a lot of code added to align nodes and procs - they are specifying
>>>>>>> the same thing.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Moab treats them the same if you do not specify ppn with your nodes
>>>>>> request, however TORQUE is pretty much unaware of what -l procs=X
>>>>>> means - it just passes the info along to Moab. I would like to see
>>>>>> procs become a real torque resource that means give me X total
>>>>>> processors on anywhere from 1 to X nodes.
>>>>>>
>>>>>>
>>>>>>
>>>>> Currently Moab interprets procs to mean give me all the processors on X
>>>>> nodes.
>>>>>
>>>>>
>>>> that doesn't seem correct. I use procs all the time and I do not get
>>>> this behavior from Moab (I've tried it with 5.3 and 5.4). The
>>>> behavior I expect and see is for Moab to give me X total processors
>>>> spread across any number of nodes (the processors could all be on the
>>>> same node, or they could be spread across many nodes depending on what
>>>> is free at the time the job is scheduled to run).
>>>>
>>>>
>>> Glen
>>>
>>> Try doing a qsub -l procs=1 <job.sh>. Then do a qstat -f and see what
>>> the exec_host is set to.
>>>
>>> I am running Moab 5.4.
>>>
>>>
>> you must have some TORQUE defaults set, like ncpus that are
>> interfering with procs. Since -l procs does not set ncpus, your
>> default is getting applied.
>>
>> gbeane at wulfgar:~> echo "sleep 60" | qsub -l procs=1,walltime=00:01:00
>> 69760.wulfgar.jax.org
>> qstat -f 69760
>> ...
>> exec_host = cs-short-2/0
>> ...
>>
> Glen,
>
> You are right. I set those on my last set of problems with syntax.
> Ironically they did not affect those resources.
>
> Ken
I rest my case.
We treat ncpus as Moab appears to treat procs. But the server also
aligns ncpus and nodes requests, e.g.
vayu2:~ > qsub -lncpus=4 -h
w
194363.vu-pbs
vayu2:~ > qstat -f 194363
Job Id: 194363.vu-pbs
...
Resource_List.ncpus = 4
Resource_List.neednodes = 4:ppn=1
Resource_List.nodect = 4
Resource_List.nodes = 4:ppn=1
...
vayu2:~ > qsub -lnodes=1:ppn=4 -h
w
194365.vu-pbs
vayu2:~ > qstat -f 194365
Job Id: 194365.vu-pbs
...
Resource_List.ncpus = 4
Resource_List.neednodes = 1:ppn=4
Resource_List.nodect = 1
Resource_List.nodes = 1:ppn=4
...
Any resource limits or defaults really apply to both ncpus (procs) and
nodes.
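
As a rough illustration of the alignment visible in the two qstat
listings above, here is a minimal Python sketch. It is not the server's
actual code: the function name align_request is hypothetical, and it
assumes only the simple "count[:ppn=K]" form of a nodes request, with a
bare ncpus request mapped to that many nodes at ppn=1 as in the first
listing.

```python
import re


def align_request(ncpus=None, nodes=None):
    """Hypothetical sketch: derive a consistent set of Resource_List
    values from either an ncpus request or a simple nodes request."""
    if nodes is not None:
        # Assumption: only "count" or "count:ppn=K" is handled here.
        m = re.fullmatch(r"(\d+)(?::ppn=(\d+))?", nodes)
        if m is None:
            raise ValueError("unsupported nodes spec: %r" % nodes)
        nodect = int(m.group(1))
        ppn = int(m.group(2) or 1)
        total = nodect * ppn
        if ncpus is not None and ncpus != total:
            raise ValueError("ncpus and nodes requests disagree")
        ncpus = total
    elif ncpus is not None:
        # Mirror the first listing: ncpus=N becomes N nodes with ppn=1.
        nodect = ncpus
        nodes = "%d:ppn=1" % ncpus
    else:
        raise ValueError("need ncpus or nodes")
    return {
        "ncpus": ncpus,
        "neednodes": nodes,
        "nodect": nodect,
        "nodes": nodes,
    }


if __name__ == "__main__":
    for name, value in align_request(ncpus=4).items():
        print("Resource_List.%s = %s" % (name, value))
```

With both values derived in one place, a limit or default on either
ncpus or nodes can be checked against the same aligned request.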
David