[torqueusers] torque does not kill jobs when wall_time or cpu_time reached

Glen Beane glen.beane at gmail.com
Mon Jun 7 09:29:45 MDT 2010


On Mon, Jun 7, 2010 at 11:21 AM, Ken Nielson
<knielson at adaptivecomputing.com> wrote:
> On 06/07/2010 09:10 AM, Glen Beane wrote:
>> On Mon, Jun 7, 2010 at 11:02 AM, Ken Nielson
>> <knielson at adaptivecomputing.com>  wrote:
>>
>>> On 06/04/2010 08:14 PM, Glen Beane wrote:
>>>
>>>> On Fri, Jun 4, 2010 at 5:37 PM, David Singleton
>>>> <David.Singleton at anu.edu.au>    wrote:
>>>>
>>>>
>>>>
>>>>> If procs is going to mean processors/cpus then I would suggest there needs
>>>>> to be a lot of code added to align nodes and procs - they are specifying
>>>>> the same thing.
>>>>>
>>>>>
>>>> Moab treats them the same if you do not specify ppn with your nodes
>>>> request, however TORQUE is pretty much unaware of what -l procs=X
>>>> means - it just passes the info along to Moab. I would like to see
>>>> procs become a real torque resource that means give me X total
>>>> processors on anywhere from 1 to X nodes.
>>>> _______________________________________________
>>>> torqueusers mailing list
>>>> torqueusers at supercluster.org
>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>
>>>>
>>> Currently Moab interprets procs to mean give me all the processors on X
>>> nodes.
>>>
>> that doesn't seem correct.  I use procs all the time and I do not get
>> this behavior from Moab (I've tried it with 5.3 and 5.4).  The
>> behavior I expect and see is for Moab to give me X total processors
>> spread across any number of nodes (the processors could all be on the
>> same node, or they could be spread across many nodes depending on what
>> is free at the time the job is scheduled to run).
>>
> Glen
>
> Try doing a qsub -l procs=1 <job.sh>. Then do a qstat -f and see what
> the exec_host is set to.
>
> I am running Moab 5.4.
>

You must have some TORQUE defaults set, such as ncpus, that are
interfering with procs.  Since -l procs does not set ncpus, your
default is getting applied.
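
One way to check for such defaults is to dump the server configuration
with qmgr and look for resources_default lines. This is only a rough
sketch: list_resource_defaults is a made-up helper name, not a TORQUE
command, and the qmgr call is skipped on hosts without the TORQUE
client tools.

```shell
# Hypothetical helper: keep only the default resource settings from
# qmgr-style "set ..." lines read on stdin.
list_resource_defaults() {
    # "|| true" keeps the exit status clean when no defaults are set.
    grep -E 'resources_default\.' || true
}

# Only query the server if the TORQUE client commands are installed.
if command -v qmgr >/dev/null 2>&1; then
    qmgr -c 'print server' | list_resource_defaults
fi
```

A line such as resources_default.ncpus in the output would be the kind
of default that gets applied when -l procs does not set ncpus.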

gbeane@wulfgar:~> echo "sleep 60" | qsub -l procs=1,walltime=00:01:00
69760.wulfgar.jax.org
gbeane@wulfgar:~> qstat -f 69760
...
exec_host = cs-short-2/0
...
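
For requests larger than procs=1, counting the exec_host entries per
node shows how the processors were spread out. A minimal sketch,
assuming exec_host uses the node/slot form shown above with multiple
entries joined by "+"; count_slots is a hypothetical helper, not part
of TORQUE or Moab.

```shell
# Hypothetical helper: print "node count" for each node named in an
# exec_host value such as "nodeA/0+nodeA/1+nodeB/0".
count_slots() {
    printf '%s\n' "$1" |
        tr '+' '\n' |        # one node/slot entry per line
        cut -d/ -f1 |        # keep only the node name
        sort | uniq -c |     # count entries per node
        awk '{print $2, $1}' # reorder to "node count"
}

# With the value from the qstat -f output above:
count_slots "cs-short-2/0"   # prints "cs-short-2 1"
```

Seeing more slots on a node than were asked for would point to a
default such as ncpus being applied on top of the procs request.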

