[torqueusers] torque does not kill jobs when wall_time or cpu_time reached

Glen Beane glen.beane at gmail.com
Tue Jun 8 17:50:56 MDT 2010

On Jun 8, 2010, at 6:37 PM, David Singleton  
<David.Singleton at anu.edu.au> wrote:

> On 06/09/2010 07:29 AM, Glen Beane wrote:
>>
>> In my opinion JOBNODEMATCHPOLICY EXACTNODE should now be the default
>> behavior since we have -l procs. If I ask for 5 nodes and 8 processors
>> per node then that is what I should get. I don't want 10 nodes with 4
>> processors or 2 nodes with 16 processors and 1 with 8, etc. If people
>> don't care about the layout of their job they can use -l procs.
>> Hopefully with select things will be less ambiguous and will allow for
>> greater flexibility (let the user be as precise as they want, but also
>> allow some way to say I don't care, just give me X processors).
>
> Our experience is that very few users want detailed control over exactly
> how many physical nodes they get - it seems to be only comp sci students
> or similar with mistaken ideas about the value of such control. They
> don't seem to realise that when they demand 1 cpu from each of 16 nodes,
> variability in what is running on the other cpus on those nodes will make
> a mockery of any performance numbers they deduce. Other reasons for
> requesting exact nodes are usually to do with another resource (memory,
> network interfaces, GPUs, ...). It should be requests for those
> resources/node properties that get what the user wants, not the number
> of nodes.
>
> We certainly have more users with hybrid MPI-OpenMP codes and for them,
> nodes are really "virtual nodes", e.g. a request for -lnodes=8:ppn=4 means
> the job will be running with 8 MPI tasks each of which will have 4
> threads - the job needs any (the best?) set of cpus that can run that. A
> 32P SMP might be a perfectly acceptable solution.
>
> I suspect hybrid codes will become more common.
>
> So I would suggest EXACTNODE should not be the default but rather that
> users thinking they want such detailed control should have to specify
> some other option to show this (e.g. -lother=exactnodes), i.e. nodes are
> "virtual nodes" unless the user specifies otherwise.
>
>>
>> Also, the documentation should be clear that when you request a number
>> of processors per node (ppn) or a number of processors (procs) it is
>> talking about virtual processors as configured in pbs_server.
>
> True.
>
> Note that virtual processors != physical processors causes a number of
> problems.
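To make the competing readings of a `nodes=N:ppn=P` request in the quoted discussion concrete, here is a toy Python model. It is a sketch under stated assumptions, not how Maui/Moab actually implements JOBNODEMATCHPOLICY: an allocation is assumed to be a list giving the number of processors granted on each host, and the function names `exact_nodes`, `virtual_nodes` and `procs_only` are hypothetical.

```python
"""Toy model of three ways a scheduler might read `-l nodes=N:ppn=P`.
Illustration only: NOT how Maui/Moab implements JOBNODEMATCHPOLICY.
An allocation is assumed to be a list of processors granted per host."""


def exact_nodes(n, ppn, alloc):
    # EXACTNODE-style: exactly n distinct hosts with ppn processors each.
    return len(alloc) == n and all(cpus == ppn for cpus in alloc)


def virtual_nodes(n, ppn, alloc):
    # "Virtual node" reading: n chunks of ppn CPUs (e.g. n MPI tasks with
    # ppn OpenMP threads each). A chunk may not span hosts, but several
    # chunks may share one host, so a 32P SMP can satisfy nodes=8:ppn=4.
    return sum(alloc) == n * ppn and all(cpus % ppn == 0 for cpus in alloc)


def procs_only(n, ppn, alloc):
    # -l procs style: only the total processor count matters.
    return sum(alloc) == n * ppn


if __name__ == "__main__":
    layouts = {
        "5 x 8": [8] * 5,
        "10 x 4": [4] * 10,
        "16 + 16 + 8": [16, 16, 8],
        "6 + 6 + 6 + 6 + 8 + 8": [6, 6, 6, 6, 8, 8],
    }
    for name, alloc in layouts.items():
        print(f"{name:>22}: exact={exact_nodes(5, 8, alloc)} "
              f"virtual={virtual_nodes(5, 8, alloc)} "
              f"procs={procs_only(5, 8, alloc)}")
```

With nodes=5:ppn=8, only the 5 x 8 layout passes the exact check, 16 + 16 + 8 additionally passes the virtual-node check, and every layout totalling 40 processors passes the procs check. Likewise `virtual_nodes(8, 4, [32])` is true, matching the point that a 32P SMP could run an 8-task, 4-thread hybrid job.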

Generally, the number of virtual processors is set to the number of CPUs *
cores per CPU. Some (but not all) users know each processor has multiple
cores and think that by requesting a processor they are allocated all of
its cores.
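The arithmetic behind the usual virtual-processor setting can be sketched as below. Assumptions: one virtual processor per core (the convention described above), the helper names are hypothetical, and `node01 np=8` illustrates the hostname/np form of a pbs_server nodes-file entry for a node with two quad-core CPUs.

```python
def virtual_processors(cpus, cores_per_cpu):
    """Virtual processors for a node under the common convention:
    one per core, i.e. physical CPUs * cores per CPU."""
    return cpus * cores_per_cpu


def ppn_for_whole_cpus(whole_cpus, cores_per_cpu):
    """ppn a user would need to request to get `whole_cpus` complete
    multi-core processors, assuming one virtual processor per core.
    Requesting ppn=1 yields one core, not one whole processor."""
    return whole_cpus * cores_per_cpu


def nodes_file_line(hostname, cpus, cores_per_cpu):
    """Hypothetical helper: a nodes-file entry declaring np for a host."""
    return f"{hostname} np={virtual_processors(cpus, cores_per_cpu)}"


if __name__ == "__main__":
    # Hypothetical node with 2 quad-core CPUs.
    print(nodes_file_line("node01", 2, 4))  # node01 np=8
    print(ppn_for_whole_cpus(1, 4))         # 4
```

Under this convention, a user who wants one whole quad-core processor needs ppn=4; ppn=1 gets a single core.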