[torqueusers] how -l procs works

Ken Nielson knielson at adaptivecomputing.com
Wed Jun 2 10:04:00 MDT 2010


On 06/02/2010 09:55 AM, Glen Beane wrote:
> On Wed, Jun 2, 2010 at 11:48 AM, Ken Nielson
> <knielson at adaptivecomputing.com>  wrote:
>    
>> On 06/02/2010 09:40 AM, Glen Beane wrote:
>>      
>>> On Wed, Jun 2, 2010 at 11:33 AM, Glen Beane<glen.beane at gmail.com>    wrote:
>>>
>>>        
>>>> On Wed, Jun 2, 2010 at 11:04 AM, Ken Nielson
>>>> <knielson at adaptivecomputing.com>    wrote:
>>>>
>>>>          
>>>>> Hi all,
>>>>>
>>>>> On another thread with the subject "qsub on several nodes" it was
>>>>> suggested that procs is a better way to scatter jobs across all
>>>>> available processors than nodes.  However, I find that the procs
>>>>> resource does not behave the way described in the thread.
>>>>>
>>>>> For instance if I do the following:
>>>>>
>>>>> qsub -l procs=5 <job.sh>
>>>>>
>>>>> The qstat output shows the following resource list
>>>>>
>>>>>    Resource_List.neednodes = 1
>>>>>    Resource_List.nodect = 1
>>>>>    Resource_List.nodes = 1
>>>>>    Resource_List.procs = 5
>>>>>
>>>>> If I do a qrun on this job it will be assigned a single node and one
>>>>> processor.
>>>>>
>>>>> The qstat -f after the job is started gives an exec_host of node/0.
>>>>> TORQUE ignores the procs keyword and assigns the default of one node
>>>>> and one processor to the job.
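An easy way to confirm what TORQUE actually recorded for a job is to filter the Resource_List lines out of qstat -f. A minimal sketch, simulating the qstat -f output shown above with a here-document (the job id and the qstat_output helper are illustrative; on a live system you would pipe the real `qstat -f <jobid>` instead):

```shell
#!/bin/sh
# Stand-in for `qstat -f <jobid>`, using the Resource_List values from the
# example above; replace this function with the real command on a live system.
qstat_output() {
cat <<'EOF'
Job Id: 1234.example
    Resource_List.neednodes = 1
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    Resource_List.procs = 5
EOF
}

# Extract just the procs value TORQUE stored for the job
procs=$(qstat_output | awk -F' = ' '/Resource_List.procs/ {print $2}')
echo "procs=$procs"
```

Comparing this against the nodect and nodes values makes the mismatch visible: TORQUE stored procs = 5 but still plans for a single node.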
>>>>>
>>>>> Moab interprets procs to mean the number of processors requested on a
>>>>> single node for the job. If I let Moab schedule the job, the exec_host
>>>>> from qstat is node/0+node/1+node/2+node/3+node/4.
>>>>>
>>>>> If I make the value of procs greater than the number of processors on
>>>>> any node, Moab will not run the job.
>>>>>
>>>>> Ken
>>>>>
>>>>>            
>>>> as far as I know, Moab looks at -l procs=X, interprets it, and then
>>>> sets the exec_host to some set of nodes that satisfies the request.
>>>> It is a hack and definitely won't work with qrun, since it requires
>>>> that the exec_host list be set. The fact that it is basically ignored
>>>> by TORQUE is my major complaint with how it is implemented.
>>>>
>>>> I use it all the time to request more processors than will fit on a
>>>> single node.  For example, I routinely use -l procs=32 or more on a
>>>> cluster of 4-core nodes.  I'm using Moab 5.4.0, but I know I've used
>>>> it on some recent previous versions.
>>>>
>>>>          
>>> gbeane at wulfgar:~>    echo "pbsdsh hostname" | qsub -N procs_test -l
>>> procs=64,walltime=00:01:00
>>> 69641.wulfgar.jax.org
>>> gbeane at wulfgar:~>    cat procs_test.o69641 | sort | uniq | wc -l
>>> 17
>>>
>>> it took 17 unique nodes to satisfy my procs=64 request.  On some nodes
>>> I was allocated 4 cores; on others I was allocated only a subset of the
>>> cores because the rest were in use.
>>>
>>> another example
>>>
>>> gbeane at wulfgar:~>    qsub -l procs=64,walltime=00:05:00 -I
>>> qsub: waiting for job 69642.wulfgar.jax.org to start
>>> qsub: job 69642.wulfgar.jax.org ready
>>>
>>> Have a lot of fun...
>>> Directory: /home/gbeane
>>> Wed Jun  2 11:38:05 EDT 2010
>>> gbeane at cs-short-2:~>    cat $PBS_NODEFILE | wc -l
>>> 64
>>> gbeane at cs-short-2:~>    cat $PBS_NODEFILE | uniq | wc -l
>>> 17
>>>
>>>        
>> So, is the idea to let Moab create the node spec and then use qrun to
>> execute the job?
>>      
> I don't think you'll get it to work with qrun...
>
> Moab tells pbs_server to run the job on a set of nodes. While
> pbs_server thinks the job only wants 1 node (since it has no clue what
> procs=N means), it does what Moab tells it to do, so the job actually
> runs on the number of processors the user requested with procs.
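The host list Moab hands pbs_server shows up in qstat -f as host/index pairs joined by '+', as in the node/0+node/1+... example earlier in the thread. A hypothetical sketch of assembling such an exec_host-style string for a procs=5 request satisfied on one host (the host name and loop are illustrative, not anything TORQUE or Moab actually runs):

```shell
#!/bin/sh
# Build an exec_host-style string (host/index pairs joined by '+')
# for a procs=5 request placed on a single made-up host.
host=node
procs=5
exec_host=""
i=0
while [ "$i" -lt "$procs" ]; do
    if [ -z "$exec_host" ]; then
        exec_host="$host/$i"
    else
        exec_host="$exec_host+$host/$i"
    fi
    i=$((i + 1))
done
echo "$exec_host"
```

This reproduces the node/0+node/1+node/2+node/3+node/4 value from the earlier qstat output; with multiple hosts the same format simply mixes host names in the list.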
>
>
> I'm not sure why you are having difficulties with procs=N when N > the
> number of cores on a node.
>    
Well, I just restarted my setup and, sure enough, Moab does run the jobs 
across as many nodes as necessary.

Never mind.

Ken


More information about the torqueusers mailing list