[torqueusers] how -l procs works

Glen Beane glen.beane at gmail.com
Wed Jun 2 09:55:42 MDT 2010


On Wed, Jun 2, 2010 at 11:48 AM, Ken Nielson
<knielson at adaptivecomputing.com> wrote:
> On 06/02/2010 09:40 AM, Glen Beane wrote:
>> On Wed, Jun 2, 2010 at 11:33 AM, Glen Beane<glen.beane at gmail.com>  wrote:
>>
>>> On Wed, Jun 2, 2010 at 11:04 AM, Ken Nielson
>>> <knielson at adaptivecomputing.com>  wrote:
>>>
>>>> Hi all,
>>>>
>>>> On another thread with the subject "qsub on several nodes" it was
>>>> suggested that procs is a better solution than nodes for scattering
>>>> jobs across all available processors.  However, I find the procs
>>>> resource does not behave the way described in that thread.
>>>>
>>>> For instance if I do the following:
>>>>
>>>> qsub -l procs=5 <job.sh>
>>>>
>>>> The qstat output shows the following resource list
>>>>
>>>>   Resource_List.neednodes = 1
>>>>   Resource_List.nodect = 1
>>>>   Resource_List.nodes = 1
>>>>   Resource_List.procs = 5
>>>>
>>>> If I do a qrun on this job, it will be assigned a single node and
>>>> one processor.
>>>>
>>>> The qstat -f after the job is started gives an exec_host of node/0.
>>>> TORQUE ignores the procs keyword and assigns the default of one node
>>>> and one processor to the job.
>>>>
>>>> Moab interprets procs to mean the number of processors requested on
>>>> a single node for the job.  If I let Moab schedule the job, the
>>>> exec_host from qstat is node/0+node/1+node/2+node/3+node/4.
>>>>
>>>> If I make the value of procs greater than the number of processors
>>>> on any node, Moab will not run the job.
>>>>
>>>> Ken
>>>>
>>> as far as I know, Moab looks at -l procs=X, interprets it, and then
>>> sets the exec_host to some set of nodes that satisfies the request.
>>> It is a hack and definitely won't work with qrun, since the approach
>>> requires that Moab set the exec_host list.  The fact that procs is
>>> basically ignored by TORQUE is my major complaint with how it is
>>> implemented.
>>>
>>> I use it all the time to request more processors than fit on a
>>> single node.  For example, I routinely use -l procs=32 or more on a
>>> cluster of 4-core nodes.  I'm using Moab 5.4.0, but I know I've used
>>> it on some recent earlier versions as well.
>>>
>>
>> gbeane@wulfgar:~>  echo "pbsdsh hostname" | qsub -N procs_test -l
>> procs=64,walltime=00:01:00
>> 69641.wulfgar.jax.org
>> gbeane@wulfgar:~>  cat procs_test.o69641 | sort | uniq | wc -l
>> 17
>>
>> it took 17 unique nodes to satisfy my procs=64 request.  On some
>> nodes I was allocated all 4 cores; on others I was allocated only a
>> subset of the cores because the rest were in use by other jobs.
>>
>> another example
>>
>> gbeane@wulfgar:~>  qsub -l procs=64,walltime=00:05:00 -I
>> qsub: waiting for job 69642.wulfgar.jax.org to start
>> qsub: job 69642.wulfgar.jax.org ready
>>
>> Have a lot of fun...
>> Directory: /home/gbeane
>> Wed Jun  2 11:38:05 EDT 2010
>> gbeane@cs-short-2:~>  cat $PBS_NODEFILE | wc -l
>> 64
>> gbeane@cs-short-2:~>  cat $PBS_NODEFILE | uniq | wc -l
>> 17
>>
> So, is the idea to let Moab create the node spec and then use qrun to
> execute the job?

I don't think you'll get it to work with qrun...

Moab tells pbs_server to run the job on a specific set of nodes.
pbs_server still thinks the job only wants one node, since it has no
clue what procs=N means, but it does what Moab tells it to do, so the
job actually runs on the number of processors the user requested with
procs.
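
One way to see this is qstat -f on a running procs job: pbs_server
still reports nodect = 1, but exec_host lists every slot Moab picked.
Roughly like this for Ken's procs=5 case (the job id and hostnames
below are hypothetical):

gbeane@wulfgar:~>  qstat -f 12345 | grep -E 'exec_host|Resource_List'
    exec_host = node01/0+node01/1+node01/2+node01/3+node02/0
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    Resource_List.procs = 5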


I'm not sure why you are having difficulties with procs=N when N >
number of cores on a node.
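
If it were me, I'd start with Moab's own diagnostics and a quick look
at what pbs_server thinks each node has (job id below is
hypothetical):

gbeane@wulfgar:~>  checkjob -v 12345
gbeane@wulfgar:~>  pbsnodes -a | grep 'np = '
     np = 4
     np = 4
     ...

checkjob -v should report which requirement Moab can't satisfy, and
the np values confirm how many processors pbs_server advertises for
each node.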

