[torqueusers] how -l procs works
Ken Nielson
knielson at adaptivecomputing.com
Wed Jun 2 09:48:57 MDT 2010
On 06/02/2010 09:40 AM, Glen Beane wrote:
> On Wed, Jun 2, 2010 at 11:33 AM, Glen Beane<glen.beane at gmail.com> wrote:
>
>> On Wed, Jun 2, 2010 at 11:04 AM, Ken Nielson
>> <knielson at adaptivecomputing.com> wrote:
>>
>>> Hi all,
>>>
>>> On another thread with the subject "qsub on several nodes" it was
>>> suggested that procs is a better way than nodes to scatter a job
>>> across all available processors. However, I find the procs resource
>>> does not seem to behave the way described in that thread.
>>>
>>> For instance if I do the following:
>>>
>>> qsub -l procs=5 <job.sh>
>>>
>>> The qstat output shows the following resource list
>>>
>>> Resource_List.neednodes = 1
>>> Resource_List.nodect = 1
>>> Resource_List.nodes = 1
>>> Resource_List.procs = 5
>>>
>>> If I do a qrun on this job it will be assigned a single node and one
>>> processor.
>>>
>>> The qstat -f output after the job is started gives an exec_host of
>>> node/0. TORQUE ignores the procs keyword and assigns the default of
>>> 1 node and one processor to the job.
>>>
>>> Moab interprets procs to mean the number of processors requested on a
>>> single node for the job. If I let Moab schedule the job, the exec_host
>>> from qstat is node/0+node/1+node/2+node/3+node/4.
>>>
>>> If I make the value of procs greater than the number of processors on
>>> any node, Moab will not run the job.
>>>
>>> Ken
>>>
>> As far as I know, Moab looks at -l procs=X, interprets it, and then
>> sets the exec_host to some set of nodes that satisfies the request.
>> It is a hack and definitely won't work with qrun, since it requires
>> that the exec_host list is set. The fact that it is basically ignored
>> by TORQUE is my major complaint with how it is implemented.
>>
>> I use it all the time to request more processors than will run on a
>> single node. For example, I routinely use -l procs=32 or more on a
>> cluster of 4 core nodes. I'm using Moab 5.4.0, but I know I've used
>> it on some recent previous versions.
>>
>
> gbeane at wulfgar:~> echo "pbsdsh hostname" | qsub -N procs_test -l
> procs=64,walltime=00:01:00
> 69641.wulfgar.jax.org
> gbeane at wulfgar:~> cat procs_test.o69641 | sort | uniq | wc -l
> 17
>
> It took 17 unique nodes to satisfy my procs=64 request. On some nodes I
> was allocated all 4 cores; on others I was allocated only a subset of
> the cores because the rest were in use.
>
> Another example:
>
> gbeane at wulfgar:~> qsub -l procs=64,walltime=00:05:00 -I
> qsub: waiting for job 69642.wulfgar.jax.org to start
> qsub: job 69642.wulfgar.jax.org ready
>
> Have a lot of fun...
> Directory: /home/gbeane
> Wed Jun 2 11:38:05 EDT 2010
> gbeane at cs-short-2:~> cat $PBS_NODEFILE | wc -l
> 64
> gbeane at cs-short-2:~> cat $PBS_NODEFILE | uniq | wc -l
> 17
>
So, is the idea to let Moab create the node spec and then use qrun to
execute the job?
Ken
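
For anyone who wants to check how a procs request was actually spread
across nodes, the counts in the session above can be gathered in one step.
This is only a minimal sketch: the function names cores_per_node and
summarize_nodefile are made up for illustration, and it assumes the node
file has one hostname per line with one line per allocated processor, as
$PBS_NODEFILE does in the output quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE or Moab): print "host count"
# for every unique host in a node file.
# Assumption: one hostname per line, one line per allocated processor.
cores_per_node() {
    # sort first, because uniq only collapses adjacent duplicate lines
    sort "$1" | uniq -c | awk '{print $2, $1}'
}

# Hypothetical helper: report total processor slots and unique nodes.
summarize_nodefile() {
    total=$(wc -l < "$1" | tr -d ' ')
    nodes=$(sort -u "$1" | wc -l | tr -d ' ')
    echo "procs=$total nodes=$nodes"
}

# Usage inside a running job:
#   summarize_nodefile "$PBS_NODEFILE"
#   cores_per_node "$PBS_NODEFILE"
```

Sorting before uniq means the count is still right if the same host shows
up in non-adjacent lines of the node file.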