[torqueusers] how -l procs works

Glen Beane glen.beane at gmail.com
Wed Jun 2 09:40:58 MDT 2010


On Wed, Jun 2, 2010 at 11:33 AM, Glen Beane <glen.beane at gmail.com> wrote:
> On Wed, Jun 2, 2010 at 11:04 AM, Ken Nielson
> <knielson at adaptivecomputing.com> wrote:
>> Hi all,
>>
>> On another thread with the subject "qsub on several nodes" it was
>> suggested that procs is a better way to scatter jobs across all
>> available processors than nodes.  However, I find that the procs
>> resource does not behave the way described in that thread.
>>
>> For instance if I do the following:
>>
>> qsub -l procs=5 <job.sh>
>>
>> The qstat output shows the following resource list
>>
>>  Resource_List.neednodes = 1
>>  Resource_List.nodect = 1
>>  Resource_List.nodes = 1
>>  Resource_List.procs = 5
>>
>> If I do a qrun on this job it will be assigned a single node and one
>> processor.
>>
>> The qstat -f after the job is started gives an exec_host of node/0.
>> TORQUE ignores the procs keyword and assigns the job the default of
>> one node and one processor.
>>
>> Moab interprets procs to mean number of processors requested on a single
>> node for the job. If I let Moab Schedule the job the exec_host from
>> qstat is node/0+node/1+node/2+node/3+node/4.
>>
>> If I make the value of procs greater than the number of processors on
>> any node moab will not run the job.
>>
>> Ken
>
> as far as I know, moab looks at -l procs=X, interprets it, and then
> sets the exec_host to some set of nodes that satisfies the request.
> It is a hack and definitely won't work with qrun, since it depends on
> the exec_host list being set.  The fact that procs is basically
> ignored by TORQUE is my major complaint with how it is implemented.
>
> I use it all the time to request more processors than will run on a
> single node.  For example, I routinely use -l procs=32 or more on a
> cluster of 4 core nodes.  I'm using Moab 5.4.0, but I know I've used
> it on some recent previous versions.


gbeane at wulfgar:~> echo "pbsdsh hostname" | qsub -N procs_test -l
procs=64,walltime=00:01:00
69641.wulfgar.jax.org
gbeane at wulfgar:~> cat procs_test.o69641 | sort | uniq | wc -l
17

it took 17 unique nodes to satisfy my procs=64 request.  On some nodes
I was allocated all 4 cores; on others I was allocated only a subset,
because the remaining cores were in use.
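The per-node breakdown can be seen directly by counting repeated
hostnames in the job output (or in $PBS_NODEFILE, which lists one line
per allocated processor slot).  A minimal sketch, using a made-up
nodefile since the real one only exists inside a running job:

```shell
# Simulate a nodefile: one line per allocated processor slot.
# (hostnames here are hypothetical)
printf 'node01\nnode01\nnode01\nnode01\nnode02\nnode02\n' > /tmp/nodefile_demo

# uniq -c shows how many slots were allocated on each node
sort /tmp/nodefile_demo | uniq -c
```

Here uniq -c would show a count of 4 next to node01 and 2 next to
node02, i.e. an unevenly filled allocation like the one above.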

Another example:

gbeane at wulfgar:~> qsub -l procs=64,walltime=00:05:00 -I
qsub: waiting for job 69642.wulfgar.jax.org to start
qsub: job 69642.wulfgar.jax.org ready

Have a lot of fun...
Directory: /home/gbeane
Wed Jun  2 11:38:05 EDT 2010
gbeane at cs-short-2:~> cat $PBS_NODEFILE | wc -l
64
gbeane at cs-short-2:~> cat $PBS_NODEFILE | uniq | wc -l
17
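The same sanity check can be done in one pass: the total line count of
$PBS_NODEFILE should equal the procs= request, and the number of
distinct hostnames is the node count.  A sketch with awk, again on
sample data rather than a real nodefile:

```shell
# Build a sample nodefile (hypothetical hostnames)
printf 'node01\nnode01\nnode01\nnode01\nnode02\nnode02\n' > /tmp/nodefile_demo

# Print total slots and distinct hosts in one pass
awk '{total++; seen[$0]++} END {n=0; for (h in seen) n++; print total, n}' /tmp/nodefile_demo
# prints "6 2": 6 processor slots spread across 2 nodes
```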
