[torqueusers] how -l procs works
Ken Nielson
knielson at adaptivecomputing.com
Wed Jun 2 10:04:00 MDT 2010
On 06/02/2010 09:55 AM, Glen Beane wrote:
> On Wed, Jun 2, 2010 at 11:48 AM, Ken Nielson
> <knielson at adaptivecomputing.com> wrote:
>
>> On 06/02/2010 09:40 AM, Glen Beane wrote:
>>
>>> On Wed, Jun 2, 2010 at 11:33 AM, Glen Beane<glen.beane at gmail.com> wrote:
>>>
>>>
>>>> On Wed, Jun 2, 2010 at 11:04 AM, Ken Nielson
>>>> <knielson at adaptivecomputing.com> wrote:
>>>>
>>>>
>>>>> Hi all,
>>>>>
>>>>> On another thread with the subject "qsub on several nodes" it was
>>>>> suggested that procs is a better solution than nodes for scattering
>>>>> jobs across all available processors. However, I find that the procs
>>>>> resource does not behave the way described in that thread.
>>>>>
>>>>> For instance if I do the following:
>>>>>
>>>>> qsub -l procs=5 <job.sh>
>>>>>
>>>>> The qstat output shows the following resource list
>>>>>
>>>>> Resource_List.neednodes = 1
>>>>> Resource_List.nodect = 1
>>>>> Resource_List.nodes = 1
>>>>> Resource_List.procs = 5
>>>>>
>>>>> If I do a qrun on this job it will be assigned a single node and one
>>>>> processor.
>>>>>
>>>>> The qstat -f output after the job is started gives an exec_host of
>>>>> node/0. TORQUE ignores the procs keyword and assigns the default of
>>>>> 1 node and one processor to the job.
>>>>>
>>>>> Moab interprets procs to mean the number of processors requested on a
>>>>> single node for the job. If I let Moab schedule the job, the exec_host
>>>>> from qstat is node/0+node/1+node/2+node/3+node/4.
>>>>>
>>>>> If I make the value of procs greater than the number of processors on
>>>>> any node, Moab will not run the job.
>>>>>
>>>>> Ken
>>>>>
>>>>>
>>>> as far as I know, Moab looks at -l procs=X, interprets it, and then
>>>> sets the exec_host to some set of nodes that satisfies the request.
>>>> It is a hack and definitely won't work with qrun, since it requires
>>>> that the exec_host list is set. The fact that it is basically ignored
>>>> by TORQUE is my major complaint with how it is implemented.
>>>>
>>>> I use it all the time to request more processors than will run on a
>>>> single node. For example, I routinely use -l procs=32 or more on a
>>>> cluster of 4 core nodes. I'm using Moab 5.4.0, but I know I've used
>>>> it on some recent previous versions.
>>>>
>>>>
>>> gbeane@wulfgar:~> echo "pbsdsh hostname" | qsub -N procs_test -l procs=64,walltime=00:01:00
>>> 69641.wulfgar.jax.org
>>> gbeane@wulfgar:~> cat procs_test.o69641 | sort | uniq | wc -l
>>> 17
>>>
>>> it took 17 unique nodes to satisfy my procs=64 request. On some nodes
>>> I was allocated all 4 cores; on others I was allocated only a subset
>>> of the cores because the rest were in use by other jobs.
>>>
>>> another example
>>>
>>> gbeane@wulfgar:~> qsub -l procs=64,walltime=00:05:00 -I
>>> qsub: waiting for job 69642.wulfgar.jax.org to start
>>> qsub: job 69642.wulfgar.jax.org ready
>>>
>>> Have a lot of fun...
>>> Directory: /home/gbeane
>>> Wed Jun 2 11:38:05 EDT 2010
>>> gbeane@cs-short-2:~> cat $PBS_NODEFILE | wc -l
>>> 64
>>> gbeane@cs-short-2:~> cat $PBS_NODEFILE | uniq | wc -l
>>> 17
>>>
>>>
>> So, is the idea to let Moab create the node spec and then use qrun to
>> execute the job?
>>
> I don't think you'll get it to work with qrun...
>
> Moab tells pbs_server to run the job on a specific set of nodes.
> pbs_server thinks the job only wants 1 node, since it has no clue what
> procs=N means, but it does what Moab tells it to do, so the job actually
> runs on the number of processors the user requested with procs.
>
>
> I'm not sure why you are having difficulties with procs=N when N is
> greater than the number of cores on a node.
>
Well, I just restarted my setup and sure enough Moab does run the jobs
across as many nodes as necessary.
Never mind.
Ken
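
A minimal sketch for checking how a procs=N request was actually laid out
across nodes, in the spirit of the sort/uniq pipelines quoted above. The
function names cores_per_node and exec_host_per_node are hypothetical
helpers, not part of TORQUE or Moab. The sketch assumes the node file lists
one hostname per allocated processor slot and that exec_host uses the
host/index+host/index form shown earlier in this thread; other versions may
format exec_host differently.

```shell
#!/bin/sh
# Hypothetical helpers: report "hostname slot_count" for each node in a job.

# cores_per_node FILE
# FILE is in $PBS_NODEFILE format: one hostname per line, repeated once
# for every processor slot allocated on that host.
cores_per_node() {
    # Sort first so uniq -c counts correctly even if hosts are not grouped.
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# exec_host_per_node STRING
# STRING is an exec_host value such as node/0+node/1 (form assumed from
# the qstat output quoted in this thread).
exec_host_per_node() {
    printf '%s\n' "$1" | tr '+' '\n' | cut -d/ -f1 | sort | uniq -c |
        awk '{ print $2, $1 }'
}
```

Inside a running job, cores_per_node "$PBS_NODEFILE" prints one line per
node with the number of slots allocated on it, and the number of output
lines is the unique node count.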