[torqueusers] Need help with NCPUS not working in QSUB

Coyle, James J [ITACD] jjc at iastate.edu
Thu Oct 6 14:33:01 MDT 2011


TORQUE and PBS give you a node file whose path is in the environment variable
PBS_NODEFILE

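The file lists one hostname per processor you were allocated, so a
node contributing several cores appears once per core.  For instance
(hostnames illustrative), a request of nodes=2:ppn=2 might yield:

  seed001
  seed001
  seed002
  seed002
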
For example, with MPICH you could use:

mpirun -np 28 -machinefile ${PBS_NODEFILE} ./prog

Then 28 copies of ./prog will be started, one per
entry listed in ${PBS_NODEFILE}.
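
Putting that together, a complete submit script might look like this
sketch (the nodes/ppn request and program name are illustrative only):

  #!/bin/bash
  #PBS -l nodes=4:ppn=7
  #PBS -l walltime=1:00:00

  cd ${PBS_O_WORKDIR}
  # The nodefile has one line per requested processor (4 x 7 = 28),
  # so count its lines rather than hard-coding -np.
  NP=$(wc -l < ${PBS_NODEFILE})
  mpirun -np ${NP} -machinefile ${PBS_NODEFILE} ./prog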

Other programs like Fluent need you to specify something like:
fluent 3ddp -t28 -pib -g -i Case.jou -cnf=${PBS_NODEFILE}


Again, here you need to specify a file containing the
machines on which to run each process.  If you leave off the
-cnf above, Fluent will start all the processes on
the first node that the job was assigned to.
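
A corresponding Fluent submit script might look like the following
sketch (journal file and resource request are illustrative):

  #!/bin/bash
  #PBS -l nodes=4:ppn=7
  #PBS -l walltime=72:00:00

  cd ${PBS_O_WORKDIR}
  # -t28 must match the number of entries in the nodefile;
  # omitting -cnf puts all 28 processes on the first node.
  fluent 3ddp -t28 -pib -g -i Case.jou -cnf=${PBS_NODEFILE}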
 

-----Original Message-----
>From: torqueusers-bounces at supercluster.org
>[mailto:torqueusers-bounces at supercluster.org]
>On Behalf Of Lenox, Billy AMRDEC/Sentient Corp.
>Sent: Thursday, October 06, 2011 12:10 PM
>To: Torque Users Mailing List
>Subject: Re: [torqueusers] Need help with NCPUS not working in QSUB
>
>Ok, I tried #PBS -l procs=28 and it still runs on one node, seed001.
>I notice that if I put the location of a HOSTFILE on the EXEC line in
>the script, it runs and bypasses TORQUE/PBS.  I just have the default
>scheduler on the system.  I know I cannot specify #PBS -l nodes=5.
>I have tried different ways and still it will only run on ONE NODE,
>seed001.
>
>Billy
>
>> From: Troy Baer <tbaer at utk.edu>
>> Organization: National Institute for Computational Sciences,
>> University of Tennessee
>> Reply-To: Torque Users Mailing List <torqueusers at supercluster.org>
>> Date: Thu, 6 Oct 2011 12:07:45 -0400
>> To: Torque Users Mailing List <torqueusers at supercluster.org>
>> Subject: Re: [torqueusers] Need help with NCPUS not working in QSUB
>>
>> On Thu, 2011-10-06 at 09:55 -0500, Lenox, Billy AMRDEC/Sentient Corp.
>> wrote:
>>> I have TORQUE set up on a head node system with 5 compute nodes.
>>> Two have 8 cores and 3 have 4 cores, set up into one queue called
>>> batch.
>>> When I use a submit script
>>>
>>> #!/bin/bash
>>> #PBS -l ncpus=28
>>> #PBS -l walltime=72:00:00
>>> #PBS -o output.out
>>> #PBS -e ie.error
>>>
>>> Here is /var/spool/torque/server_priv/nodes:
>>>
>>> seed001 np=8 batch
>>> seed002 np=8 batch
>>> seed003 np=8 batch
>>> seed004 np=8 batch
>>> seed005 np=8 batch
>>>
>>> When I submit the script, it only runs on one node, seed001.
>>>
>>> I don't know why it only runs on one node.
>>
>> Which scheduler are you using?  In most of the TORQUE-compatible
>> schedulers I've seen, the ncpus= resource is interpreted as how many
>> processors you want on a single shared-memory system.  (If you want
>> X processors and you don't care where they are, I think the
>> preferred way of requesting it is procs=X.)
>>
>> --Troy
>> --
>> Troy Baer, HPC System Administrator
>> National Institute for Computational Sciences, University of Tennessee
>> http://www.nics.tennessee.edu/
>> Phone:  865-241-4233
>>
>>

