[torqueusers] PBS CODE ERROR

Joshua Bernstein jbernstein at penguincomputing.com
Thu Dec 3 14:07:59 MST 2009


Hi Mino,

	The issue here is that TORQUE doesn't lock a job to a specific set of nodes 
just because of the queue it was submitted to. Placement is restricted to 
those nodes only when the job itself specifically requests that node property.

	In fact, you can have two sets of nodes without separating them into queues at 
all. As another poster mentioned, doing something like:

$ qsub -l nodes=1:private:ppn=8

is probably what you're looking for.
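
Before relying on a property name in a request like the one above, it can help 
to check which hosts actually carry it. The following is only a minimal sketch: 
nodes_with_property is a hypothetical helper (not part of TORQUE), and it 
assumes the nodes file uses the simple "hostname np=N property ..." layout shown 
in the quoted configuration below. The sample data is copied from that 
configuration; on a real server the file would normally be server_priv/nodes 
under the TORQUE home directory.

```shell
#!/bin/sh
# Hypothetical helper: print the hosts in a TORQUE-style nodes file that
# carry a given property. Fields after the hostname are either key=value
# settings (such as np=8) or bare property names; only an exact match on a
# bare property counts.
nodes_with_property() {
    prop="$1"
    nodes_file="$2"
    awk -v prop="$prop" '
        /^[[:space:]]*(#|$)/ { next }   # skip comments and blank lines
        {
            for (i = 2; i <= NF; i++)
                if ($i == prop) { print $1; break }
        }
    ' "$nodes_file"
}

# Sample data copied from the nodes file in the quoted message.
example_nodes=$(mktemp)
cat > "$example_nodes" <<'EOF'
an08 np=8 parallel
an09 np=8 parallel
an10 np=8 private
EOF

# Lists the only host tagged "private" in the sample data: an10
nodes_with_property private "$example_nodes"

rm -f "$example_nodes"
```

With the sample data, a request for nodes=1:private:ppn=8 can only be satisfied 
by an10, while a plain nodes=1:ppn=8 names no property at all.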

-Joshua Bernstein
Senior Software Engineer
Penguin Computing

Mino Elefante wrote:
> Hi,
> I'm a new user of TORQUE.
> 
> I'm installing TORQUE on a cluster,
> and I have a problem.
> 
> I have 2 queues in my cluster. When I submit a job to a specific queue, the job runs on a node that does not belong to that queue. Why?
> This is my configuration:
> 
> 
> nodes:
> ******************************
> an08 np=8 parallel
> an09 np=8 parallel
> an10 np=8 private
> ******************************
> 
> 
> server and queue
> ******************************
> create queue parallel
> set queue parallel queue_type = Execution
> set queue parallel max_running = 2
> set queue parallel resources_default.neednodes = parallel
> set queue parallel enabled = True
> set queue parallel started = True
> 
> create queue private
> set queue private queue_type = Execution
> set queue private resources_default.neednodes = private
> set queue private enabled = True
> set queue private started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server max_user_run = 10
> set server log_events = 511
> set server mail_from = adm
> set server query_other_jobs = True
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server pbs_version = 2.0.0p8
> *****************************
> 
> the script is:
> 
> ****************************
> #!/bin/sh
> ### Job name
> #PBS -N test8
> 
> #PBS -u mino
> ### Declare job non-rerunnable
> #PBS -r n
> ### Output files
> #PBS -e output.err
> #PBS -o output.log
> ### Enter your own email address
> #PBS -M mino at localhost
> #PBS -m ae
> ### Queue to submit the job to
> #PBS -q private
> ### Number of nodes (min=1 max=4) - ppn = number of processors per node (min=1 max=2)
> #PBS -l nodes=1:ppn=8
> 
> # Working directory
> cd $PBS_O_WORKDIR
> 
> echo Running on host `hostname`
> echo Time is `date`
> echo Directory is `pwd`
> echo This job runs on the following processors:
> echo `cat $PBS_NODEFILE`
> # Define number of processors
> NPROCS=`wc -l < $PBS_NODEFILE`
> echo This job has allocated $NPROCS processors
> 
> # Run the parallel MPI executable "a.out"
> /opt/hpmpi/bin/mpirun -TCP -v -hostfile $PBS_NODEFILE -np $NPROCS exec
> 
> ***************************
> 
> This job runs on node an08.
> Why?
> 
> Thanks
> 
> 