[torqueusers] PBS CODE ERROR

Gus Correa gus at ldeo.columbia.edu
Thu Dec 3 10:45:12 MST 2009


Hi Mino

In your PBS script, have you tried replacing:

#PBS -l nodes=1:ppn=8

with

#PBS -l nodes=1:private:ppn=8

?

Adding the node property to the nodes request there may
produce the effect you want (i.e., the job runs on an10, I suppose).
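
If you want to double-check which hosts a property such as "private"
maps to before resubmitting, something like the sketch below may help.
It only parses a nodes listing in the "name np=N property ..." form
shown in your message. The function name nodes_with_property and the
temporary file are made up for illustration; on a real server you
would point it at your own nodes file instead.

```shell
#!/bin/sh
# Sketch: list the hosts in a TORQUE-style nodes file that carry a given property.
# Assumes one node per line in the form "name np=N prop1 prop2 ...".

# nodes_with_property FILE PROPERTY  (hypothetical helper, not part of TORQUE)
# Prints the name of each node whose fields include PROPERTY.
nodes_with_property() {
    awk -v prop="$2" '
        /^[[:space:]]*(#|$)/ { next }    # skip comments and blank lines
        {
            # field 1 is the host name; later fields are np=N and properties
            for (i = 2; i <= NF; i++) {
                if ($i == prop) { print $1; break }
            }
        }
    ' "$1"
}

# Example input: a copy of the nodes listing from the quoted message.
nodes_file=$(mktemp)
cat > "$nodes_file" <<'EOF'
an08 np=8 parallel
an09 np=8 parallel
an10 np=8 private
EOF

nodes_with_property "$nodes_file" private    # prints: an10
rm -f "$nodes_file"
```

If the host printed here is the one you expect, then the property
in "nodes=1:private:ppn=8" should point the job at that host.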

I hope it helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


Mino Elefante wrote:
> Hi,
> I'm a new user of Torque.
> 
> I'm installing Torque on a cluster.
> I have a problem.
> 
> I have 2 queues in my cluster. When I submit a job to a specific queue, the job runs on a node that does not belong to that queue. Why?
> This is my configuration:
> 
> 
> nodes:
> ******************************
> an08 np=8 parallel
> an09 np=8 parallel
> an10 np=8 private
> ******************************
> 
> 
> server and queue
> ******************************
> create queue parallel
> set queue parallel queue_type = Execution
> set queue parallel max_running = 2
> set queue parallel resources_default.neednodes = parallel
> set queue parallel enabled = True
> set queue parallel started = True
> 
> create queue private
> set queue private queue_type = Execution
> set queue private resources_default.neednodes = private
> set queue private enabled = True
> set queue private started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server max_user_run = 10
> set server log_events = 511
> set server mail_from = adm
> set server query_other_jobs = True
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server pbs_version = 2.0.0p8
> *****************************
> 
> the script is:
> 
> ****************************
> #!/bin/sh
> ### Job name
> #PBS -N test8
> 
> #PBS -u mino
> ### Declare job non-rerunnable
> #PBS -r n
> ### Output files
> #PBS -e output.err
> #PBS -o output.log
> ### Enter your own email address
> #PBS -M mino at localhost
> #PBS -m ae
> ### Queue to submit the job to
> #PBS -q private
> ### Number of nodes (min=1 max=4) - ppn= Number of processors per node (min=1 max=2)
> #PBS -l nodes=1:ppn=8
> 
> # Working directory
> cd $PBS_O_WORKDIR
> 
> echo Running on host `hostname`
> echo Time is `date`
> echo Directory is `pwd`
> echo This job runs on the following processors:
> echo `cat $PBS_NODEFILE`
> # Define number of processors
> NPROCS=`wc -l < $PBS_NODEFILE`
> echo This job has allocated $NPROCS processors
> 
> # Run the parallel MPI executable "a.out"
> /opt/hpmpi/bin/mpirun -TCP -v -hostfile $PBS_NODEFILE -np $NPROCS exec
> 
> ***************************
> 
> This job runs on node an08.
> Why?
> 
> Thanks
> 
> 


