[torqueusers] Nodes have state of free when running jobs

Andrus, Mr. Brian (Contractor) brian.andrus at nrlmry.navy.mil
Thu Nov 8 09:21:29 MST 2007


OK, here is my setup: Torque 2.2.1 on RHEL4U5, with the Torque scheduler
as well (for now).

I submit several jobs: 2 to the long queue and 3 to the medium queue.
qstat shows one long job running, one medium job running, and the rest
queued.
However, the long job and the medium job are running on the same set of
nodes.
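
For anyone wanting to confirm the overlap rather than eyeball it, here is
a minimal sketch. It assumes the node list of each job is available as an
exec_host string of the form host/cpu+host/cpu+... (as reported by
qstat -f; long values may wrap across lines there and would need joining
first). The function name shared_nodes and the host names in the example
are made up for illustration.

```shell
#!/bin/bash
# Print the hostnames that appear in both exec_host strings.
# Assumed input format: host/cpu+host/cpu+...
shared_nodes() {
    # Split on '+', drop the "/cpu" suffix, de-duplicate each list,
    # then keep only the lines common to both sorted lists.
    comm -12 \
        <(tr '+' '\n' <<< "$1" | cut -d/ -f1 | sort -u) \
        <(tr '+' '\n' <<< "$2" | cut -d/ -f1 | sort -u)
}

# Example with hypothetical host names; node02 is in both lists.
shared_nodes "node01/1+node01/0+node02/1+node02/0" "node02/1+node02/0+node03/0"
```

Any hostname printed is allocated to both jobs; empty output means the two
jobs do not share a node.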

My config:
---------------------------------
create queue short
set queue short queue_type = Execution
set queue short Priority = 40
set queue short max_running = 3
set queue short resources_max.nodect = 4
set queue short resources_max.walltime = 00:15:00
set queue short enabled = True
set queue short started = True
#
# Create and define queue medium
#
create queue medium
set queue medium queue_type = Execution
set queue medium Priority = 30
set queue medium max_running = 4
set queue medium resources_max.nodect = 8
set queue medium resources_max.walltime = 04:00:00
set queue medium enabled = True
set queue medium started = True
#
# Create and define queue long
#
create queue long
set queue long queue_type = Execution
set queue long Priority = 20
set queue long max_running = 1
set queue long resources_max.nodect = 16
set queue long resources_max.walltime = 24:00:00
set queue long enabled = True
set queue long started = True
#
# Set server attributes.
#
set server scheduling = True
set server max_running = 30
set server acl_roots = root
set server default_queue = short
set server log_events = 0
set server query_other_jobs = True
set server scheduler_iteration = 60
set server node_check_rate = 150
set server tcp_timeout = 6
set server log_level = 7
set server pbs_version = 2.2.1
set server submit_hosts = login1
----------------------------------------------------

My job script:
------------------
#!/bin/bash
#PBS -j oe
#PBS -l nodes=16:ppn=2
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -N LongTestJob
#PBS -q long
#PBS -o output-long.txt
#PBS -V

cd $PBS_O_WORKDIR
rm -f output.txt
date
mpirun --mca btl openib,self /data/andrus/hello
sleep 30
-------------------
The script is the same for both jobs except for the job name and output
file name.

Questions: Why are my jobs running on nodes that should be
job-exclusive? Why isn't more than one medium job running at the same
time?


Brian Andrus perotsystems 
Site Manager | Sr. Computer Scientist 
Naval Research Lab
7 Grace Hopper Ave, Monterey, CA  93943
Phone (831) 656-4839 | Fax (831) 656-4866 

