[torqueusers] Nodes have state of free when running jobs

David Backeberg david.backeberg at case.edu
Thu Nov 8 20:24:42 MST 2007


I don't see anything in your config or submission script dictating
which machines should get jobs from which queue. Without such
stipulations, Torque is free to place jobs on any available nodes, as
far as I know.

You could apply properties to the various nodes in your nodes file,
and then use those properties as a parameter when submitting jobs.
That way, nodes carrying a given property become the preferred, or
only, targets for a specific job. I don't see evidence that you have
gone this route.
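For illustration, something like the following (the node names and the
"longjob" property are made up here, not taken from your setup):

```shell
# In $TORQUE_HOME/server_priv/nodes, tag the nodes you want
# reserved for long jobs with a property:
#
#   node01 np=2 longjob
#   node02 np=2 longjob
#   node03 np=2
#
# Then request only nodes carrying that property at submit time:
qsub -q long -l nodes=2:longjob:ppn=2 myscript.sh
```

The property name is arbitrary; it just has to match between the nodes
file and the -l nodes= request.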

If the only difference between these jobs is how long they will run,
then I'm not sure your approach of setting up separate queues has any
advantage. The main point of the scheduler is that it examines
estimated job lengths and backfills onto free nodes as best it can.
Can you articulate what you're trying to accomplish?
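In other words, backfill works off the per-job walltime estimate, so
giving each job an accurate walltime may get you what you want without
separate queues (the values below are illustrative):

```shell
# An accurate walltime lets the scheduler slot this job onto
# nodes that are free until a bigger job's start time:
qsub -l nodes=4:ppn=2,walltime=00:30:00 myscript.sh
```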

On Nov 8, 2007 11:21 AM, Andrus, Mr. Brian (Contractor)
<brian.andrus at nrlmry.navy.mil> wrote:
> Ok,
> Setup: Torque 2.2.1, RHEL4U5, Torque scheduler as well (for now).
>
> I submit several jobs. 2 for the long queue, 3 for the medium queue.
> I do qstat and see one long running, one medium running and the rest queued.
> I also see that the long job and the medium job are running on the same set
> of nodes?
>
> My config:
> ---------------------------------
> create queue short
> set queue short queue_type = Execution
> set queue short Priority = 40
> set queue short max_running = 3
> set queue short resources_max.nodect = 4
> set queue short resources_max.walltime = 00:15:00
> set queue short enabled = True
> set queue short started = True
> #
> # Create and define queue medium
> #
> create queue medium
> set queue medium queue_type = Execution
> set queue medium Priority = 30
> set queue medium max_running = 4
> set queue medium resources_max.nodect = 8
> set queue medium resources_max.walltime = 04:00:00
> set queue medium enabled = True
> set queue medium started = True
> #
> # Create and define queue long
> #
> create queue long
> set queue long queue_type = Execution
> set queue long Priority = 20
> set queue long max_running = 1
> set queue long resources_max.nodect = 16
> set queue long resources_max.walltime = 24:00:00
> set queue long enabled = True
> set queue long started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server max_running = 30
> set server acl_roots = root
> set server default_queue = short
> set server log_events = 0
> set server query_other_jobs = True
> set server scheduler_iteration = 60
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server log_level = 7
> set server pbs_version = 2.2.1
> set server submit_hosts = login1
> ----------------------------------------------------
>
> My job script:
> ------------------
> #!/bin/bash
> #PBS -j oe
> #PBS -l nodes=16:ppn=2
> #PBS -W x=NACCESSPOLICY:SINGLEJOB
> #PBS -N LongTestJob
> #PBS -q long
> #PBS -o output-long.txt
> #PBS -V
>
> cd $PBS_O_WORKDIR
> rm -f output.txt
> date
> mpirun --mca btl openib,self /data/andrus/hello
> sleep 30
> -------------------
> Same for both jobs except jobname and output file name.
>
> Questions: Why are my jobs running on nodes that should be job-exclusive?
> Why isn't more than one medium job running at the same time?
>
>
> Brian Andrus perotsystems
>  Site Manager | Sr. Computer Scientist
>  Naval Research Lab
>  7 Grace Hopper Ave, Monterey, CA  93943
>  Phone (831) 656-4839 | Fax (831) 656-4866
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

