[torqueusers] Unable to run sequential job

Simard, Jonathan jsimard at teraxion.com
Tue Feb 12 11:00:07 MST 2013


Dear all,
I'm unable to run jobs sequentially even though I set the max_running setting to one.

Multiple jobs run at the same time, whereas I would like each job to wait until the previous one has finished before it starts:

lumerical at XXX:~/Simulation> qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
107.XXX                  STDIN            lumerical              0 R test
108.XXX                  STDIN            lumerical              0 R test
109.XXX                  STDIN            lumerical              0 R test
110.XXX                  STDIN            lumerical              0 R test
111.XXX                  STDIN            lumerical              0 R test
112.XXX                  STDIN            lumerical              0 R test
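
To make the count explicit, here is a minimal sketch that tallies running jobs per queue from qstat output like the listing above. It assumes the default six-column layout shown there (job id, name, user, time use, state, queue) and job names without spaces; the function name count_running is only illustrative and is not part of Torque.

```shell
# Tally jobs in state R per queue, reading default `qstat` output on stdin.
# Assumption: six whitespace-separated columns per job line, with the
# state in column 5 and the queue in column 6, as in the listing above.
count_running() {
    awk 'NF == 6 && $5 == "R" { n[$6]++ }
         END { for (q in n) print q, n[q] }'
}
```

Used as `qstat | count_running`, this prints one line per queue with the number of running jobs, which can then be compared against that queue's max_running value.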


If I try to start a job while resources are busy, the job stays in the queue and does not start automatically once the resources are free:

lumerical at XXX:~/Simulation/Jonathan> qrun 105
qrun: Resource temporarily unavailable MSG=job allocation request exceeds currently available cluster nodes, 1 requested, 0 available 106.XXX.teraxion

lumerical at XXX:~/Simulation/Jonathan> qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
102.XXX                  STDIN            lumerical       00:00:00 C test
103.XXX                  STDIN            lumerical       00:00:00 C test
104.XXX                  STDIN            lumerical              0 R test
105.XXX                  STDIN            lumerical              0 Q test
106.XXX                  STDIN            lumerical              0 Q test
lumerical at XXX:~/Simulation/Jonathan> qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
102.XXX                  STDIN            lumerical       00:00:00 C test
103.XXX                  STDIN            lumerical       00:00:00 C test
104.XXX                  STDIN            lumerical       00:00:00 C test
105.XXX                  STDIN            lumerical              0 Q test
106.XXX                  STDIN            lumerical              0 Q test

I am using Torque 4.2.0.

XXX:~ # qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue test
#
create queue test
set queue test queue_type = Execution
set queue test max_queuable = 10
set queue test max_running = 2
set queue test resources_default.nodes = 1
set queue test resources_default.walltime = 01:00:00
set queue test enabled = True
set queue test started = True
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch max_running = 1
set queue batch resources_max.ncpus = 24
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = XXX
set server managers = lumerical at XXX.Teraxion
set server operators = lumerical at XXX.Teraxion
set server default_queue = test
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server keep_completed = 300
set server next_job_number = 113

Thanks for your help.

Jonathan
