[torqueusers] Job deferred on specific queue
sm4082 at nyu.edu
Mon Aug 19 07:44:51 MDT 2013
Just to make sure: do you have a feature named cn01 defined for the node cn01?
I believe neednodes has to be a feature name, not the compute node name
itself. For example, I have the feature serial defined for many of our
compute nodes, such as compute-0-0.
Please check the qmgr command for defining features on compute nodes.
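A minimal sketch of what that could look like, assuming the feature is also
called cn01 (which is what checkjob reports under Features) and that it is set
as a node "properties" entry through qmgr. The helper name feature_directives
is made up for illustration; it only prints the qmgr directives so they can be
reviewed before anything is changed on the server.

```shell
# Hypothetical helper (not part of Torque): prints qmgr directives that
# tag a node with a feature and make a queue request that feature by default.
feature_directives() {
    node=$1
    feature=$2
    queue=$3
    # Add the feature to the node's properties list.
    printf 'set node %s properties += %s\n' "$node" "$feature"
    # Point the queue's default neednodes at the feature name.
    printf 'set queue %s resources_default.neednodes = %s\n' "$queue" "$feature"
}

# Review the directives first (values taken from this thread).
feature_directives cn01 cn01 hi_mem

# Then, on the pbs_server host and as a Torque manager, feed them to qmgr:
# feature_directives cn01 cn01 hi_mem | qmgr
```

After applying them, pbsnodes cn01 should list the feature on its properties
line. A job that is already deferred may need to be resubmitted or have its
hold released before Maui considers it again.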
Sent from cell. Please excuse my brevity and any typos.
On Aug 19, 2013 1:10 AM, "Jurgens de Bruin" <debruinjj at gmail.com> wrote:
> To All
> I am new to Torque and Maui and would appreciate any help.
> This is my current setup, where queue hi_mem has to run on a specific
> cluster node, cn01:
> $ qmgr -c 'p s'
> # Create queues and set their attributes.
> # Create and define queue batch
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_default.nodes = 1
> set queue batch resources_default.walltime = 01:00:00
> set queue batch enabled = True
> set queue batch started = True
> # Create and define queue hi_mem
> create queue hi_mem
> set queue hi_mem queue_type = Execution
> set queue hi_mem resources_default.neednodes = cn01
> set queue hi_mem resources_default.nodes = 1
> set queue hi_mem resources_default.walltime = 720:00:00
> set queue hi_mem enabled = True
> set queue hi_mem started = True
> # Set server attributes.
> set server scheduling = True
> set server acl_hosts = manager
> set server managers = root@*
> set server managers += name@*
> set server operators += name@*
> set server operators += root@*
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 300
> set server job_stat_rate = 45
> set server poll_jobs = True
> set server mom_job_sync = True
> set server keep_completed = 300
> set server next_job_number = 36
> set server moab_array_compatible = True
> When I submit a job to queue hi_mem I get the following:
> $ checkjob 35
> checking job 35
> State: Idle EState: Deferred
> Creds: user:jurgens group:jurgens class:hi_mem qos:DEFAULT
> WallTime: 00:00:00 of 30:00:00:00
> SubmitTime: Thu Aug 15 08:45:45
> (Time Queued Total: 00:00:01 Eligible: 00:00:01)
> Total Tasks: 1
> Req TaskCount: 1 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [cn01]
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 0
> PartitionMask: [ALL]
> Flags: RESTARTABLE
> job is deferred. Reason: NoResources (cannot create reservation for job
> '35' (intital reservation attempt)
> Holds: Defer (hold reason: NoResources)
> PE: 1.00 StartPriority: 1
> cannot select job 35 for partition DEFAULT (job hold active)
> But when I run the same job via queue batch there is no problem and the
> job runs fine.
> The job is a very simple echo and sleep, just to test the queue.
> Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
> distinti saluti/siong/duì yú/привет
> Jurgens de Bruin