[torqueusers] Two problems with a routing queue

Eiríkur Hjartarson Eirikur.Hjartarson at decode.is
Fri Sep 16 03:57:13 MDT 2011


Hi,

I'm resubmitting these questions since I got no replies when I posted them a week ago.

In order to limit the number of jobs that maui considers for scheduling, we have set up a routing queue:

#
# Create and define queue exec
#
create queue exec
set queue exec queue_type = Route
set queue exec route_destinations = real_exec
set queue exec route_held_jobs = False
set queue exec enabled = True
set queue exec started = True
#
# Create and define queue real_exec
#
create queue real_exec
set queue real_exec queue_type = Execution
set queue real_exec max_user_queuable = 800
set queue real_exec from_route_only = True
set queue real_exec resources_default.nodes = 1
set queue real_exec enabled = True
set queue real_exec started = True

(800 is a bit higher than the number of CPUs in the cluster)

There are two problems that we have experienced with this setup.

1.

Suppose a job (id: 28379062) is still in the "exec" queue and depends on another job (id: 28379059). If job 28379059 finishes *before* job 28379062 is moved to the "real_exec" queue, the following error mail is generated when job 28379062 is transferred to the "real_exec" queue.

---
PBS Job Id: 28379062.lpbs2.decode.is
Job Name:   bambino_22892
Aborted by PBS Server
Dependency request for job rejected by 28379059.lpbs2.decode.is
Unknown Job Id
Job held for unknown job dep, use 'qrls' to release
---

Is there any way to solve this problem, other than setting the keep_completed attribute to some non-zero value?  The problem with the keep_completed attribute is that we (think we) have to set it to a big value, say, one day.
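
For reference, the workaround we would like to avoid would look roughly like this in qmgr. This is only a sketch: it assumes keep_completed is given in seconds and is set server-wide, and 86400 (one day) is just the example value mentioned above, not something we have tuned.

```
# Keep completed jobs known to the server for one day (value assumed to be in seconds)
set server keep_completed = 86400
```

With a setting like this, the finished job 28379059 should still be known to the server when the dependent job is routed, at the cost of keeping a day's worth of completed jobs around.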

2.

The "real_exec" queue may get filled up with jobs that all depend on a job that is still on the "exec" queue.  It seems possible to me that the route_held_jobs attribute only applies to user holds.  If that is correct, would it be possible to let it also apply to system holds?

Regards,
-- 
Eiríkur Hjartarson

