[torquedev] [Bug 67] Support for counted resources on nodes

bugzilla-daemon at supercluster.org bugzilla-daemon at supercluster.org
Wed Aug 4 07:44:51 MDT 2010


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=67

--- Comment #21 from Simon Toth <SimonT at mail.muni.cz> 2010-08-04 07:44:51 MDT ---
(In reply to comment #20)
> Sorry for the delay in commenting Simon, been flat out bringing up new systems!
> 
> Can you comment on how this interacts with the various schedulers please ?
> 
> When a job is submitted and Maui/Moab/pbs_sched is working out where to put it
> will it take these limits into account, or will the pbs_server just refuse to
> start it if a limit is exceeded ?

The whole point of the patch is to make the server the authority that
verifies requests.

There are two checkpoints.

(1)
The submission is now checked not just against the list of resources on the
job, but also against the resources in the nodespec.

qsub -l mem=4G -l nodes=10:ncpus=5

translates into 4G of memory and 50 ncpus (10 nodes x 5 ncpus each), and those
totals are checked against the server limits.

Normally (without the patch) the ncpus part would not be checked, so this is
the first place where a job can now be rejected that previously would have
been accepted.
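
As a rough illustration of the arithmetic only (this is not the patch's code),
the per-node counts in a nodespec can be multiplied out and compared with a
limit. The function names, the simplified nodespec grammar (integer values
only, parts joined by "+") and the limit of 32 are all assumptions made for
this sketch. The job-level mem=4G request is left out because it is not
multiplied by the node count.

```python
def nodespec_totals(nodespec):
    """Sum counted resources over a simplified nodespec such as '10:ncpus=5'.

    Hypothetical helper: real nodespecs have a richer grammar than this.
    """
    totals = {}
    for part in nodespec.split("+"):
        fields = part.split(":")
        # A leading integer is a node count; anything else (for example a
        # hostname) is treated as a single node.
        count = int(fields[0]) if fields[0].isdigit() else 1
        for field in fields[1:]:
            if "=" not in field:
                continue  # plain node property, not a counted resource
            name, value = field.split("=", 1)
            totals[name] = totals.get(name, 0) + count * int(value)
    return totals


def exceeded_limits(requested, limits):
    """Return the names of resources whose requested total exceeds a limit."""
    return sorted(
        name
        for name, amount in requested.items()
        if name in limits and amount > limits[name]
    )


totals = nodespec_totals("10:ncpus=5")
print(totals)                                   # {'ncpus': 50}
print(exceeded_limits(totals, {"ncpus": 32}))   # ['ncpus']
```

With an assumed server limit of 32 ncpus, the example request would be
rejected at submit time; with a limit of 64 it would pass.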

I don't expect this to be a problem.


(2)
When a job is run, the server receives a nodespec from the scheduler. This is
the part most likely to be incompatible. If the run request does not contain a
nodespec, the one originally submitted is used; if it does contain one, that
nodespec is parsed instead. The behaviour therefore depends largely on what
the scheduler puts in the nodespec when it sends a run request.
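
The fallback rule described here can be sketched as follows. This is a minimal
illustration, not the server's actual code; the function name and the example
nodespec strings are made up for the sketch.

```python
def effective_nodespec(submitted, run_request):
    """Pick the nodespec the server would verify for a run request.

    Hypothetical helper: an empty or missing nodespec in the run request
    means the originally submitted nodespec is used.
    """
    return run_request if run_request else submitted


# The scheduler sends no nodespec: the submitted one is checked.
print(effective_nodespec("10:ncpus=5", None))

# The scheduler sends its own nodespec: that one is parsed and checked.
print(effective_nodespec("10:ncpus=5", "node01:ncpus=5+node02:ncpus=5"))
```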

As a result, if the scheduler is configured in an incompatible way (for
example, it believes some resources do not exist, or treats them as per-proc
instead of per-node), the server can deny its run requests.

