[torqueusers] Policies for scheduling with unusual/reserved-use nodes.

Steve Crusan scrusan at ur.rochester.edu
Fri Oct 14 16:18:53 MDT 2011




On Oct 14, 2011, at 5:44 PM, Coyle, James J [ITACD] wrote:

> All,
> 
>  I'm running Torque 2.5.4 on a homogeneous cluster with only Opteron CPUs, and someone
> wants to add similar machines with two GPU cards in each.
> 
>  I am unsure whether this person will want these machines held exclusively for
> his group's use, and whether/how Torque can accommodate this.
> 
>  How have others handled the technical end of this?
> 
>  I know that users can easily specify
>       -l nodes=x:ppn=y:gpus=z
> to land specifically on those nodes, and that I can set a property on the other nodes, e.g.
> nogpu, so that a job can specify
>        -l nodes=x:ppn=y:nogpu
> and thereby avoid the GPU nodes (and maybe put that
> in a job wrapper script.)
> 
>  However, what do you do if the new group wants "nobody else can run on my nodes"
> in an environment where users could specify gpus=z even when they don't need it,
> or simply leave off nogpu so that they get scheduled on any available node?


If I'm not mistaken, you want to protect privately purchased resources from being used by everyone else on the shared system? Meaning, if you don't pay, you don't get to play on those nodes?

If that's the case, you can do this with TORQUE alone. If all of your non-privately-funded nodes share one common feature, you can give your 'standard' queue resources_default.neednodes = featurename. For the private nodes, create a queue whose neednodes feature exists ONLY on the privately funded hardware, and set a group ACL on that queue. In short: standard jobs require a common feature anyone can use, while the private queue requires a feature that only certain groups can reach, enforced by TORQUE's acl_groups = groupname.
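
A minimal sketch of that setup (the queue names, the feature names 'std' and 'privgpu', and the group name 'econ' are placeholders, not anything from your site):

# $TORQUE_HOME/server_priv/nodes: tag shared nodes with a common
# feature, and the private GPU nodes with one of their own
node01 np=8 std
node05 np=8 privgpu gpus=2

# force the standard queue onto the shared nodes
qmgr -c "set queue batch resources_default.neednodes = std"

# private queue: lands only on 'privgpu' nodes, and only members
# of group 'econ' may submit to it
qmgr -c "create queue private queue_type=execution"
qmgr -c "set queue private resources_default.neednodes = privgpu"
qmgr -c "set queue private acl_group_enable = true"
qmgr -c "set queue private acl_groups = econ"
qmgr -c "set queue private enabled = true"
qmgr -c "set queue private started = true"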




If you're interested in using Maui to achieve a setup like this, you can map classes/QOSes -> a standing reservation that contains those GPU nodes. Basically:

# every job submitted to class 'econ' picks up the econqos QOS
CLASSCFG[econ]          QLIST=econqos QDEF=econqos

# standing reservation over the private nodes, owned by and
# accessible to that QOS only
SRCFG[privhwres]        OWNER=QOS:econqos
SRCFG[privhwres]        QOSLIST=econqos
SRCFG[privhwres]        HOSTLIST=node0[1-4]

(The access restriction can also be granted per class instead, via CLASSLIST= on the SRCFG line.)


You can just create a standing reservation with access restrictions, whether by CLASS, QOS, USER, ACCOUNT, etc. What we do is map a class (the class carries the TORQUE ACL constraints) -> QOS -> standing reservation.
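
For example, with the config above (node names and counts are placeholders), a member of the owning group submits through the ACL'd class and lands inside the reservation, while everyone else is fenced off of node01-node04:

# allowed: submitter is in the class's ACL, job inherits QOS econqos
qsub -q econ -l nodes=2:ppn=8:gpus=2 job.sh

# a standard job never matches the reservation's QOSLIST, so Maui
# will not place it on node01-node04
qsub -q batch -l nodes=2:ppn=8 job.sh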

We haven't had any problems.


Let me know if this helps.



~Steve


> 
> We've never had this need, as all the machines were the same, but I may need to implement it,
> likely on short notice, so I want to be ahead of the curve.
> 
>  Can such a policy be implemented, preferably via
> 
> 1)    pbs_sched  or
> 
> 2)    MAUI ?
> 
>  I can probably hack something together myself, but I'd guess that others must
> have crossed this bridge already, and I'd like to learn from those with this experience.
> 
> Thanks,
> 
> -         Jim
> 
> James Coyle, PhD
> High Performance Computing Group
> Iowa State Univ.
> 
> 

 ----------------------
 Steve Crusan
 System Administrator
 Center for Research Computing
 University of Rochester
 https://www.crc.rochester.edu/



