[torqueusers] More than one job per CPU

Garrick Staples garrick at usc.edu
Wed Sep 12 12:53:50 MDT 2007


On Tue, Sep 11, 2007 at 04:16:54PM -0500, Jeremy Mann alleged:
> I've been searching the mail archive most of the day and I haven't found
> anything regarding our problem (well, we call it a problem).
> 
> We have a program that we run on our cluster a few hundred iterations at a
> time. We nice the program 19 so it won't interfere with any other program.
> So far, we've been doing this manually. Now we want to incorporate it into
> PBS/Maui. The problem we are running into is that even though we submit it
> with -l nice=19, PBS still says that compute node is state=busy and all
> other jobs stay in the queue. We run the program niced 19 because it usually
> runs for about 5-6 days on our 20 nodes, so we need the ability to run
> other things during this time.
> 
> What I've been trying to accomplish for a few days now is to somehow make
> PBS submit a job to a compute node that has this niced 19 job running on
> it. I've tried everything I can think of and what I've found in the
> manpages.
> 
> The changes I've tried are:
> 
> In maui.cfg I've added:
> NODEACCESSPOLICY        SHARED
> NODEALLOCATIONPOLICY    MINRESOURCE
> NODECFG[DEFAULT]        PRIORITYF=JOBCOUNT
> NODEMAXLOAD             4.00
> 
> USERCFG[tigre]          QDEF=tigre
> USERCFG[abarca]         QDEF=gasbor
> QOSCFG[gasbor]          PRIORITY=-100 FLAGS=PREEMPTEE
> QOSCFG[tigre]           PRIORITY=100 FLAGS=PREEMPTOR:IGNMAXJOB
> 
> My idea here was to create two QoS's, where the gasbor job (the niced 19
> job) would be preempted in favor of the tigre jobs. This however has never
> worked.
> 
> I took one compute node offline and edited its mom_priv/config file and
> added '$ideal_load 4.0'. My thinking here was that if I tell PBS this node
> will run at a 4.0 load, it will execute more jobs on this node. Again,
> this never worked either.

If the node is "busy" in torque, then maui won't run a job on it.  End of story.

So you want to keep the node from going busy by setting both the $ideal_load
and $max_load options.  You mentioned that you tried the former, but did you
also set the latter?
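
For illustration, here is a minimal mom_priv/config sketch.  The numbers are
assumptions, not recommendations.  As I understand the two options, pbs_mom
reports the node busy once the load average rises above $max_load and clears
that state only after the load falls back below $ideal_load, so both values
need to sit above the load the niced job produces on its own.

```
# mom_priv/config on the compute node (values are illustrative assumptions)
# the node leaves the busy state once the load average falls below this
$ideal_load 4.0
# the node reports itself busy once the load average rises above this
$max_load   5.0
```

pbs_mom has to re-read the file before the new thresholds apply; restarting
it is the simple way to be sure.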
