[torqueusers] getting started

David B Jackson jacksond at clusterresources.com
Fri Jan 20 09:26:08 MST 2006


Alexander,

  Thanks for the patches!  They have been rolled in and an updated
snapshot has been released.  Regarding totmem and availmem, you are
correct.  The FreeBSD interface is not properly loading and reporting
physical and virtual memory.  This can probably be fixed by porting
code from one of the other platforms' mom_mach.c modules into
the totmem() and availmem() routines in resmom/freebsd/mom_mach.c.  See
the resmom/linux/mom_mach.c routines as an example.
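
  On the '? 15201' values themselves: as far as I know, a value that
starts with '?' followed by a number means the MOM could not evaluate
that attribute and returned an error code in its place.  Here is a
rough sketch (plain Python, not part of TORQUE; the function name is
made up for illustration) of how one might scan a pbsnodes status
string for such attributes:

```python
def flag_failed_attributes(status):
    """Return the names of attributes whose value starts with '?'."""
    failed = []
    # Attributes are comma-separated name=value pairs; values may contain spaces.
    for field in status.split(","):
        name, sep, value = field.partition("=")
        if sep and value.strip().startswith("?"):
            failed.append(name.strip())
    return failed


# Shortened copy of the status line from the report quoted below.
status = ("opsys=freebsd,uname=FreeBSD 4.10:i386,nsessions=7,nusers=4,"
          "totmem=? 15201,availmem=? 15201,physmem=2058388kb,ncpus=4,"
          "netload=? 15201,state=busy,jobs=? 15201")

print(flag_failed_attributes(status))
```

Applied to the status line you posted, this lists totmem, availmem,
netload and jobs, so memory reporting is not the only query affected.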

  Regarding running large numbers of jobs, I think you are in luck.  We
have been discussing enabling full TORQUE job array functionality for
the last couple of months and I believe it is the next major project we
will be tackling.  Garrick can probably provide more detail than I can.
You may want to subscribe to torquedev to follow the discussion.
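
  Until job arrays are available, one possible workaround is to script
the submissions and keep the returned job ids so the batch can be held
or deleted together.  The sketch below is only an illustration: the
script name run_task.sh, the TASK_ID variable and the batch_ name
prefix are assumptions, and it relies only on the standard qsub -N and
-v options and on qhold/qdel accepting several job ids at once.  It
still calls qsub once per job, so it does not make submission faster.

```python
import subprocess


def build_qsub_commands(script, count, prefix="batch"):
    """Build one qsub argument list per task, numbered 1..count."""
    commands = []
    for task_id in range(1, count + 1):
        commands.append([
            "qsub",
            "-N", f"{prefix}_{task_id}",   # job name, handy for filtering qstat output
            "-v", f"TASK_ID={task_id}",    # passed to the script as an environment variable
            script,
        ])
    return commands


def submit_all(commands):
    """Run each qsub command and collect the job ids it prints."""
    job_ids = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        job_ids.append(result.stdout.strip())
    return job_ids


def group_command(tool, job_ids):
    """Build a single qhold/qdel style command covering the whole batch."""
    return [tool] + list(job_ids)


# Building the commands needs no TORQUE installation; submit_all() does.
commands = build_qsub_commands("run_task.sh", 500)
print(" ".join(commands[0]))
```

After job_ids = submit_all(commands), running
subprocess.run(group_command("qdel", job_ids)) would remove the whole
batch in one call, and the same pattern applies to qhold.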

Dave



> Hi!
>
>
>
> I am trying to play with Torque 2.0.0p5 on FreeBSD. First I had problems
> compiling the thing. Looks like a couple of lines are missing in
> src/resmom/freebsd/mom_mach.c. Here is my patch:
>
>
>
> 141a142,143
> > extern int LOGLEVEL;
> >
> 1754a1757,1758
> >   char        *id = "setmax";
> >
>
>
> After that it appears to build and run just fine, except that the GUI x*
> binaries are missing despite being enabled by default. It looks like some
> build script cannot figure out tclsh for src/tools/xpbsmon/buildindex. I
> did not care too much for now.
>
>
>
> One more suspicious thing is in pbsnodes report:
>
>
>
> status = opsys=freebsd,uname=FreeBSD 4.10:i386,sessions= 60195 132 59821
> 59851 164 68625 5032,nsessions=7,nusers=4,idletime=258322,totmem=? 15201,
> availmem=? 15201,physmem=2058388kb,ncpus=4,loadave=1.98,
> gres=pbsserver:pilgrim.corp,netload=? 15201,state=busy,jobs=? 15201,
> rectime=1136576517
>
>
>
> What do values like 'totmem=? 15201' mean? Do they indicate a problem?
>
>
>
> Now the real question: I would like to run batches of 500 rather small
> jobs (each takes from a few minutes to, say, ~1 hour). Each job is an
> instance of the same script with a number from 1 to 500 as a parameter.
> I have 5 dual-CPU x86 machines, so I configured 4 slots (ncpus) on each.
> I would like to submit 500 jobs so they occupy all 20 slots, and the next
> one starts as soon as one slot becomes free. What is the best way to do
> that? Currently I run qsub 500 times, which is slow. It would be great if
> I could treat those jobs as a group: hold, delete, change priority. Is it
> possible?
>
>
>
> Thanks a lot.
>
>