[torqueusers] Fw: Follow up- Ben-Gurion University

tami tami at ee.bgu.ac.il
Tue Jul 8 00:11:27 MDT 2008


Hi,

>
> For some reason our TORQUE installation does not function properly.
> MPI jobs submitted via qsub end up being executed only on the server.
>
> I've just reinstalled the package on the main server and followed the
> directions in
> http://www.clusterresources.com/torquedocs21/1.2basicconfig.shtml
>
> The qmgr.conf built by torque.setup is:
>
>
>
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue batch
> #
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_default.nodes = 1
> set queue batch resources_default.walltime = 01:00:00
> set queue batch resources_available.nodect = 999999
> set queue batch enabled = True
> set queue batch started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server managers = root@vdwarf1.ee.bgu.ac.il
> set server operators = root@vdwarf1.ee.bgu.ac.il
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server resources_available.nodect = 999999
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server mom_job_sync = True
> set server pbs_version = 2.2.1
> set server keep_completed = 300
>
>
> The mom_priv/config on all nodes:
>
>
> $pbsserver      vdwarf1.ee.bgu.ac.il          # note: hostname running pbs_server
> $logevent       255               # bitmap of which events to log
> $usecp  *:/users/agnon /users/agnon
>
>
> pbsnodes -a gives:
>
> vdwarf1.ee.bgu.ac.il
>     state = free
>     np = 1
>     ntype = cluster
>     status = opsys=linux,uname=Linux vdwarf1.ee.bgu.ac.il 2.6.18-53.1.13.el5xen #1 SMP Tue Feb 12 13:33:07 EST 2008 x86_64,sessions=1976 5683 19809 19902 19935,nsessions=5,nusers=4,idletime=18982,totmem=3120472kb,availmem=2820464kb,physmem=1024000kb,ncpus=1,loadave=1.05,netload=34003964399,state=free,jobs=,varattr=,rectime=1215435091
>
> vdwarf2.ee.bgu.ac.il
>     state = free
>     np = 1
>     ntype = cluster
>     status = opsys=linux,uname=Linux vdwarf2.ee.bgu.ac.il 2.6.18-53.1.13.el5xen #1 SMP Tue Feb 12 13:33:07 EST 2008 x86_64,sessions=1947 12905 31246 31279,nsessions=4,nusers=3,idletime=163257,totmem=3120472kb,availmem=2816592kb,physmem=1024000kb,ncpus=1,loadave=1.05,netload=33551595209,state=free,jobs=,varattr=,rectime=1215435099
>
> vdwarf3.ee.bgu.ac.il
>     state = free
>     np = 1
>     ntype = cluster
>     status = opsys=linux,uname=Linux vdwarf3.ee.bgu.ac.il 2.6.18-53.1.13.el5xen #1 SMP Tue Feb 12 13:33:07 EST 2008 x86_64,sessions=1946 3927 19798 19831,nsessions=4,nusers=3,idletime=9244,totmem=3120472kb,availmem=2823376kb,physmem=1024000kb,ncpus=1,loadave=1.07,netload=29628498243,state=free,jobs=,varattr=,rectime=1215435099
>
> vdwarf4.ee.bgu.ac.il
>     state = free
>     np = 1
>     ntype = cluster
>     status = opsys=linux,uname=Linux vdwarf4.ee.bgu.ac.il 2.6.18-53.1.13.el5xen #1 SMP Tue Feb 12 13:33:07 EST 2008 x86_64,sessions=1946,nsessions=1,nusers=1,idletime=39985,totmem=3120472kb,availmem=2857876kb,physmem=1024000kb,ncpus=1,loadave=0.03,netload=27446065910,state=free,jobs=,varattr=,rectime=1215435099
>
> vdwarf5.ee.bgu.ac.il
>     state = free
>     np = 1
>     ntype = cluster
>     status = opsys=linux,uname=Linux vdwarf5.ee.bgu.ac.il 2.6.18-53.1.13.el5xen #1 SMP Tue Feb 12 13:33:07 EST 2008 x86_64,sessions=1947,nsessions=1,nusers=1,idletime=74368,totmem=3120472kb,availmem=2876668kb,physmem=1024000kb,ncpus=1,loadave=0.03,netload=23369215227,state=free,jobs=,varattr=,rectime=1215435097
>
> vdwarf6.ee.bgu.ac.il
>     state = free
>     np = 1
>     ntype = cluster
>     status = opsys=linux,uname=Linux vdwarf6.ee.bgu.ac.il 2.6.18-53.1.13.el5xen #1 SMP Tue Feb 12 13:33:07 EST 2008 x86_64,sessions=1948 3903 4102,nsessions=3,nusers=2,idletime=272851,totmem=3120472kb,availmem=2871080kb,physmem=1024000kb,ncpus=1,loadave=0.07,netload=23006626131,state=free,jobs=,varattr=,rectime=1215435099
>
> vdwarf7.ee.bgu.ac.il
>     state = free
>     np = 1
>     ntype = cluster
>     status = opsys=linux,uname=Linux vdwarf7.ee.bgu.ac.il 2.6.18-53.1.13.el5xen #1 SMP Tue Feb 12 13:33:07 EST 2008 x86_64,sessions=1958 3431 6835,nsessions=3,nusers=3,idletime=66228,totmem=3120472kb,availmem=2856900kb,physmem=1024000kb,ncpus=1,loadave=0.00,netload=22806857675,state=free,jobs=,varattr=,rectime=1215435099
>
> vdwarf8.ee.bgu.ac.il
>     state = free
>     np = 1
>     ntype = cluster
>     status = opsys=linux,uname=Linux vdwarf8.ee.bgu.ac.il 2.6.18-53.1.13.el5xen #1 SMP Tue Feb 12 13:33:07 EST 2008 x86_64,sessions=1942 30773,nsessions=2,nusers=2,idletime=236,
>
> and so on until vdwarf30.
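
With 30 nodes, the pbsnodes listing is easier to check as a summary. The following is a minimal sketch, assuming the unwrapped `pbsnodes -a` layout shown above (node name at the start of a line, attributes indented as `name = value`). The function name `summarize_nodes` is hypothetical, not part of TORQUE.

```shell
#!/bin/sh
# Summarize `pbsnodes -a` output read from stdin:
# number of nodes, total np (slots), and nodes whose state is not "free".
summarize_nodes() {
    awk '
        /^[^ \t]/ && NF == 1 { nodes++ }                      # unindented one-word line = node name
        $1 == "np" && $2 == "=" { slots += $3 }               # "np = N" attribute
        $1 == "state" && $2 == "=" && $3 != "free" { busy++ } # any state other than free
        END { printf "nodes=%d slots=%d not_free=%d\n", nodes, slots, busy }
    '
}

# Only query the server when the pbsnodes command is actually installed.
if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes -a | summarize_nodes
fi
```

If every node reports np = 1 and state = free, the slot total should equal the node count, which would suggest the node list itself is not what keeps jobs on the server.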
>
>
> Any idea what is wrong with the configuration and why it is not 
> functioning properly?
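
One thing that may be worth checking: the queue above sets resources_default.nodes = 1, so a job that does not request nodes explicitly is given a single node. A small test job shows what was actually allocated. This is only a sketch: the node count of 4 and the job name are arbitrary assumptions, `list_hosts` is a hypothetical helper, and the commented mpirun line assumes an MPI whose launcher accepts `-machinefile` (`./my_mpi_program` is a placeholder).

```shell
#!/bin/sh
#PBS -N nodecheck
#PBS -q batch
#PBS -l nodes=4:ppn=1
#PBS -l walltime=00:05:00
#PBS -j oe

# Print the distinct hosts listed in a node file, one per line.
list_hosts() {
    sort -u "$1"
}

# pbs_mom sets PBS_NODEFILE inside a running job; outside a job it is unset.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Hosts allocated to this job:"
    list_hosts "$PBS_NODEFILE"
    echo "Total slots: $(wc -l < "$PBS_NODEFILE")"
    # Hypothetical launch line; without a machine file, or an MPI built
    # with TM support, mpirun may start every rank on the local host:
    # mpirun -np 4 -machinefile "$PBS_NODEFILE" ./my_mpi_program
else
    echo "PBS_NODEFILE is not set; submit this script with qsub."
fi
```

If the job output lists four different hosts, the allocation is working and the MPI launch line is the next thing to look at. If it lists only one host, the resource request is not reaching the server as intended.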
>
> Thanks,
>
> Tami Chuchem
> Head of Computer Unit
> Electrical & Computer Engineering department
> Ben-Gurion University
> POB 653 Beer-Sheva 84105
> Israel
> Phone: 972-8-6461527
> Fax: 972-8-6472949
>


