[torqueusers] Node is not responding!

Steve Crusan scrusan at ur.rochester.edu
Thu Oct 13 16:47:28 MDT 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Oct 13, 2011, at 3:34 PM, Hakeem Almabrazi wrote:

> Dear All,
> 
> First, I am a newbie to Torque and this is my first message to this group.  I hope I am not wasting anyone's time with such a basic question, but I tried to look for answers in the list archives and, since there is no built-in search, it is hard to find what I need.
> 
> Here is where I am so far:
> 
> I installed the Torque 3.0 package on my Linux box (SUSE 11.2).  I also configured a node on a different VM that is running SUSE as well.  Things seem to be installed and configured correctly (I think).
> 
> When I run the pbsnodes I get
> 
> suse-ptpd-16
>     state = free
>     np = 1
>     ntype = cluster
>     status = rectime=1318533469,varattr=,jobs=,state=free,netload=116214003,gres=,loadave=0.00,ncpus=1,physmem=1017908kb,availmem=3012532kb,totmem=3115056kb,idletime=76,nusers=2,nsessions=7,sessions=1753 1767 1770 1889 1894 1997 3017,uname=Linux suse-ptpd-16 2.6.34-12-desktop #1 SMP PREEMPT 2010-06-29 02:39:08 +0200 i686,opsys=linux
>     mom_service_port = 15002
>     mom_manager_port = 15003
>     gpus = 0
> 
> When I shut down the node, its state changes to "down".  This tells me the server and the node are talking to each other.
> 
> However, I ran into trouble when I tried to send my first job to the node.  I used this example, found online:
> 
>> test.job
> 
> #!/bin/bash
> # --- send the output to the test.out file
> #     the default is .o<jobid>
> #PBS -o test.out
> # --- send the error output to the test.err file
> #     the default is .e<jobid>
> #PBS -e test.err
> 
> echo "Print out the hostname and date"
> /bin/hostname
> /bin/date
> exit 0
> 
> Then I ran it from the head node (not as root):
> 
>> qsub test.job
> 
> Looking at the submitted jobs (I submitted the job twice):
> 
>> qstat
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 16.suse-halmabr            test.job         torqueuser             0 Q batch
> 17.suse-halmabr            test.job         torqueuser             0 Q batch
> 
> 
> However, nothing seems to happen after that.
> 
> Can anybody tell me what I am doing wrong, or whether I am missing something here?  Also, if someone can direct me to the right site for examples of how to use the server, that would be highly appreciated.


What scheduler are you running on top of TORQUE (pbs_sched, Maui, etc.)? If nothing tells TORQUE to run the job, it won't run (unless you force it to run with qrun <jobid>).
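
To see which jobs are still waiting on a scheduler, the qstat listing can be filtered on its state column. This is only a minimal sketch, assuming the default qstat layout shown in the quoted output (state in the second-to-last column); the function name queued_job_ids is made up for illustration.

```shell
#!/bin/bash
# Print the IDs of jobs whose state column is "Q" (queued).
# Reads default-format `qstat` output on stdin.
# Assumption: the state is the second-to-last whitespace-separated field.
queued_job_ids() {
    awk 'NF >= 6 && $(NF-1) == "Q" { print $1 }'
}

# Only query the server when the TORQUE client tools are installed.
if command -v qstat >/dev/null 2>&1; then
    qstat | queued_job_ids
fi
```

Any ID it prints can then be passed to qrun by hand, e.g. qrun 16.suse-halmabr, which normally requires operator or manager rights on the server.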

If everything is set up correctly with a scheduler, you should be able to submit an interactive job to run some basic tests, a la:    qsub -I -q batch    (plus whatever other options you need).

If you are using Maui/Moab as the scheduler, run checkjob -v <jobid>, which will give you some information on why the scheduler isn't starting the job.
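
A small wrapper can make that check easier to repeat. This is a rough sketch, not a tested recipe: it assumes Maui lists the job under the numeric part of the TORQUE ID (16 rather than 16.suse-halmabr), which is worth confirming against showq on your own system. The helper names short_job_id and explain_job are hypothetical.

```shell
#!/bin/bash
# Strip the ".server" suffix from a TORQUE job ID: 16.suse-halmabr -> 16
# Assumption: the scheduler knows the job by this shortened ID.
short_job_id() {
    printf '%s\n' "${1%%.*}"
}

# Ask the scheduler for its verbose view of one job.
explain_job() {
    checkjob -v "$(short_job_id "$1")"
}
```

For the jobs in this thread that would be explain_job 16.suse-halmabr, run on the host where Maui is installed.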

Here is a link to integrating TORQUE+MAUI [open source]:
http://www.adaptivecomputing.com/resources/docs/maui/pbsintegration.php

You should be able to use the base Maui setup with TORQUE.

Hope that helps.



~Steve

> 
> Regards,
> 
> ~Hak

 ----------------------
 Steve Crusan
 System Administrator
 Center for Research Computing
 University of Rochester
 https://www.crc.rochester.edu/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJOl2qHAAoJENS19LGOpgqK/CUH/0xmHHHhXYq0AHpM7FqP7d5L
xQRtNpbxlVejU68XFfxM7lHA/pp6lzb/niFBZG4ujHMofv84qKnAq7vFBskgIhKm
9AtTU+W3PkkjdHWS7lhYkSt0Wun+k0te5TMu/QfBfKpLTioEU0SqlHG+RhGgaWcn
ow5/+TXN/tQCTayaZko7m/VbOcFv258B1lqEQFwczf7KgUcUDKsYc27lZlxG3IGj
20CGHGytSueGwIv1YdD9QRo7zFXMy49keN7z8nDaasDuBCMtsBpJrX2Xt77zvfcY
ncyWknmd9+hgiAenLptFjm3T6Mt7Q+BDEWCT58buUQtgODHMiRMIHEwhTW1WE3w=
=uEFl
-----END PGP SIGNATURE-----

