[torqueusers] Node is not responding!
scrusan at ur.rochester.edu
Thu Oct 13 16:47:28 MDT 2011
On Oct 13, 2011, at 3:34 PM, Hakeem Almabrazi wrote:
> Dear All,
> First, I am a newbie to Torque and this is my first message to this group. I hope I will not waste anyone's time by asking such a stupid question, but I have tried to look for answers in the listinfo archives, and since there is no built-in search capability it is hard to find what I need.
> Here is where I am so far:
> I installed the Torque 3.0 package on my Linux box (SUSE 11.2). I also configured a node on a different VM that is running SUSE as well. It seems things are installed and configured correctly (I think).
> When I run the pbsnodes I get
> state = free
> np = 1
> ntype = cluster
> status = rectime=1318533469,varattr=,jobs=,state=free,netload=116214003,gres=,loadave=0.00,ncpus=1,physmem=1017908kb,availmem=3012532kb,totmem=3115056kb,idletime=76,nusers=2,nsessions=7,sessions=1753 1767 1770 1889 1894 1997 3017,uname=Linux suse-ptpd-16 2.6.34-12-desktop #1 SMP PREEMPT 2010-06-29 02:39:08 +0200 i686,opsys=linux
> mom_service_port = 15002
> mom_manager_port = 15003
> gpus = 0
> When I shut down the node, its state changes to "down". This tells me everything is okay.
> However, nothing happened when I tried to send my first job to the node. I used this example found online:
> # --- send the output to the test.out file
> # the default is .o<jobid>
> #PBS -o test.out
> # --- send the error output to the test.err file
> # the default is .e<jobid>
> #PBS -e test.err
> echo "Print out the hostname and date"
> exit 0
> And then I ran it from the head node (not as a root)
>> qsub test.job
> Looking at the submitted jobs (I submitted the job twice):
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 16.suse-halmabr           test.job         torqueuser             0 Q batch
> 17.suse-halmabr           test.job         torqueuser             0 Q batch
> However, nothing seems to be happening after that.
> Can anybody tell me what I am doing wrong or whether I am missing something here? Also, if someone could direct me to the right site for examples on how to use the server, that would be highly appreciated.
What scheduler are you running on top of TORQUE (pbs_sched, Maui, etc.)? If nothing tells TORQUE to run the job, it will sit in the queue (unless you force it to run with qrun <jobid>).
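As a rough sketch of the qrun route, the snippet below picks the queued (state Q) job IDs out of qstat output and prints the matching qrun commands for you to review. The helper name queued_jobs is made up for this example, and it assumes the default qstat layout shown in the quoted message (job ID first, state second to last). Note that qrun normally needs operator or manager privilege on the server.

```shell
#!/bin/sh
# Hypothetical helper: read default qstat-style output on stdin and
# print the IDs of jobs whose state column (second to last) is "Q".
queued_jobs() {
    awk 'NF >= 2 && $(NF-1) == "Q" { print $1 }'
}

# Only query the batch system when the TORQUE client tools are present.
if command -v qstat >/dev/null 2>&1; then
    # Dry run: print the qrun commands instead of executing them.
    qstat | queued_jobs | while read -r job; do
        echo "qrun $job"
    done
fi
```

Forcing jobs by hand is only a way to confirm that the server and the node can talk to each other; for normal use you still want a scheduler running.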
If everything is set up correctly with a scheduler, you should be able to submit an interactive job to run some basic tests, e.g. qsub -I -q batch, and so on.
If you are using Maui/Moab as the scheduler, run checkjob -v <jobid>, which will give you some information on why the scheduler isn't starting the job.
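A small wrapper along these lines can make that check a bit friendlier on a fresh install. It is only a sketch: the function name why_queued is invented here, and the job ID in the comment is simply the one from the quoted qstat listing.

```shell
#!/bin/sh
# Hypothetical wrapper: ask the scheduler why a job has not started.
# checkjob ships with Maui/Moab, so report clearly when it is missing.
why_queued() {
    if command -v checkjob >/dev/null 2>&1; then
        checkjob -v "$1"
    else
        echo "checkjob not found: is Maui or Moab installed and on PATH?" >&2
        return 127
    fi
}

# Example call (job ID taken from the listing in the quoted message):
#   why_queued 16.suse-halmabr
```

If the wrapper reports that checkjob is missing, that in itself suggests no Maui/Moab scheduler is installed on the head node.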
Here is a link to integrating TORQUE+MAUI [open source]:
You should be able to use the base Maui setup with TORQUE.
Hope that helps.
Center for Research Computing
University of Rochester