[torqueusers] Node is not responding!

Hakeem Almabrazi halmabrazi at idtdna.com
Thu Oct 13 13:34:22 MDT 2011


Dear All,

I am new to Torque, and this is my first message to this group. I hope I am not wasting anyone's time with a basic question. I tried to look for answers in the list archives, but since they have no built-in search, it is hard to find what I need.

Here is where I am so far:

I installed the Torque 3.0 package on my Linux box (SUSE 11.2). I also configured a node on a different VM that is running SUSE as well. As far as I can tell, everything is installed and configured correctly.

When I run pbsnodes, I get:

suse-ptpd-16
     state = free
     np = 1
     ntype = cluster
     status = rectime=1318533469,varattr=,jobs=,state=free,netload=116214003,gres=,loadave=0.00,ncpus=1,physmem=1017908kb,availmem=3012532kb,totmem=3115056kb,idletime=76,nusers=2,nsessions=7,sessions=1753 1767 1770 1889 1894 1997 3017,uname=Linux suse-ptpd-16 2.6.34-12-desktop #1 SMP PREEMPT 2010-06-29 02:39:08 +0200 i686,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

When I shut down the node, its state changes to "down". This tells me the server and the node are talking to each other.
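
For anyone repeating this check, here is a minimal sketch that reduces the pbsnodes dump to one "node state" pair per line. It assumes the layout shown above (node name unindented, attributes indented as "name = value"); the function name node_states is only an illustration, not part of Torque.

```shell
# Print "<node> <state>" for each node in pbsnodes-style output on stdin.
# Assumption: node names start in column 1 and attributes are indented,
# as in the listing above. node_states is a hypothetical helper name.
node_states() {
    awk '
        /^[^ \t]/ { node = $1 }                          # unindented line: node name
        $1 == "state" && $2 == "=" { print node, $3 }    # indented "state = ..." line
    '
}
```

On a host where pbsnodes is available it could be used as "pbsnodes -a | node_states". The long "status = ..." line is ignored because its first field is "status", not "state".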

However, I ran into trouble when I tried to send my first job to the node. I used this example, which I found online:

>test.job

#!/bin/bash
# --- send the output to the test.out file
#     the default is .o<jobid>
#PBS -o test.out
# --- send the error output to the test.err file
#     the default is .e<jobid>
#PBS -e test.err

echo "Print out the hostname and date"
/bin/hostname
/bin/date
exit 0

I then submitted it from the head node (not as root):

>qsub test.job

Looking at the submitted jobs (I submitted the job twice):

>qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
16.suse-halmabr            test.job         torqueuser             0 Q batch
17.suse-halmabr            test.job         torqueuser             0 Q batch
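
To pick out only the jobs that are still waiting, a small filter along these lines may help. It is a sketch that assumes the default qstat layout shown above (two header lines, then one job per line with the state in the fifth column); queued_jobs is a hypothetical helper name.

```shell
# Print the IDs of jobs in the queued (Q) state from qstat output on stdin.
# Assumption: two header lines, then whitespace-separated columns with the
# job state in column 5, as in the listing above.
queued_jobs() {
    awk 'NR > 2 && $5 == "Q" { print $1 }'
}
```

With a running pbs_server it could be used as "qstat | queued_jobs". For the listing above it would print both job IDs, since both are in state Q.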


However, nothing seems to happen after that: both jobs stay in the Q (queued) state.

Can anybody tell me what I am doing wrong or what I am missing? Also, if someone could point me to a good site with examples of how to use the server, that would be highly appreciated.

Regards,

~Hak



