[torqueusers] Node is not responding!

Hakeem Almabrazi halmabrazi at idtdna.com
Thu Oct 13 13:34:22 MDT 2011

Dear All,

First, I am a newbie to Torque, and this is my first message to this group.  I hope I am not wasting anyone's time with a basic question.  I tried to look for answers in the list archives, but since there is no built-in search capability, it is hard to find what I need.

Here is where I am so far:

I installed the Torque 3.0 package on my Linux box (SUSE 11.2).  I also configured a node on a different VM, which is running SUSE as well.  As far as I can tell, everything is installed and configured correctly.

When I run pbsnodes, I get:

     state = free
     np = 1
     ntype = cluster
     status = rectime=1318533469,varattr=,jobs=,state=free,netload=116214003,gres=,loadave=0.00,ncpus=1,physmem=1017908kb,availmem=3012532kb,totmem=3115056kb,idletime=76,nusers=2,nsessions=7,sessions=1753 1767 1770 1889 1894 1997 3017,uname=Linux suse-ptpd-16 2.6.34-12-desktop #1 SMP PREEMPT 2010-06-29 02:39:08 +0200 i686,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 0

When I shut down the node, its state changes to "down".  This suggests to me that the server and the node can see each other.
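
In case it helps anyone repeat this check from a script, here is a minimal sketch that pulls the state value out of output like the above.  The helper name node_state is hypothetical, and the sample text is simply copied from my output; on a live system the input would be piped from pbsnodes instead.

```shell
# Hypothetical helper: print the value of the first "state = ..." line
# read from standard input.
node_state() {
    awk -F' = ' '$1 ~ /^[[:space:]]*state$/ { print $2; exit }'
}

# Sample input copied from the output above (assumption: the real
# input would come from the pbsnodes command).
sample='     state = free
     np = 1
     ntype = cluster'

printf '%s\n' "$sample" | node_state
```

The match is anchored on the whole key, so the longer "status = ..." line is not mistaken for the state line.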

However, the problem starts when I try to send my first job to the node.  I used this example, which I found online:


# --- send the output to the test.out file
#     the default is .o<jobid>
#PBS -o test.out
# --- send the error output to the test.err file
#     the default is .e<jobid>
#PBS -e test.err

echo "Print out the hostname and date"
exit 0

Then I submitted it from the head node (not as root):

>qsub test.job

Looking at the submitted jobs (I submitted the job twice):

Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
16.suse-halmabr            test.job         torqueuser             0 Q batch
17.suse-halmabr            test.job         torqueuser             0 Q batch

However, nothing seems to happen after that: both jobs stay in the Q (queued) state.
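
To make it easy to see which jobs are still waiting, here is a rough sketch that filters a job listing like the one above for entries whose state column (S) is Q.  The helper name queued_jobs is hypothetical, the listing is copied from my output, and the sketch assumes that the job name, user, and time fields contain no spaces, so that the state is always the fifth field.

```shell
# Hypothetical helper: print the IDs of jobs whose state column (S) is Q.
# Assumption: no field contains spaces, so the state is field 5.
queued_jobs() {
    awk '$5 == "Q" { print $1 }'
}

# Listing copied from the job listing above.
listing='Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
16.suse-halmabr            test.job         torqueuser             0 Q batch
17.suse-halmabr            test.job         torqueuser             0 Q batch'

printf '%s\n' "$listing" | queued_jobs
```

The header and separator lines drop out on their own, because their fifth fields are "Time" and "-" rather than "Q".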

Can anybody tell me what I am doing wrong, or whether I am missing something here?  Also, if someone could point me to a good site with examples of how to use the server, I would highly appreciate it.



