[torqueusers] Node is not responding!

Hakeem Almabrazi halmabrazi at idtdna.com
Fri Oct 14 15:36:06 MDT 2011


Another easy question: where should the output of my script be stored?  For example, with the script below, I could not find the test.out or test.err files on the node I submitted the job from.


-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Steve Crusan
Sent: Thursday, October 13, 2011 5:47 PM
To: Torque Users Mailing List
Subject: Re: [torqueusers] Node is not responding!


On Oct 13, 2011, at 3:34 PM, Hakeem Almabrazi wrote:

> Dear All,
> First, I am a newbie to Torque and this is my first message to this group.  I hope I am not wasting anyone's time with such a basic question, but I tried looking for answers in the list archives, and since there is no built-in search capability it is hard to find what I need.
> Here is where I am so far:
> I installed the Torque 3.0 package on my Linux box (SUSE 11.2).  I also configured a node on a different VM that is also running SUSE.  Things seem to be installed and configured correctly (I think).
> When I run pbsnodes I get:
> suse-ptpd-16
>     state = free
>     np = 1
>     ntype = cluster
>     status = rectime=1318533469,varattr=,jobs=,state=free,netload=116214003,gres=,loadave=0.00,ncpus=1,physmem=1017908kb,availmem=3012532kb,totmem=3115056kb,idletime=76,nusers=2,nsessions=7,sessions=1753 1767 1770 1889 1894 1997 3017,uname=Linux suse-ptpd-16 2.6.34-12-desktop #1 SMP PREEMPT 2010-06-29 02:39:08 +0200 i686,opsys=linux
>     mom_service_port = 15002
>     mom_manager_port = 15003
>     gpus = 0
> When I shut down the node, its state changes to "down".  This tells me everything is okay.
> I then tried to send my first job to the node, using this example found online:
>> test.job
> #!/bin/bash
> # --- send the output to the test.out file
> #     the default is .o<jobid>
> #PBS -o test.out
> # --- send the error output to the test.err file
> #     the default is .e<jobid>
> #PBS -e test.err
> echo "Print out the hostname and date"
> /bin/hostname
> /bin/date
> exit 0
> And then I ran it from the head node (not as root):
>> qsub test.job
> Looking at the submitted jobs (I submitted the job twice):
>> qstat
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 16.suse-halmabr            test.job         torqueuser             0 Q batch
> 17.suse-halmabr            test.job         torqueuser             0 Q batch
> However, nothing seems to happen after that.
> Can anybody tell me what I am doing wrong or what I am missing here?  Also, if someone could direct me to a good site with examples of how to use the server, that would be highly appreciated.

Which scheduler are you running on top of TORQUE (pbs_sched, Maui, etc.)? If nothing tells TORQUE to run the job, it won't run (unless you force it to run with qrun <jobid>).
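
As a rough illustration of the qrun route: the sketch below reads a qstat listing and prints one qrun command for every job still in state Q. It is a dry run that only prints the commands. The function name queued_qrun_commands is made up for this example, the sample input is copied from the qstat listing quoted in this thread, and the parsing assumes the default qstat column layout shown there. Note that qrun normally requires manager or operator privileges on the server.

```shell
#!/bin/bash
# Print a "qrun <jobid>" line for every job that qstat reports in state Q.
# Dry run only: nothing is executed, the commands are just printed.
# (queued_qrun_commands is a hypothetical helper name, not a TORQUE tool.)
queued_qrun_commands() {
    # Default qstat columns: Job id, Name, User, Time Use, S, Queue.
    # Skip the two header lines, then match on the state column (5th field).
    awk 'NR > 2 && $5 == "Q" { print "qrun " $1 }'
}

# Sample input copied from the qstat listing quoted in this thread.
sample_qstat='Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
16.suse-halmabr            test.job         torqueuser             0 Q batch
17.suse-halmabr            test.job         torqueuser             0 Q batch'

printf '%s\n' "$sample_qstat" | queued_qrun_commands
```

On a live system the input would come from qstat itself (qstat | queued_qrun_commands), and the printed lines can be reviewed before any of them is actually run.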

If everything is set up correctly with a scheduler, you should be able to submit an interactive job to run some basic tests, a la:    qsub -I -q batch   (plus whatever other options you need).

If you are using Maui/Moab as a scheduler, run checkjob -v <jobid>, which will give you some information on why the scheduler isn't starting the job.
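
A minimal sketch of that check, under some assumptions: it reports which scheduler-related commands are visible on PATH and, only if checkjob is present, calls checkjob -v for each job ID passed in (checkjob is queried per job, not per node). The helper names have_cmd and diagnose_jobs are invented for this example, and the job IDs are the two from the qstat listing in this thread. The daemons pbs_sched and maui usually live in an sbin directory, so "not found on PATH" for a non-root user does not prove they are not installed. Maui may also expect the job ID in a different form (for instance only the numeric part), which should be checked on the actual system.

```shell
#!/bin/bash
# Report which scheduler-related commands are on PATH, then (if Maui's
# checkjob is present) ask it about each job ID given as an argument.
# have_cmd and diagnose_jobs are hypothetical helper names.

have_cmd() {
    # True if the named command can be found on PATH.
    command -v "$1" >/dev/null 2>&1
}

diagnose_jobs() {
    local tool job
    for tool in pbs_sched maui checkjob; do
        if have_cmd "$tool"; then
            echo "$tool: found"
        else
            echo "$tool: not found on PATH"
        fi
    done

    # Only query the scheduler when its client command actually exists.
    if have_cmd checkjob; then
        for job in "$@"; do
            checkjob -v "$job"
        done
    fi
}

# Example: the two queued job IDs from the qstat listing in this thread.
diagnose_jobs 16.suse-halmabr 17.suse-halmabr
```

If none of the three commands is found, that points back to the first question above: there may be no scheduler running to start the jobs.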

Here is a link to integrating TORQUE+MAUI [open source]:

You should be able to use the base Maui setup with TORQUE.

Hope that helps.


> Regards,
> ~Hak
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

 Steve Crusan
 System Administrator
 Center for Research Computing
 University of Rochester

