[torqueusers] momctl error - A description

michael young mhyoung at valdosta.edu
Tue Mar 6 15:31:46 MST 2007


Garrick Staples wrote:

>On Tue, Mar 06, 2007 at 04:53:16PM -0500, michael young alleged:
>  
>
>>Sorry about that.
>>
>>Background:
>>We have a cluster of Sun servers.
>>1 master and 12 slave nodes.
>>AMD Opteron Processor 248 2.2 GHz, 4 GB RAM, 74 GB SCSI HD
>>It runs Spartan '04 on Red Hat Enterprise Linux AS release 4 (Nahant 
>>Update 1).
>>master node name: cluster
>>slave node names: he1 - he12
>>
>>Problem:
>>When a job is submitted to the cluster, it runs only on the master node.
>>It does not pass any work to the slave nodes.
>>    
>>
>
>While the job is running, does 'qstat -n <jobid>' show that the job is
>assigned to a node?
>  
>
How do I determine the job ID?
Running qstat by itself gives no output.

>Since your master node isn't running pbs_mom, this implies that the
>problem is in your job script.  Is your job script using $PBS_NODEFILE
>to spawn the processes?
>
>  
>
Where do I find the job script?
I ran 'env' and there is no PBS_NODEFILE variable in the output.
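
Note on the question above: Torque sets PBS_NODEFILE only in the environment of a running job, so it is not expected to show up in 'env' output from a login shell. The file it points to lists one host name per assigned processor. Below is a minimal sketch of a job script that reads it. The resource request (nodes=2:ppn=2) and the run_on_node helper are illustrative assumptions, not taken from this cluster's setup; a real script would replace run_on_node with whatever actually launches the work (ssh, mpirun, or the application's own launcher).

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=2
# Sketch only: the resource request above is an assumed example.

# Print each distinct host listed in a node file, one per line.
unique_nodes() {
    sort -u "$1"
}

# Hypothetical placeholder for the command that starts work on a host.
run_on_node() {
    echo "would start a process on $1"
}

# PBS_NODEFILE is set by pbs_mom only inside a running job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    for node in $(unique_nodes "$PBS_NODEFILE"); do
        run_on_node "$node"
    done
else
    echo "PBS_NODEFILE is not set; not running inside a Torque job" >&2
fi
```

Such a script would be submitted with qsub, which prints the job ID on standard output; that ID is what 'qstat -n <jobid>' expects.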


