[torqueusers] momctl error - A description

Garrick Staples garrick at clusterresources.com
Tue Mar 6 14:54:48 MST 2007


On Tue, Mar 06, 2007 at 04:53:16PM -0500, michael young alleged:
> Sorry about that.
> 
> Background:
> We have a cluster of Sun servers.
> 1 master and 12 slave nodes.
> AMD Opteron Processor 248 2.2 GHz, 4GB ram, 74 GB SCSI HD
> It runs Spartan '04 on Red Hat Enterprise Linux AS release 4 (Nahant 
> Update 1).
> master node's name: cluster
> slave nodes' names: he1 - he12
> 
> Problem:
> When a job is submitted to the cluster, it runs only on the master node.
> It does not pass any work to the slave nodes.

While the job is running, does 'qstat -n <jobid>' show that the job is
assigned to a node?

Since your master node isn't running pbs_mom, this implies that the
problem is in your job script.  Is your job script using $PBS_NODEFILE
to spawn the processes?
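
For reference, here is a minimal sketch of a job script that reads
$PBS_NODEFILE and starts one command per allocated host. It is only an
illustration, not your actual Spartan setup: the resource request
(nodes=4:ppn=1), the job name, the helper functions unique_hosts and
slot_count, and the use of passwordless ssh with 'hostname' as the
worker command are all assumptions made for the example.

```shell
#!/bin/sh
#PBS -l nodes=4:ppn=1
#PBS -N nodefile-demo

# Hypothetical helper: print each distinct host in a node file, one per line.
unique_hosts() {
    sort -u "$1"
}

# Hypothetical helper: count lines in a node file
# (TORQUE writes one line per allocated processor slot).
slot_count() {
    wc -l < "$1" | tr -d ' '
}

# Inside a TORQUE job PBS_NODEFILE points at the node list for the job;
# outside a job it is normally unset.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Allocated slots: $(slot_count "$PBS_NODEFILE")"
    for host in $(unique_hosts "$PBS_NODEFILE"); do
        # Assumption: passwordless ssh works between nodes.
        # Replace 'hostname' with the real worker command.
        ssh "$host" hostname
    done
else
    echo "PBS_NODEFILE is not set; not running inside a TORQUE job" >&2
fi
```

If a script never looks at $PBS_NODEFILE (directly or through a launcher
that reads it), everything it starts stays on the first node assigned
to the job, no matter how many nodes were requested.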


