[torqueusers] momctl error - A description

michael young mhyoung at valdosta.edu
Tue Mar 6 16:14:18 MST 2007


Garrick Staples wrote:

>On Tue, Mar 06, 2007 at 05:31:46PM -0500, michael young alleged:
>  
>
>>Garrick Staples wrote:
>>
>>    
>>
>>>On Tue, Mar 06, 2007 at 04:53:16PM -0500, michael young alleged:
>>>
>>>
>>>      
>>>
>>>>Sorry about that.
>>>>
>>>>Background:
>>>>We have a cluster of Sun servers.
>>>>1 master and 12 slave nodes.
>>>>AMD Opteron Processor 248 2.2 GHz, 4GB ram, 74 GB SCSI HD
>>>>It runs Spartan '04 on Red Hat Enterprise Linux AS release 4 (Nahant 
>>>>Update 1).
>>>>Master node name: cluster
>>>>Slave node names: he1 - he12
>>>>
>>>>Problem:
>>>>When a job is submitted to the cluster, it runs only on the master node.
>>>>It does not pass any work to the slave nodes.
>>>>  
>>>>
>>>>        
>>>>
>>>While the job is running, does 'qstat -n <jobid>' show that the job is
>>>assigned to a node?
>>>
>>>
>>>      
>>>
>>How do I determine the jobid?
>>Just running qstat gives no output.
>>    
>>
>
>Run qstat while a job is running.  If no jobs are in the queue, then qstat
>won't print anything.
>  
>

The thing that bothers me is that there is a job running currently.
Should it show up with 'qstat'?

>The jobid is first printed to the user when running qsub, and running
>'qstat' with no arguments lists all jobs with their ids.
>  
>

Our users submit jobs from Spartan '04 through its GUI, which does not 
return the jobid.

> 
>  
>
>>>Since your master node isn't running pbs_mom, this implies that the
>>>problem is in your job script.  Is your job script using $PBS_NODEFILE
>>>to spawn the processes?
>>>
>>>
>>>
>>>      
>>>
>>Where do I find the job script?
>>I ran 'env' and there is no $PBS_NODEFILE in the output.
>>    
>>
>
>Inside of the job environment, the job will have the list of nodes
>assigned to the job in the file named in $PBS_NODEFILE.
>  
>
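To illustrate the point above, here is a minimal sketch of a job script
body that reports which nodes the job was given. It relies only on
$PBS_NODEFILE, which is set inside a running job and not in a login
shell (so 'env' at a normal prompt will not show it). The function name
list_job_nodes is made up for this example.

```shell
#!/bin/sh
# Sketch only: list_job_nodes is a hypothetical helper name.
# PBS_NODEFILE is set only inside a running job.
list_job_nodes() {
    nodefile=$1
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo "no node file; not running inside a job?" >&2
        return 1
    fi
    # The file has one line per assigned processor slot,
    # so collapse repeats to get the unique host names.
    sort -u "$nodefile"
}

# Inside a job this prints the assigned hosts; outside a job it
# prints the warning above and carries on.
list_job_nodes "${PBS_NODEFILE:-}" || true
```

Submitted with qsub, the host list ends up in the job's output file.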

How do I get to this job environment?

In my reading on this, one document said to run

  echo "sleep 30" | qsub

to give me a second job.
It returns "qsub: Bad UID for job execution".

>For example, if you are launching an MPI program with mpirun, then you
>would pass the nodes with something like:
>
>  np=`wc -l < $PBS_NODEFILE`
>  mpirun -machinefile $PBS_NODEFILE -np $np ./command
>  
>
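The two lines quoted above could sit in a complete job script roughly
like the sketch below. The job name, the node count of 4, and ./command
are placeholders, and it assumes an MPI implementation providing mpirun
is already installed on the nodes. The guard lets the script run
harmlessly outside a job, where $PBS_NODEFILE is not set.

```shell
#!/bin/sh
#PBS -N mpi-example
#PBS -l nodes=4

# Count processor slots: the node file has one line per slot.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    # Jobs start in the home directory; go back to where qsub was run.
    cd "$PBS_O_WORKDIR" || exit 1
    np=$(count_slots "$PBS_NODEFILE")
    # ./command is a placeholder for the real MPI program.
    mpirun -machinefile "$PBS_NODEFILE" -np "$np" ./command
else
    echo "PBS_NODEFILE not set; submit this script with qsub" >&2
fi
```

The script would be submitted with 'qsub script.sh', which prints the
jobid that 'qstat -n <jobid>' expects.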

Does Linux come with an MPI program I can run, or do I need to download 
or build one?
Sorry, I'm really new to this whole clustering business.
I do know Linux fairly well, though.
