[torqueusers] momctl error - A description
garrick at clusterresources.com
Tue Mar 6 16:26:08 MST 2007
On Tue, Mar 06, 2007 at 06:14:18PM -0500, michael young alleged:
> The thing that bothers me is that there is a job running currently.
> Should it show up with 'qstat'?
Yes, if there is a job running then it shows up in 'qstat'. If it
doesn't, then there isn't a job.
> >The jobid is first printed to the user when running qsub, and running
> >'qstat' with no arguments lists all jobs with their ids.
> Our users use a GUI interface to submit jobs from Spartan '04. It does
> not return the jobid.
Then I suspect it is just directly running processes and not actually
talking to TORQUE.
Do you see any activity in pbs_server's logfile indicating job
You might also be doing this backwards. Try running an interactive job
first, and then running Spartan from within the job.
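
A minimal sketch of how to tell whether a shell is inside a job, assuming
the usual TORQUE behaviour of setting PBS_JOBID and PBS_NODEFILE in the job
environment. The function name inside_torque_job is only illustrative. Start
an interactive job with 'qsub -I', then run this from the shell it gives you:

```shell
#!/bin/sh
# Illustrative helper (not part of TORQUE): succeeds only when PBS_JOBID
# is set, which is assumed here to mean "running inside a job".
inside_torque_job() {
    [ -n "${PBS_JOBID:-}" ]
}

if inside_torque_job; then
    echo "Inside job $PBS_JOBID; node list is in ${PBS_NODEFILE:-<unset>}"
else
    echo "Not inside a job; start one with 'qsub -I' and run this again."
fi
```

Run from an ordinary login shell, it should take the second branch, which
is consistent with 'env' not showing $PBS_NODEFILE there.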
> >>>Since your master node isn't running pbs_mom, this implies that the
> >>>problem is in your job script. Is your job script using $PBS_NODEFILE
> >>>to spawn the processes?
> >>Where do I find the job script?
> >>I did a 'env' and there is no "$PBS_NODEFILE"
> >Inside of the job environment, the job will have the list of nodes
> >assigned to the job in the file named in $PBS_NODEFILE.
A job consists of a user-written job script that does what the user
wants and is submitted to TORQUE with qsub. TORQUE then sends the
job to a node, where the job script is executed.
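
As a rough sketch of what such a job script could look like: the job name,
the nodes=1 request, and the summarize_nodes helper are placeholders made up
for this example, not anything from your setup. Saved as, say, example.sh,
it would be submitted with 'qsub example.sh'.

```shell
#!/bin/sh
#PBS -N example_job
#PBS -l nodes=1

# Illustrative helper: print each distinct node in the given file,
# prefixed by how many times it appears.
summarize_nodes() {
    sort "$1" | uniq -c
}

if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    # Inside a job: TORQUE has set PBS_NODEFILE to the file of assigned nodes.
    echo "Job ${PBS_JOBID:-unknown} was assigned:"
    summarize_nodes "$PBS_NODEFILE"
else
    # Run by hand outside a job: there is no node file to read.
    echo "PBS_NODEFILE is not set; not running inside a TORQUE job." >&2
fi
```

The script also runs by hand outside of TORQUE, in which case it only
reports that no node file is available.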
> How do I get to this job environment?
> In my reading on this, a doc. said to run "echo "sleep 30" | qsub" to
> give me a second job.
> It returns "qsub: Bad UID for job execution".
Are you running qsub on the same host that is running pbs_server, or a
different host? You aren't running it as root, right?
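
A quick way to check the second question before running qsub, as a sketch
only. The function name is_root_uid is made up for this example, and being
root is just one possible cause of that error; the submit host matters too.

```shell
#!/bin/sh
# Illustrative helper: succeeds when the given numeric UID is 0 (root).
is_root_uid() {
    [ "$1" -eq 0 ]
}

uid=$(id -u)
if is_root_uid "$uid"; then
    echo "Running as root; switch to a regular user account before running qsub." >&2
else
    echo "Running as $(id -un) (uid $uid), not root."
fi
```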
> >For example, if you are launching an MPI program with mpirun, then you
> >would pass the nodes with something like:
> > np=`wc -l < $PBS_NODEFILE`
> > mpirun -machinefile $PBS_NODEFILE -np $np ./command
> Does Linux come with a MPI program I can run or do I d/l 1 or make 1 or
> Sorry, I'm really new to this whole clustering business.
> I do know Linux fairly well though.
If you aren't currently running MPI programs, then there is no point in
getting an MPI implementation. I was just using it as an example of a
common type of job that could be run within PBS.