[torqueusers] Jobs stuck in Queue

Joshua Bernstein jbernstein at penguincomputing.com
Fri Oct 5 15:53:12 MDT 2007


Garrick,

As always, thank you for your input. It's always nice to be able to 
bounce things off others.

> On Thu, Oct 04, 2007 at 12:51:07PM -0700, Joshua Bernstein alleged:
>> Hello All,
>>
>> I'm having a problem running MPI-based jobs linked against MPICH 
>> under TORQUE.
>>
>> The problem is this: in my job script, I try to start an MPI job the 
>> same way I would outside TORQUE:
>>
>> ---
>> #PBS -j oe
>> <code to set BEOWULF_JOB_MAP based on PBS_NODEFILE>
>> exec ./mpijob
>> ---
> 
> exec?  Why replace the top-level shell process?

Well, an exec is basically what ends up happening when you use our 
mpirun with a BProc-based MPICH. Our MPICH is built as follows:

   ./configure --with-arch=LINUX \
               --prefix=/usr \
               --enable-debug \
               --with-comm=bproc \
               --with-romio="--with-mpi=mpich -lpthread" \
               --with-device=%{device}

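(For context, the "<code to set BEOWULF_JOB_MAP based on PBS_NODEFILE>" 
placeholder in the scripts here is roughly the following. This is only 
a sketch: it assumes BEOWULF_JOB_MAP wants a colon-separated list of 
node numbers and that $PBS_NODEFILE lists hostnames of the form n0, 
n1, ...)

---
# sketch only: derive BEOWULF_JOB_MAP from PBS_NODEFILE
JOB_MAP=""
while read node; do
    num=${node#n}                          # strip the assumed "n" prefix
    JOB_MAP="${JOB_MAP:+$JOB_MAP:}$num"    # build a colon-separated list
done < "$PBS_NODEFILE"
export BEOWULF_JOB_MAP="$JOB_MAP"
---
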
>> This of course correctly starts the jobs on the nodes, but if I do a 
>> qdel to kill the job, the job leaves the TORQUE queue, but the 
>> processes still stay on the nodes. This behavior has led me to use 
>> mpiexec.
> 
> At least the processes on the MS node are killed, right?

Yes, one of the processes gets killed: the one running on the mother 
superior node.

>> So, if I use mpiexec a la:
>>
>> ---
>> #PBS -j oe
>> <code to set BEOWULF_JOB_MAP based on PBS_NODEFILE>
>> mpiexec -comm none ./mpijob
>> ---
> 
> comm none?  That's only for non-MPI programs.

If I use -comm p4, the job goes into the running state but never 
actually starts up. When I do a qdel, though, the job does get cleaned 
up and removed from the queue.
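In other words, the only change from the script quoted above is the 
comm layer (paraphrased fragment, same placeholder as before):

---
#PBS -j oe
<code to set BEOWULF_JOB_MAP based on PBS_NODEFILE>
mpiexec -comm p4 ./mpijob
---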

>> The jobs again, start properly on the nodes (albeit a bit slower), and 
>> then when I do a qdel, the processes get properly cleaned off the nodes. 
>> The trouble here is that the job still shows up in the TORQUE queue 
>> marked as running. The only way to clean up this job is to remove its 
>> entries from $PBS_HOME/server_priv/job and from $PBS_HOME/mom_priv/jobs
> 
> First, manually deleting files is bad.  If you really must purge jobs use
> 'momctl -c' to clear it from the node, and 'qdel -p' to clear it from the
> server.  That said, never use those commands!

Understood. Sometimes, you gotta do what you gotta do though eh?
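(For anyone following along, that manual purge corresponds roughly to 
the following; the job id and node name here are made up for 
illustration:)

---
momctl -c 123 -h node01    # clear the stale job from that node's pbs_mom
qdel -p 123                # purge the job record from pbs_server
---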

> If you look in pbs_mom's log file, you'll probably find an error message
> related to not being able to talk to the server.

I see no errors in the pbs_mom log about talking to the server. 
Interestingly, everything works correctly for non-MPI jobs, as well as 
for the MPI jobs described above when they are run in interactive mode.

A bit more information: after a qdel, the processes do get cleaned off 
the nodes. A qstat, though, still shows the job in the "R" state, 
whereas a momctl -d 3 on the nodes shows the job in the "EXITING" 
state. This leads me to suspect that pbs_mom, for some reason, isn't 
realizing that all the processes have exited and left the process 
space. Also of interest: shouldn't pbs_mom signal to pbs_server that 
the job is in the "E" (exiting) state?
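(Concretely, this is what I am comparing; the job id and node name are 
made up for illustration:)

---
qstat -f 123 | grep job_state    # server side still reports job_state = R
momctl -d 3 -h node01            # mom side shows the job as EXITING
---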

One might say that there is a communication problem between the server 
and the mom, but why then would it work for interactive and non-MPI 
jobs?

-Joshua Bernstein
Software Engineer
Penguin Computing

