[torqueusers] Jobs stuck in Queue
Joshua Bernstein
jbernstein at penguincomputing.com
Fri Oct 5 15:53:12 MDT 2007
Garrick,
As always, thank you for your input. It's always nice to be able to bounce
things off others.
> On Thu, Oct 04, 2007 at 12:51:07PM -0700, Joshua Bernstein alleged:
>> Hello All,
>>
>> I'm having a problem running MPI-based jobs linked against
>> MPICH under TORQUE.
>>
>> The problem is this: in my job script, I try to start an MPI job in the
>> same way I would outside TORQUE:
>>
>> ---
>> #PBS -j oe
>> <code to set BEOWULF_JOB_MAP based on PBS_NODEFILE>
>> exec ./mpijob
>> ---
>
> exec? Why replace the top-level shell process?
Well, an exec is basically what ends up happening when you use our mpirun
with a BProc-based MPICH. Our MPICH is built as follows:
./configure --with-arch=LINUX \
    --prefix=/usr \
    --enable-debug \
    --with-comm=bproc \
    --with-romio="--with-mpi=mpich -lpthread" \
    --with-device=%{device}
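
For reference, the placeholder step in the job script (setting BEOWULF_JOB_MAP from PBS_NODEFILE) can be sketched roughly as below. This is a minimal sketch, not the exact code from the job script: it assumes the compute nodes appear in the node file as n0, n1, ... and that BEOWULF_JOB_MAP takes a colon-separated list of node numbers. The helper name nodefile_to_job_map is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a PBS node file into a colon-separated node map.
# Assumption: each line of the node file is a hostname of the form n<number>
# (one line per allocated slot), and the node number is the name minus "n".
nodefile_to_job_map() {
    # $1: path to the node file (normally "$PBS_NODEFILE")
    # Strip the leading "n", then join all lines with ":".
    sed -e 's/^n//' "$1" | paste -s -d: -
}
```

In a job script this would be used as `export BEOWULF_JOB_MAP=$(nodefile_to_job_map "$PBS_NODEFILE")` before starting the MPI program. If the cluster names its nodes differently, the sed expression has to change accordingly.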
>> This of course correctly starts the jobs on the nodes, but if I do a
>> qdel to kill the job, the job leaves the TORQUE queue, but the
>> processes still stay on the nodes. This behavior has led me to use mpiexec.
>
> At least the processes on the MS node are killed, right?
Yes, one of the processes gets killed: the one running on the mother superior.
>> So, if I use mpiexec a la:
>>
>> ---
>> #PBS -j oe
>> <code to set BEOWULF_JOB_MAP based on PBS_NODEFILE>
>> mpiexec -comm none ./mpijob
>> ---
>
> comm none? That's only for non-MPI programs.
If I use -comm p4, the job goes into the running state but never actually
starts up. When I do a qdel, though, it does actually get cleaned up and
removed from the queue.
>> The jobs again start properly on the nodes (albeit a bit slower), and
>> then when I do a qdel, the processes get properly cleaned off the nodes.
>> The trouble here is that the job still shows up in the TORQUE queue
>> marked as running. The only way to clean up this job is to remove its
>> entries from $PBS_HOME/server_priv/jobs and from $PBS_HOME/mom_priv/jobs.
>
> First, manually deleting files is bad. If you really must purge jobs use
> 'momctl -c' to clear it from the node, and 'qdel -p' to clear it from the
> server. That said, never use those commands!
Understood. Sometimes you gotta do what you gotta do though, eh?
> If you look in pbs_mom's log file, you'll probably find an error message
> related to not being able to talk to the server.
I see no errors in the pbs_mom log file about talking to the server.
Interestingly, everything works correctly for non-MPI jobs, as well
as for MPI jobs as described above, when run in interactive mode.
A bit more information: after a qdel, the processes do get cleaned off
the nodes. A qstat, though, shows the job marked in the "R" state,
whereas a momctl -d 3 on the nodes shows the job in the
"EXITING" state. This leads me to suspect that pbs_mom, for
some reason, isn't realizing that all the processes have exited and left
the process space. Of further interest: shouldn't pbs_mom signal to
pbs_server that the job is in the "E" (exiting) state?
One might say that there is a communication problem between the server
and mom, but why then would it work for interactive and non-MPI jobs?
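
To make the mismatch between the two views easier to spot, here is a rough sketch that pulls the server-side state out of `qstat -f` output and compares it with the state reported by the mom. The function names are hypothetical, and the mapping between the server's state letters and the mom's state words (R/RUNNING, E/EXITING) is an assumption; only "R" and "EXITING" are what I actually observed.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -f <jobid>` output on stdin and print the
# value of the job_state attribute (for example R or E).
qstat_job_state() {
    awk -F' = ' '$1 ~ /^[ \t]*job_state$/ { print $2; exit }'
}

# Hypothetical helper: succeed (return 0) when the server and mom disagree.
# $1: server-side state letter, $2: mom-side state word.
# Assumption: R corresponds to RUNNING and E corresponds to EXITING.
states_disagree() {
    case "$1:$2" in
        R:RUNNING|E:EXITING) return 1 ;;  # the two views agree
        *) return 0 ;;                     # anything else is a mismatch
    esac
}
```

For the stuck job this would be run as `qstat -f "$jobid" | qstat_job_state`, with the mom-side state read off the momctl -d 3 output by hand; in my case `states_disagree R EXITING` reports a mismatch.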
-Joshua Bernstein
Software Engineer
Penguin Computing