[torqueusers] resources problem

Bernd Schubert bernd-schubert at gmx.de
Wed Nov 29 12:52:03 MST 2006


Hi Garrick,

Thanks for your help.

On Wednesday 29 November 2006 20:06, Garrick Staples wrote:
> On Wed, Nov 29, 2006 at 03:25:34PM +0100, Bernd Schubert alleged:
> > Hi,
> >
> > we just had the problem that a job belonging to one of our group members
> > used more resources (memory) than requested, yet torque didn't kill it.
> > Also, qstat reports far too little resource usage for this particular
> > program. For all other programs currently running, the reported resources
> > are fine; only this program is troublesome.
> > While looking at what is so special about it, we see it is basically an
> > MPI program compiled with mpicc. However, it is NOT started using mpirun;
> > on our cluster it is just queued like any other program, using 'qsub
> > program_name'.
>
> 'qsub' can't submit binaries, so the batch script is probably running
> mpirun.

Ah, sorry, I forgot. We have a 'frontend' for qsub, called tcsub, which
creates those scripts itself (among several other things). However, this
tcsub script certainly doesn't call mpirun. A simple call of tcsub is

"tcsub prog_name prog_parameters"

I'm so used to it that I forgot qsub doesn't accept binaries. I have
attached the script given to qsub; there is really no mpirun call in it.

> If your mpirun is using rsh/ssh to spawn the remote processes, then
> they won't be tracked and added to the job's usage.

There's only one process on the node where the program is started. 

I have absolutely no experience with MPI. Could a program compiled with mpicc
call mpirun itself? The attached script submitted to qsub calls an
executable "transport". This is not a script, but the binary compiled with
mpicc. Running "strings" on this binary, I see that it contains the string
"MPIRUN" several times and, once, the sentence "! Could not create p4
procgroup.  Possible missing file or program started without mpirun."
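
The "strings" check described above can be approximated in a few lines of
Python. This is only a sketch: the function names printable_strings and
mpi_hints and the keyword list are made up for illustration. It can show that
a binary embeds MPI-related messages, but not whether the binary ever launches
mpirun or spawns remote processes.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of at least min_len printable ASCII bytes, roughly like strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def mpi_hints(data: bytes, keywords=("mpirun", "p4 procgroup")):
    """Return the extracted strings that mention any keyword (case-insensitive)."""
    return [
        s for s in printable_strings(data)
        if any(k in s.lower() for k in keywords)
    ]
```

Assuming the binary is in the current directory, it could be checked with
mpi_hints(open("transport", "rb").read()).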

However, ps ax doesn't show any rsh/ssh processes, only this binary.
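
To go beyond scanning "ps ax" by eye, one could list which processes descend
from the job's shell. The sketch below parses the text printed by
"ps -eo pid,ppid,comm"; the function name descendants is hypothetical, and
using parent/child relationships as a stand-in for what the batch system
accounts for is an assumption, not a description of how torque tracks jobs.

```python
def descendants(ps_output: str, root_pid: int):
    """Return (pid, command) pairs for every descendant of root_pid.

    ps_output is the text printed by `ps -eo pid,ppid,comm`, header included.
    """
    children = {}
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        pid, ppid, comm = line.split(None, 2)
        children.setdefault(int(ppid), []).append((int(pid), comm.strip()))

    found, stack = [], [root_pid]
    while stack:
        # Visit the children of the most recently discovered process.
        for pid, comm in children.get(stack.pop(), []):
            found.append((pid, comm))
            stack.append(pid)
    return found
```

A process started through a separate rshd or sshd would show up under that
daemon rather than under the job's shell, so it would be missing from the
returned list.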

Thanks,
Bernd

-- 
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 21880.hitch.SC
Type: application/x-shellscript
Size: 1888 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/torqueusers/attachments/20061129/98a3f829/21880.hitch.bin

