[torqueusers] resources problem

Bernd Schubert bernd-schubert at gmx.de
Thu Nov 30 17:31:12 MST 2006


On Friday 01 December 2006 00:50, Garrick Staples wrote:
> On Thu, Nov 30, 2006 at 11:59:22PM +0100, Bernd Schubert alleged:
> > On Wednesday 29 November 2006 20:52, Bernd Schubert wrote:
> > > I have absolutely no experience with MPI. Could a program compiled with
> > > mpicc call mpirun itself? The attached script submitted to qsub calls
> > > an executable "transport". This is not a script but the binary compiled
> > > with mpicc. Running "strings" on this binary, I see that it contains the
> > > string "MPIRUN" several times, and once the sentence "! Could not create
> > > p4 procgroup. Possible missing file or program started without mpirun."
> >
> > I just checked with strace what's going on, and I only see some strange
> > socketcall() calls. But somehow this is probably the reason why Torque is
> > not assigning the resources to the started job.
> >
> > Garrick, can you just give me a hint where to look in the sources? I
> > mean, your dumpmon shows that pbs_mom in principle correctly gets the
> > resources used by the jobs started on the node. Only later on are the
> > resources not properly assigned to the job. Which program does this
> > assignment, pbs_mom or pbs_server?
> > I hope I find some time during the weekend to look into the Torque
> > sources, but of course it will be much easier if I knew those
> > basics in advance.
>
> Is this a single or multi node job?  Each node adds up the resources
> used on that node.  Sister nodes send that info to MS (the exec host).
> MS sends the sums to pbs_server.
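
The accounting flow described above can be modelled in a few lines. This is a simplified sketch, not Torque's actual C implementation: the function names `node_usage` and `mother_superior_report` and the resource names `cput` and `mem` are hypothetical, and every resource is simply summed. For a single-node job there are no sister reports, so the total is just what MS measured on its own node.

```python
from collections import Counter

# Hypothetical model of Torque's resource accounting; real pbs_mom code differs.
# Resource names and units (cput in seconds, mem in kB) are illustrative only.


def node_usage(processes):
    """Sum the usage of every process belonging to the job on one node."""
    total = Counter()
    for proc in processes:
        total.update(proc)  # Counter.update adds values key by key
    return total


def mother_superior_report(ms_processes, sister_reports):
    """Add each sister node's per-node sum to MS's own sum.

    In this model, the returned dict is what MS would forward to pbs_server.
    """
    total = node_usage(ms_processes)
    for report in sister_reports:
        total.update(report)
    return dict(total)


if __name__ == "__main__":
    ms = [{"cput": 120, "mem": 50000}, {"cput": 30, "mem": 10000}]
    sisters = [node_usage([{"cput": 100, "mem": 40000}])]
    print(mother_superior_report(ms, sisters))  # {'cput': 250, 'mem': 100000}
    print(mother_superior_report(ms, []))       # single-node job: MS sum only
```

Summing is a deliberate simplification here; how Torque actually combines values such as memory may differ.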

It's a single-node job. Apart from the binary having been compiled with mpicc
and the source containing several MPI calls, nothing else MPI-related is done
for that job. It's started like any other non-MPI binary.

>
> You are on 2.1.6, right?

That's the weak point: I'm still at 2.1.0p0. I think the only important entry
in the changelog up to 2.1.6 might be this one:

   b - fix 2.1.4 regression with TM on single-node jobs

but I have no idea what the abbreviation TM means.


Thanks,
Bernd

-- 
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg


