[torqueusers] resources problem

Garrick Staples garrick at clusterresources.com
Thu Nov 30 16:50:50 MST 2006


On Thu, Nov 30, 2006 at 11:59:22PM +0100, Bernd Schubert alleged:
> On Wednesday 29 November 2006 20:52, Bernd Schubert wrote:
> > I have absolutely no experience with MPI. Could a program compiled with
> > mpicc call mpirun itself? The attached script, submitted to qsub, calls an
> > executable "transport". This is not a script but a binary compiled with
> > mpicc. Running "strings" on this binary, I see that it contains the string
> > "MPIRUN" several times and, once, the sentence "! Could not create p4 procgroup.
> > Possible missing file or program started without mpirun."
> 
> I just checked with strace what's going on, and I only see some strange
> socketcall() calls. But somehow this is probably the reason why torque is not
> assigning the resources to the started job.
> 
> Garrick, can you just give me a hint where to search in the sources? I mean,
> your dumpmon shows that pbs_mom in principle properly gets the resources used
> by the jobs started on the node. Only later on, the resources are not
> properly assigned to the job. But which program does this assignment, pbs_mom
> or pbs_server?
> I hope to find some time during the weekend to look into the torque sources,
> but of course it will be much easier if I know those basics in advance.

Is this a single-node or multi-node job?  Each node adds up the resources
used on that node.  Sister nodes send that info to the Mother Superior (MS,
the exec host).  MS sends the sums to pbs_server.
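
To make that flow concrete, here is a rough sketch in Python.  It is not
TORQUE code; the function names and the usage fields (cput_s, mem_kb) are
made up for illustration, and it only models the summing described above:
per-node totals, sisters reporting to MS, MS reporting one sum onward.

```python
from collections import Counter


def node_usage(process_usages):
    """Add up the usage of every job process seen on one node.

    Each entry is a dict of hypothetical counters, e.g.
    {"cput_s": 10, "mem_kb": 100}.
    """
    total = Counter()
    for usage in process_usages:
        total.update(usage)  # Counter.update adds values key by key
    return total


def mother_superior_report(own_processes, sister_reports):
    """Combine MS's own node total with the totals sent by sister nodes.

    The returned dict stands in for what MS would pass on to pbs_server.
    """
    total = node_usage(own_processes)
    for report in sister_reports:
        total.update(report)
    return dict(total)


# Two processes on the MS node, one process on a single sister node.
ms_processes = [{"cput_s": 10, "mem_kb": 100}, {"cput_s": 5, "mem_kb": 50}]
sister = node_usage([{"cput_s": 7, "mem_kb": 70}])

job_totals = mother_superior_report(ms_processes, [sister])
print(job_totals)
```

In this model, a process that a node does not count as part of the job
never makes it into that node's total, so it is also missing from the sum
that reaches the server.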

You are on 2.1.6, right?


