[torqueusers] Submitting jobs to multi-cpu nodes

Adam Carheden carheden at cira.colostate.edu
Mon Feb 13 12:01:01 MST 2006


Ronny,

Thanks for the reply, but I think I'm still misunderstanding something.
Per the thread you referred me to, I'm using '-l ncpus' instead of
'-l nodes', which lets me run my 16-node MPI job with mpiexec. However,
pbsnodes and top show that all 16 processes seem to be running on the
first node.
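
One way to check what torque actually handed to the job is to look at
the file named by $PBS_NODEFILE from inside the job script. Below is a
minimal sketch in Python, assuming the usual layout of one hostname per
allocated processor slot. The function name slots_per_host is only
illustrative, not part of torque or mpiexec.

```python
import os
from collections import Counter


def slots_per_host(nodefile_text):
    """Count how many processor slots each host contributes.

    Assumes one hostname per line, repeated once per allocated slot.
    Blank lines are ignored.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


if __name__ == "__main__":
    # PBS_NODEFILE is set by torque inside a running job.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as fh:
            counts = slots_per_host(fh.read())
        for host, n in sorted(counts.items()):
            print(f"{host}: {n} slot(s)")
    else:
        print("PBS_NODEFILE is not set; run this from inside a job script")
```

If this prints a single host for a job that was meant to span several
nodes, the allocation itself is confined to one node, rather than
mpiexec placing the tasks badly.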

The job also dies with the error message "mpiexec: Warning: tasks
0-2,5-8,13-14 died with signal 9 (Killed)" on stderr. The program should
exit with a different error if it doesn't have enough nodes, though, so I
imagine that mpiexec is getting the requested number of nodes from torque.

Any hints on where I can configure or monitor how torque allocates nodes 
and processors to jobs?

Thanks
-- 
Adam Carheden
Linux Systems Administrator


Ronny T. Lampert wrote:
> Hi,
> 
> 
>>or 2 jobs that require 8 nodes and they will all run in parallel. When I
>>submit a 16-node job, however, I get the error message:
>>
>>qsub: Job exceeds queue resource limits
> 
> 
> 
> This is what you may need. I do not use it myself, but you could give it
> a try
> (thread was: "Re: [torqueusers] Online docs missing queue resource 'nodes'"):
> -----
> 
> You can override this, as I bugged David et al. about this for ages and they
> gave in to make me go away. :-)
> 
> So for our 144 CPU Power5 cluster (36 x 4 CPU boxes) I have:
> 
>   set server resources_available.nodect = 144
> 
> 
> -----
> 
> 
> Cheers,
> Ronny

