[torqueusers] PBS job issue

Steve Young chemadm at hamilton.edu
Fri Jan 16 09:31:31 MST 2009


Hi,
	I'm wondering whether this is an MPI job. Did you make sure to
compile MPI to be TM-aware? How do you know the job is not actually
running somewhere? I've found that if you don't make MPI aware of
Torque, the jobs end up on the nodes MPI assigns rather than on the
nodes Torque assigns. I ended up using OSC's version of mpiexec, but
a version of MPI that can be compiled to be TM-aware would do the
same thing. This is just a guess, since I don't know what kind of job
you're running, what version of Torque you have, or how you have
things configured. Hope this helps,

-Steve
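
P.S. One way to check where a job was supposed to run is to look at
the node file Torque writes for it (the path is in the PBS_NODEFILE
environment variable inside the job) and compare it with the hosts
where you actually see processes. Below is a rough Python sketch of
that comparison. The function names are made up for illustration, and
the list of observed hosts is something you would have to collect
yourself (for example by running ps on the nodes); this is only a
diagnostic aid under those assumptions, not a fix.

```python
import os
from collections import Counter


def slots_per_host(nodefile_text):
    """Count how many slots Torque assigned on each host.

    The node file lists one hostname per assigned slot, so a host
    requested with ppn=2 appears twice.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


def unexpected_hosts(assigned, observed):
    """Return hosts where processes were observed but Torque assigned none."""
    return sorted(set(observed) - set(assigned))


if __name__ == "__main__":
    # PBS_NODEFILE is only set inside a running Torque job.
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        print("PBS_NODEFILE is not set; run this from inside a job script")
    else:
        with open(path) as f:
            for host, count in sorted(slots_per_host(f.read()).items()):
                print(host, count)
```

If unexpected_hosts() returns anything for the hosts you observed,
the MPI launcher is probably choosing nodes on its own instead of
using the ones Torque handed out.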


On Jan 16, 2009, at 11:13 AM, Abhishek Gupta wrote:

> Hi all,
> I am facing a problem with job submission: my first job gets stuck
> forever (showing R state), and if I submit the same job again while
> the first one is still there, the second job runs without any
> problem. The problem arises only when I ask for more than one node.
> With nodes=1:ppn=2 the job runs without any problem, but nodes=2
> does not work the first time. One more thing I found: even when the
> stuck multi-node job was started by some other user, my job
> requiring more than one node runs smoothly, while that other user's
> job stays in that state forever.
> Could someone tell me what the issue could be? Is there any
> parameter that needs to be set?
> Thanks,
> Abhishek.
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


