[torqueusers] mpiexec jobs got stuck

Steve Young chemadm at hamilton.edu
Wed May 13 11:59:34 MDT 2009


Are you able to test just MPI and your application? That is, does the
problem still occur if you bypass the queue system, start your mpd ring
manually, and run the job on some hosts? That would at least help
determine whether or not this is a Torque problem. Hope this helps,
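
A minimal sketch of that manual test, assuming MPICH2 with the mpd
process manager and that the mpiexec on your PATH is the MPICH2 one
rather than the Torque-aware one. The host file mpd.hosts, the binary
./my_app, and the process count of 24 are hypothetical placeholders,
not values from this thread; the run and manual_ring_test helpers are
only there for illustration.

```shell
#!/bin/sh
# Sketch: run the MPI program outside Torque on a hand-built mpd ring.
# Assumptions (not from the thread): MPICH2 with mpd, a hypothetical
# host file (one hostname per line), and a hypothetical binary.
HOSTFILE=${HOSTFILE:-mpd.hosts}
APP=${APP:-./my_app}
NPROCS=${NPROCS:-24}   # placeholder, chosen above the 20-node mark

run() {
    # With DRY_RUN=1 only print the command; otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

manual_ring_test() {
    nhosts=$(grep -c . "$HOSTFILE")          # count non-empty lines
    run mpdboot -n "$nhosts" -f "$HOSTFILE"  # start that many mpd daemons
    run mpdtrace                             # list the hosts in the ring
    run mpiexec -n "$NPROCS" "$APP"          # launch without the queue system
    run mpdallexit                           # shut the ring down again
}
```

Calling manual_ring_test with DRY_RUN=1 set only prints the four
commands, which is a cheap way to check the host count before starting
anything. If the job also hangs beyond 20 nodes when run this way,
Torque is probably not the cause.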

-Steve

On May 13, 2009, at 12:22 PM, Abhishek Gupta wrote:

> Hi Troy,
> I was able to fix the error message I mailed in my last mail, but
> the problem I explained in the beginning still exists, i.e. the job
> runs for a while and then gets stuck forever. Like I said, it runs
> fine up to a node value of 20, but beyond that it shows this behavior.
> Is there anything else I can try?
> Thanks,
> Abhi.
>
>
> Troy Baer wrote:
>>
>> On Tue, 2009-05-12 at 17:03 -0400, Abhishek Gupta wrote:
>>
>>> It is giving me an error:
>>> mpiexec: Error: get_hosts: pbs_statjob returned neither "ncpus" nor "nodect"
>>>
>>> Any suggestion?
>>>
>> What does your job script look like?  How are you requesting nodes
>> and/or processors?
>>
>> 	--Troy
>>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


