[torqueusers] mpiexec jobs got stuck
chemadm at hamilton.edu
Wed May 13 11:59:34 MDT 2009
Are you able to test just MPI and your application, without Torque?
What I mean is: does this problem still occur if you bypass the queue
system, manually start your mpd ring, and run the job on some hosts?
That would at least help verify whether or not this is a Torque
problem. Hope this helps,
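
P.S. A rough sketch of the manual test I have in mind, assuming MPICH2
with the mpd process manager. The host file name, host count, process
count and application name are placeholders of mine, not values from
your setup; pick a process count above 20, since that is where you see
the hang. By default the script only prints the commands; set DRYRUN=0
to actually run them.

```shell
#!/bin/sh
# Manual MPI test outside Torque (sketch; assumes MPICH2 with mpd).
# All four values below are placeholders, not taken from this thread.
HOSTFILE=${HOSTFILE:-mpd.hosts}   # one hostname per line
NHOSTS=${NHOSTS:-4}               # number of mpds to start (incl. local host)
NPROCS=${NPROCS:-24}              # chosen above 20, where the hang shows up
APP=${APP:-./my_app}              # hypothetical application binary

# With DRYRUN=1 (the default) commands are printed instead of executed.
run() {
    if [ "${DRYRUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run mpdboot -n "$NHOSTS" -f "$HOSTFILE"   # start the mpd ring by hand
run mpdtrace                              # list the hosts that joined the ring
run mpiexec -n "$NPROCS" "$APP"           # run the application without the queue
run mpdallexit                            # shut the ring down afterwards
```

If the job also hangs this way, the problem is probably in MPI or the
application rather than in Torque.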
On May 13, 2009, at 12:22 PM, Abhishek Gupta wrote:
> Hi Troy,
> I was able to fix the error message I mailed in my last mail, but
> the problem I explained in the beginning still exists, i.e. the job
> runs for a while and then gets stuck forever. Like I said, it runs
> fine up to node value=20, but beyond that it shows this behavior.
> Is there anything else I can try?
> Troy Baer wrote:
>> On Tue, 2009-05-12 at 17:03 -0400, Abhishek Gupta wrote:
>>> It is giving me an error:
>>> mpiexec: Error: get_hosts: pbs_statjob returned neither "ncpus"
>>> nor "nodect"
>>> Any suggestion?
>> What does your job script look like? How are you requesting nodes
>> and/or processors?