[torqueusers] recovery behavior question

John Wang jwang at dataseekonline.com
Thu Feb 14 16:07:00 MST 2008


Garrick

It's a brand new cluster with a fresh CentOS 4.5 install, without the
distro's PBS, OFED or MPI packages.   The torque 2.2.1 source was downloaded
from Cluster Resources, and OFED (and hence OpenMPI) from OpenFabrics.   Both
were compiled with gcc, though we did try building OFED with both gcc and PGI.
How can there be old binaries?   The executable in the path is clearly the
one we compiled.

Regards,
John


On 2/14/08 1:51 PM, "Garrick Staples" <garrick at usc.edu> wrote:

> On Thu, Feb 14, 2008 at 12:12:46PM -0600, John Wang alleged:
>> Hello Tim
>> 
>> So you're stopping the pbs_mom daemon on the compute nodes to prevent jobs
>> from running on them?
> 
> That's not what he said at all.
> 
>  
>> That had been the practice here as well.   It just seems to me that we
>> shouldn't have to use such workarounds.
> 
> Something is weird with your install, probably old binaries hanging around.
> Please don't project this onto everyone else's systems.
> 
>  
> 
>> On 2/13/08 8:48 PM, "Tim Freeman" <tfreeman at mcs.anl.gov> wrote:
>> 
>>> If I submit a job with all moms in the pool in the 'down' state, the job sits
>>> in the queue as expected.  Then I bring up a node but the job in the queue is
>>> not run until I submit another job (and they both do run).
>>> 
>>> Is this expected?  Is there a setting I am missing to get around this?
>>> 
>>> (Torque 2.2.1, Maui 3.2.6p19 but I saw this with pbs_sched too)
>>> 
>>> Thank you,
>>> Tim
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>> 


