[torqueusers] queues are not working after master node reboot
sm4082 at nyu.edu
Wed Mar 16 11:08:31 MDT 2011
Sorry, I should have been clearer about the problem in my email. Jobs are starting, but on the wrong nodes. I explained the whole thing in my last email. Other than that, I don't see any error messages. Because jobs are ending up on the wrong nodes, some of them are killed after a while: big-memory jobs need more memory than the low-memory nodes they land on can provide.
With the queue settings we have, it should work fine; it was working fine until we rebooted the master node. I have tried everything else without any success.
On Mar 16, 2011, at 12:22 PM, Lloyd Brown wrote:
> On 3/16/11 7:35 AM, Sreedhar Manchu wrote:
>> Since then the queue settings are not working on our cluster.
> You might need to be a bit more specific. What's not working? Are you
> getting any error messages? Is it just a behavior thing, as in the
> system isn't acting like you think it should? Are jobs being queued,
> and just not started (which might imply a Moab problem), or are jobs not
> being allowed to be queued? Are jobs starting, just on the wrong nodes?
> Lloyd Brown
> Systems Administrator
> Fulton Supercomputing Lab
> Brigham Young University