[torqueusers] queues are not working after master node reboot
Sreedhar Manchu
sm4082 at nyu.edu
Wed Mar 16 14:22:04 MDT 2011
Hi Jerry,
We seem to have found the culprit behind this problem. When we rebooted the master node, it started pbs_sched at boot time, even though Moab is our scheduler. We disabled pbs_sched, and now everything seems to be working fine. For now I think the problem has been resolved. If we still face problems tomorrow, I will let you know here. Thanks again for all the help. I do appreciate it.
I will let you know later how it goes with the new jobs. Hopefully they will work well.
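A quick way to check for this situation after a reboot is to look for more than one scheduler daemon in the process table. The following is only a minimal sketch, not something taken from Torque or Moab: it assumes the daemons appear under the process names pbs_sched and moab, and the helper names (list_process_names, running_schedulers, report) are hypothetical.

```python
import os
import subprocess

# Assumption: the scheduler daemons show up under these process names.
SCHEDULER_NAMES = {"pbs_sched", "moab"}


def list_process_names():
    """Return the command names of all running processes, via POSIX ps."""
    out = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    # Some platforms print a full path in the comm column, so keep the basename.
    return [os.path.basename(line.strip())
            for line in out.splitlines() if line.strip()]


def running_schedulers(process_names):
    """Return the sorted, de-duplicated scheduler daemons among the names."""
    return sorted(SCHEDULER_NAMES.intersection(process_names))


def report(found):
    """Turn the list of running schedulers into a one-line status message."""
    if len(found) > 1:
        return "WARNING: more than one scheduler is running: " + ", ".join(found)
    if found:
        return "OK: only " + found[0] + " is running"
    return "No scheduler daemon found"
```

Running print(report(running_schedulers(list_process_names()))) on the master node would print the warning line if both daemons are up. How pbs_sched gets disabled at boot depends on the init scripts of the distribution, so that step is left out here.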
Thanks,
Sreedhar.
On Mar 16, 2011, at 12:22 PM, Jerry Smith wrote:
> Sreedhar,
>
> Can you define "queue settings are not working"?
>
> Are jobs not starting? Are the queues no longer visible? Are they showing the wrong nodes?
>
> With a little more detail, we can probably get you to a resolution faster.
>
> Jerry
>
> Sreedhar Manchu wrote:
>>
>> Hi Steve,
>>
>> First, thank you for writing. We have just 6 queues. Could you please clarify what you mean by modifying include files? It would be great if I could resolve this without having to rebuild torque. If that is the only solution, then I guess I will have to.
>>
>> Thanks once again. I look forward to your reply.
>>
>> Regards,
>> Sreedhar.
>>
>> On Mar 16, 2011, at 12:12 PM, Steve Crusan wrote:
>>
>>
>>> On 3/16/11 9:35 AM, "Sreedhar Manchu" <sm4082 at nyu.edu> wrote:
>>>
>>>
>>>> Hello Everyone,
>>>>
>>>> My name is Sreedhar. I am new to this mailing list. I have a quick question on
>>>> queues. I would really appreciate it if someone could help me with it. Very
>>>> recently, we rebooted the master node. Since then the queue settings are not
>>>> working on our cluster. Everything was fine until the reboot. We haven't
>>>> changed anything in the settings. Moab is the scheduler. I have tried restarting
>>>> both pbs and moab, and jobs still end up in the wrong queue.
>>>>
>>> How many queues do you have? We had similar problems on our dev cluster, which
>>> had more than 16 queues, and we ended up having to modify some include
>>> files and rebuild torque.
>>>
>>>
>>>
>>>> I have looked into the documentation but didn't find anything related to this type
>>>> of problem. I would really appreciate it if someone could help me.
>>>>
>>>> Thanks,
>>>> Sreedhar.
>>>> _______________________________________________
>>>> torqueusers mailing list
>>>> torqueusers at supercluster.org
>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>
>>>
>>> ----------------------
>>> Steve Crusan
>>> System Administrator
>>> Center for Research Computing
>>> University of Rochester
>>> https://www.crc.rochester.edu/
>>>
>>