[torqueusers] queues are not working after master node reboot

Sreedhar Manchu sm4082 at nyu.edu
Wed Mar 16 14:22:04 MDT 2011


Hi Jerry,

We seem to have found the culprit behind this problem. When we rebooted the master node, pbs_sched was started at boot time, even though Moab is our scheduler. We disabled it, and now everything seems to be working fine. For now I think the problem has been resolved. If we still face problems tomorrow, I will let you know here. Thanks again for all the help. I do appreciate it.

I will let you know later how it goes with the new jobs. Hopefully they will run fine.
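
For anyone who wants to check for the same situation on their own master node, here is a minimal sketch in Python of the kind of check that would have caught it. It is not something we ran here; the daemon names are assumptions (pbs_sched is the scheduler bundled with torque, and "moab" is assumed to be the process name of the Moab daemon), and the helper names are made up for illustration.

```python
import subprocess

# Assumed daemon names: "pbs_sched" is the scheduler bundled with TORQUE;
# "moab" is assumed to be the process name of the Moab daemon.
SCHEDULER_DAEMONS = ("pbs_sched", "moab")


def list_process_names():
    """Return the command names of all running processes (POSIX ps)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def running_schedulers(process_names, daemons=SCHEDULER_DAEMONS):
    """Return the sorted scheduler daemons found among process_names."""
    return sorted(set(process_names) & set(daemons))


def has_scheduler_conflict(process_names, daemons=SCHEDULER_DAEMONS):
    """True when more than one scheduler daemon is running at once."""
    return len(running_schedulers(process_names, daemons)) > 1
```

Calling has_scheduler_conflict(list_process_names()) on the master node after a reboot would report whether both schedulers came up together. How pbs_sched is then kept from starting at boot depends on the init system in use.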

Thanks,
Sreedhar.


On Mar 16, 2011, at 12:22 PM, Jerry Smith wrote:

> Sreedhar,
> 
> Can you define "queue settings are not working"?
> 
> Are jobs not starting?  Are the queues no longer visible? Are they showing the wrong nodes?
> 
> A little more detail and we can probably get you to resolution faster.
> 
> Jerry
> 
> Sreedhar Manchu wrote:
>> 
>> Hi Steve,
>> 
>> First, thank you for writing. We have just 6 queues. Could you please clarify what you mean by modifying include files? If I can resolve it without having to rebuild torque, that would be great. If that is the only solution, then I guess I will have to.
>> 
>> Thanks once again. I look forward to your reply.
>> 
>> Regards,
>> Sreedhar.
>> 
>> On Mar 16, 2011, at 12:12 PM, Steve Crusan wrote:
>> 
>>   
>>> On 3/16/11 9:35 AM, "Sreedhar Manchu" <sm4082 at nyu.edu> wrote:
>>> 
>>>     
>>>> Hello Everyone,
>>>> 
>>>> My name is Sreedhar. I am new to this mailing list. I have a quick question on
>>>> queues. I would really appreciate it if someone could help me with it. Very
>>>> recently, we rebooted the master node. Since then the queue settings are not
>>>> working on our cluster. It used to be fine until the reboot. We haven't
>>>> changed anything in settings. Moab is the scheduler. I have tried to restart
>>>> both pbs and moab and still jobs end up in the wrong queue.
>>>>       
>>> How many queues do you have? We had similar problems on our dev cluster,
>>> which had more than 16 queues, and ended up having to modify some include
>>> files and rebuild torque.
>>> 
>>> 
>>>     
>>>> I have looked into documentation but didn't find anything related to this type
>>>> of problem. I would really appreciate it if someone could help me.
>>>> 
>>>> Thanks,
>>>> Sreedhar.
>>>> _______________________________________________
>>>> torqueusers mailing list
>>>> torqueusers at supercluster.org
>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>       
>>> 
>>> ----------------------
>>> Steve Crusan
>>> System Administrator
>>> Center for Research Computing
>>> University of Rochester
>>> https://www.crc.rochester.edu/
