[Mauiusers] cannot accept client on PBS sched socket

Aquarijen aquarijen at gmail.com
Fri Nov 11 13:55:58 MST 2005


Actually, that helps a lot. Thanks for the leads - I'll look into upgrading.

-Jen

On 11/11/05, Garrick Staples <garrick at usc.edu> wrote:
> On Fri, Nov 11, 2005 at 12:10:54PM -0500, Aquarijen alleged:
> > Hi all,
> >
> > I'm a newbie maui admin.  Maui seemed to be working and scheduling
> > jobs according to my maui.cfg until this morning, when I tried to
> > restart it after I made a change to the config.  I have since
> > reverted to the known working config, but I still see these errors.
> > The funny thing is that jobs are still being scheduled.  Any pointers
> > as to where to look for more info on this would be greatly appreciated
> > or if I have missed something in the manual, please let me know.
> > Also, I am unsure of what other info might be helpful, so if I haven't
> > included enough info, please let me know that too. :)
> >
> > I have restarted the compute nodes and the problem continues, btw.
> >
> > Maui Scheduler version 3.2.5p2 (it came with oscar).  Both pfilter and
> > iptables are off on all nodes.
>
> What version of TORQUE?
>
> I think 3.2.5p2 is older than you really want to use with TORQUE.  I think it
> was around the era of 3.2.5p10 when TORQUE/Maui got parallel
> advancements that bypass the need for maui to poll the MOMs.
>
>
> > pbsnodes -a shows all nodes are free.
> >
> > pbs_server logs just show:
> > 11/11/2005 11:58:19;0040;PBS_Server;Svr;b01l02;Scheduler sent command time
> > 11/11/2005 11:59:19;0040;PBS_Server;Svr;b01l02;Scheduler sent command time
> > 11/11/2005 12:00:19;0040;PBS_Server;Svr;b01l02;Scheduler sent command time
> >
> > pbs_mom log on the first node shows:
> > 11/11/2005 11:11:20;0002;   pbs_mom;Svr;Log;Log opened
> > 11/11/2005 11:11:20;0002;   pbs_mom;Svr;usecp;b01l02:/home /home
> > 11/11/2005 11:11:20;0002;   pbs_mom;Svr;restricted;b01l02
> > 11/11/2005 11:11:20;0002;   pbs_mom;Svr;timeout;60
> > 11/11/2005 11:11:20;0080;   pbs_mom;n/a;add_static;config[0] add name timeout value 60
> > 11/11/2005 11:11:20;0002;   pbs_mom;n/a;initialize;independent
> > 11/11/2005 11:11:20;0002;   pbs_mom;Svr;pbs_mom;Is up
> > 11/11/2005 11:11:20;0002;   pbs_mom;n/a;is_update_stat;hello sent to server
> > 11/11/2005 11:11:20;0001;   pbs_mom;Svr;pbs_mom;im_eof, End of File from addr 192.168.79.251:15001
> > 11/11/2005 11:11:23;0001;   pbs_mom;Svr;pbs_mom;Broken pipe (32) in rm_request, write request response failed: Protocol failure in commit
> >         message refused from port 1022 addr 192.168.79.251
> > 11/11/2005 11:11:43;0002;   pbs_mom;n/a;is_update_stat;hello sent to server
>
> It's very reasonable to get EOF errors immediately after restarting PBS
> daemons.  In this case, pbs_server still had valid connection info for
> the MOM at the time MOM came up.  pbs_server must close its end of the
> connection and refuse the new connection from MOM.  With both ends now
> in sync, the next connection initiated by either side will succeed.
>
> Modern TORQUEs handle this stuff better.
>
>
> > In the maui logs, I see many of these:
> > 11/11 11:52:54 ALERT:    cannot accept client on PBS sched socket
> > 11/11 11:52:54 MSUAcceptClient(6,ClientSD,HostName,TCP)
> > 11/11 11:52:54 INFO:     accept call failed, errno: 11 (Resource temporarily unavailable)
> > 11/11 11:52:54 INFO:     all clients connected.  servicing requests
> > 11/11 11:52:54 MRMCheckEvents()
> > 11/11 11:52:54 ALERT:    no PBS RPP sched socket connections ready
> > 11/11 11:52:54 MSUAcceptClient(8,ClientSD,HostName,TCP)
> > 11/11 11:52:54 INFO:     accept call failed, errno: 11 (Resource temporarily unavailable)
> > 11/11 11:52:54 ALERT:    cannot accept client on PBS sched socket
> > 11/11 11:52:54 MSUAcceptClient(6,ClientSD,HostName,TCP)
> > 11/11 11:52:54 INFO:     accept call failed, errno: 11 (Resource temporarily unavailable)
> > 11/11 11:52:54 INFO:     all clients connected.  servicing requests
> >
> > Thanks for any information at all.  I am really at a loss here.
>
> I'm not sure either.
>
> --
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California
>
>
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers

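A note on the "errno: 11 (Resource temporarily unavailable)" lines in the Maui log quoted above: on Linux, errno 11 is EAGAIN, which accept() returns on a non-blocking listening socket when no connection is waiting. That would fit with jobs still being scheduled. The sketch below is not Maui's code. It is a small Python illustration, assuming the scheduler socket is non-blocking (the thread does not confirm this), and the names try_accept and demo are made up for the example.

```python
import errno
import socket


def try_accept(listener):
    """Try one accept() on a non-blocking listener.

    Returns (conn, None) if a client was waiting, or (None, errno)
    if the call would have blocked because nobody was connecting.
    """
    try:
        conn, _addr = listener.accept()
        return conn, None
    except BlockingIOError as exc:
        # EAGAIN / EWOULDBLOCK: nothing pending, not a real failure.
        return None, exc.errno


def demo():
    # Hypothetical listener on an ephemeral local port; no client ever
    # connects, so the accept attempt has nothing to pick up.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    listener.setblocking(False)  # assumption: the real socket is non-blocking
    try:
        return try_accept(listener)
    finally:
        listener.close()


if __name__ == "__main__":
    conn, err = demo()
    print("connection:", conn)
    print("errno name:", errno.errorcode.get(err))
```

If that assumption holds, the ALERT lines would mean only that the scheduler polled its socket while no client was connecting, which is noisy but not necessarily a fault.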