[torqueusers] Problems with Torque configuration

Davi Vercillo davivercillo.aux at gmail.com
Thu Nov 29 10:39:42 MST 2007


Hi Garrick,

Thanks for your help, and sorry about this. I got the logs you mentioned and
am sending them in this e-mail. I checked the logs on all my nodes and they are
the same, with the same error message. So, here they are...

Nodes - bangu07:
[root@bangu08 ~]# cat /var/spool/torque/mom_logs/20071123
11/23/2007 07:57:58;0002;   pbs_mom;Svr;Log;Log opened
11/23/2007 07:57:58;0002;   pbs_mom;n/a;initialize;independent
11/23/2007 07:57:58;0002;   pbs_mom;Svr;pbs_mom;Is up
11/23/2007 07:57:58;0002;   pbs_mom;Svr;mom_main;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1193074779
11/23/2007 07:57:58;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 07:59:28;0002;   pbs_mom;n/a;mom_main;connection to server bangu00 timeout
11/23/2007 07:59:28;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 08:00:58;0002;   pbs_mom;n/a;mom_main;connection to server bangu00 timeout
11/23/2007 08:00:58;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 08:02:28;0002;   pbs_mom;n/a;mom_main;connection to server bangu00 timeout
11/23/2007 08:02:28;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 08:03:58;0002;   pbs_mom;n/a;mom_main;connection to server bangu00 timeout
11/23/2007 08:03:58;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 08:05:28;0002;   pbs_mom;n/a;mom_main;connection to server bangu00 timeout
(...) <- the same messages repeat many times...
11/23/2007 11:49:08;0002;   pbs_mom;n/a;mom_main;connection to server bangu00 timeout
11/23/2007 11:49:08;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 11:50:38;0002;   pbs_mom;n/a;mom_main;connection to server bangu00 timeout
11/23/2007 11:50:38;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 11:52:08;0002;   pbs_mom;n/a;mom_main;connection to server bangu00 timeout
11/23/2007 11:52:08;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 11:53:38;0002;   pbs_mom;n/a;mom_main;connection to server bangu00 timeout
11/23/2007 11:53:38;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 11:55:07;0002;   pbs_mom;Svr;pbs_mom;caught signal 15: leaving jobs running, just exiting
11/23/2007 15:40:21;0002;   pbs_mom;Svr;Log;Log opened
11/23/2007 15:40:21;0002;   pbs_mom;n/a;initialize;independent
11/23/2007 15:40:21;0002;   pbs_mom;Svr;pbs_mom;Is up
11/23/2007 15:40:21;0002;   pbs_mom;Svr;mom_main;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1193074779
11/23/2007 15:40:21;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 15:58:21;0002;   pbs_mom;Svr;im_eof;End of File from addr 146.164.41.100:15001
11/23/2007 15:58:21;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 15:59:55;0002;   pbs_mom;n/a;mom_main;connection to server bangu00 timeout
11/23/2007 15:59:55;0002;   pbs_mom;n/a;mom_main;hello sent to server bangu00
11/23/2007 16:48:36;0002;   pbs_mom;Svr;pbs_mom;caught signal 15: leaving jobs running, just exiting
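The repeated "connection to server bangu00 timeout" lines suggest the node may not be able to open a connection to pbs_server at all. A minimal sketch for checking that from a node, assuming Python 3 is available there and that pbs_server listens on TCP port 15001 on bangu00 (the port shown in the im_eof line). The helper name can_reach is hypothetical and not part of Torque:

```python
import socket


def can_reach(host: str, port: int = 15001, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper: it only tests network reachability of the port,
    it does not speak the PBS protocol.
    """
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, name-resolution
        # failures and timeouts (all are OSError subclasses in Python 3).
        return False
```

Run on a node as, for example, can_reach("bangu00"). A False result would point toward name resolution, routing or a firewall between the node and the server rather than toward the MOM configuration itself.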

I couldn't find the syslog here. Where should I look for it?

Another question: do I need to run pbs_mom on the server as well, alongside
pbs_server?

Thanks.

2007/11/26, Garrick Staples <garrick at usc.edu>:
>
> On Sat, Nov 24, 2007 at 04:01:44PM -0200, Davi Vercillo alleged:
> > Hi all,
> >
> >
> > 2007/11/24, Garrick Staples <garrick at usc.edu>:
> > >
> > > On Fri, Nov 23, 2007 at 04:42:51PM -0200, Davi Vercillo alleged:
> > > > 11/23/2007 16:14:37  S    Job Modified at request of
> > > >                           Scheduler@bangu00.dcc.ufrj.br
> > > > 11/23/2007 16:14:37  S    Job Run at request of
> > > >                           Scheduler@bangu00.dcc.ufrj.br
> > > > 11/23/2007 16:14:39  S    unable to run job, MOM rejected/rc=2
> > >
> > > Your server config is fine. The problem is on the node. The error message
> > > will be in the mom log, syslog on the node, or sent to the job owner by
> > > email.
> >
> >
> > What do I need to configure on the nodes for this to work correctly? I did
> > what the Wiki page said to do. Do I need to set other parameters? What do
> > you think the problem is?
> >
> > PS: Sorry about my English. =S
>
> I don't know. You need to check for error messages in the mom log, syslog on
> the node, or sent to the job owner by email.

