[torqueusers] Fwd: Job execution problem

Vahe nr vner75 at gmail.com
Wed Feb 15 09:49:19 MST 2012


---------- Forwarded message ----------
From: Michael Arndt <m.arndt at science-computing.de>
Date: Wed, Feb 15, 2012 at 8:44 PM
Subject: Re: [torqueusers] Job execution problem
To: Vahe nr <vner75 at gmail.com>


Hi Vahe

You need to resend your last mail separately to the list.
By pressing reply you answered via PM to me,
since I intentionally answered you off list.

As far as your messages suggest, this is not a problem related
only to your job.

The messages suggest that pbs_mom, the daemon that runs your
job on the exec node, does not talk "well" with the PBS server's
master processes.

The best way to resolve the problems is:

read mom logs on the exec node
read sched and server logs on the master
check the connection with pbs_iff
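
A rough first check before digging through the logs could look like the
sketch below. It only tests whether a TCP connection can be opened in each
direction; it is not a replacement for pbs_iff, which also exercises the PBS
authentication path. The port numbers are an assumption (the usual TORQUE
defaults, 15001 for pbs_server and 15002 for pbs_mom); a site may have
configured others. The function name can_connect is made up for this example.

```python
import socket

# Assumed TORQUE default ports; check your own configuration.
DEFAULT_PORTS = {"pbs_server": 15001, "pbs_mom": 15002}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

On the master one would call can_connect with the exec node's name and the
pbs_mom port, and on the exec node with the master's name and the pbs_server
port. A False in either direction points at firewall, daemon or name
resolution trouble rather than at the job itself.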

It is a PBS config issue, not a job problem.
Micha

On Wed, Feb 15, 2012 at 08:31:34PM +0400, Vahe nr wrote:
> Dear all
> The job always remains in the Q state. When I try to run it with the qrun
> command, I get the following error:
> qrun: Execution server rejected request MSG=cannot send job to mom,
> state=PRERUN 220.ce.seua-cluster.grid.am
>
> Cheers
>
> On Wed, Feb 15, 2012 at 8:03 PM, Vahe nr <vner75 at gmail.com> wrote:
>
> > Hi Michael
> > The PBS has the same version on node and master, and the host name is
> > right. I will try pbs_iff and see what I find!
> >
> > Cheers
> >
> > On Wed, Feb 15, 2012 at 7:54 PM, Vahe nr <vner75 at gmail.com> wrote:
> >
> >> Hi Michael
> >> Thanks for your reply. I will check what you have suggested and let
> >> you know. I hope it will help.
> >>
> >> Cheers
> >>
> >> On Wed, Feb 15, 2012 at 7:04 PM, Michael Arndt <
> >> m.arndt at science-computing.de> wrote:
> >>
> >>> Hello Vahe,
> >>>
> >>> offlist:
> >>>
> >>> -checks:
> >>>
> >>> - Is the PBS version really the same on the nodes / exec hosts as on
> >>>   the PBS master?
> >>>
> >>> - Is the name shown for a node by "pbsnodes node" the same as the one
> >>>   shown by "ssh node" from your PBS master?
> >>>   (In other words: is name resolution, via DNS / NIS / hosts or
> >>>   whatever, the same when the PBS master asks as what the node
> >>>   itself believes the host names of itself and the master to be?)
> >>>
> >>>
> >>> - Google for pbs_iff.
> >>>   The hits will show you how to use pbs_iff to test the connectivity
> >>>   from node to master.
> >>>
> >>> Last but not least, the PBS sched_logs on the server and the MOM logs
> >>> on the exec host will show info about the problem.
> >>>
> >>>
> >>> Micha
> >>>
> >>> --
> >>> Vorstand/Board of Management:
> >>> Dr. Bernd Finkbeiner, Michael Heinrichs,
> >>> Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech
> >>> Vorsitzender des Aufsichtsrats/
> >>> Chairman of the Supervisory Board:
> >>> Philippe Miltin
> >>> Sitz/Registered Office: Tuebingen
> >>> Registergericht/Registration Court: Stuttgart
> >>> Registernummer/Commercial Register No.: HRB 382196
> >>>
> >>>
> >>
> >
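
The name-resolution check from the quoted message (does the master see the
node under the same name the node uses for itself?) could be sketched roughly
as follows. This is only an illustration: the helper names are invented here,
and it looks at forward and reverse lookups only, not at what pbsnodes
reports.

```python
import socket


def normalize(hostname):
    """Lowercase a host name and drop surrounding blanks and a trailing dot."""
    return hostname.strip().rstrip(".").lower()


def compare_names(name_a, name_b):
    """Classify two host names as 'exact', 'short-only' or 'different'."""
    a, b = normalize(name_a), normalize(name_b)
    if a == b:
        return "exact"
    # Same short name but different (or missing) domain part.
    if a.split(".")[0] == b.split(".")[0]:
        return "short-only"
    return "different"


def forward_reverse(name):
    """Resolve name to an IPv4 address and back.

    Returns (address, reverse_name); an element is None if its lookup fails.
    """
    try:
        address = socket.gethostbyname(name)
    except OSError:
        return None, None
    try:
        reverse_name = socket.gethostbyaddr(address)[0]
    except OSError:
        return address, None
    return address, reverse_name


def report(name):
    """Print how this machine resolves name, for side-by-side comparison."""
    address, reverse_name = forward_reverse(name)
    print(f"{name} -> {address} -> {reverse_name}")
    if reverse_name is not None:
        print("match:", compare_names(name, reverse_name))
```

Running report() for the node's name and for the master's name on both
machines, and comparing the output, shows whether the two sides agree. A
"short-only" result (short name on one side, fully qualified name on the
other) is worth a closer look.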