[torqueusers] output staying on nodes - pbs_mom problem ?

Clifton Kirby ckirby3 at colsa.com
Mon Sep 19 10:34:25 MDT 2005


Running 1.2.0p5, I am having a similar problem: the job stays in the "E"
state and eventually clears out, reporting the same "Post job file
processing error".  However, it is intermittent.  I am also running an
epilogue script, but all processes for the job and the epilogue script have
completed.  Sometimes I get the standard output file, but the standard error
file is still in the spool directory on the mother superior.  We never saw
this behavior in 1.2.0p4.

- Cliff

-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Julie Harold
Sent: Monday, September 19, 2005 7:20 AM
To: torqueusers at supercluster.org
Subject: [torqueusers] output staying on nodes - pbs_mom problem ?


Hi,

Forgot to add: tracejob gives:

s154 at cluster1:~> /usr/local/PBS/bin/tracejob 0

Job: 0.cluster1.uea.ac.uk

09/19/2005 13:21:03  S    enqueuing into default, state 1 hop 1
09/19/2005 13:21:03  S    dequeuing from default, state QUEUED
09/19/2005 13:21:03  S    enqueuing into para, state 1 hop 1
09/19/2005 13:21:03  S    Job Queued at request of s154 at cluster1.uea.ac.uk, owner = s154 at cluster1.uea.ac.uk, job name = pbs.test, queue = para
09/19/2005 13:24:31  S    enqueuing into para, state 1 hop 1
09/19/2005 13:24:31  S    Requeueing job, substate: 10 Requeued in queue: para
09/19/2005 13:24:39  S    enqueuing into para, state 1 hop 1
09/19/2005 13:24:39  S    Requeueing job, substate: 10 Requeued in queue: para
09/19/2005 13:36:25  S    enqueuing into para, state 1 hop 1
09/19/2005 13:36:25  S    Requeueing job, substate: 10 Requeued in queue: para
09/19/2005 13:36:25  S    enqueuing into para, state 1 hop 1
09/19/2005 13:36:25  S    Requeueing job, substate: 10 Requeued in queue: para
09/19/2005 13:38:13  S    enqueuing into para, state 1 hop 1
09/19/2005 13:38:13  S    Requeueing job, substate: 10 Requeued in queue: para
09/19/2005 13:49:09  S    Job aborted on PBS Server initialization
09/19/2005 13:49:09  S    dequeuing from unknown queue, state QUEUED
09/19/2005 13:49:09  S    Unable to open script file
09/19/2005 14:02:51  S    enqueuing into para, state 1 hop 1
09/19/2005 14:02:51  S    Job Queued at request of s154 at cluster1.uea.ac.uk, owner = s154 at cluster1.uea.ac.uk, job name = pbs.test, queue = para
09/19/2005 14:04:32  S    Job aborted on PBS Server initialization
09/19/2005 14:04:32  S    dequeuing from unknown queue, state QUEUED
09/19/2005 14:04:32  S    Unable to open script file
09/19/2005 14:10:46  S    enqueuing into para, state 1 hop 1
09/19/2005 14:10:46  S    Job Queued at request of s154 at cluster1.uea.ac.uk, owner = s154 at cluster1.uea.ac.uk, job name = pbs.test, queue = para
09/19/2005 14:10:47  S    Job Modified at request of root at cluster1.uea.ac.uk
09/19/2005 14:10:47  S    Job Run at request of root at cluster1.uea.ac.uk
09/19/2005 14:10:47  S    Job Modified at request of root at cluster1.uea.ac.uk
09/19/2005 14:12:27  S    Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=4096kb resources_used.vmem=30952kb resources_used.walltime=00:01:41
09/19/2005 14:12:49  S    Post job file processing error
09/19/2005 14:12:49  S    dequeuing from para, state EXITING
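
When the failure is intermittent, it may help to scan saved tracejob
output for the error line rather than reading each trace by eye.  The
following is only a minimal sketch in Python, not part of TORQUE.  It
assumes the record layout seen in the trace above (date, time, a
one-letter source code, then the message) and that any line without a
leading timestamp is a wrapped continuation of the previous record.  The
names parse_tracejob and post_job_errors are hypothetical.

```python
import re

# Start of a record as it appears in the trace above:
# MM/DD/YYYY HH:MM:SS, a one-letter source code, then the message.
RECORD = re.compile(r"^(\d{2}/\d{2}/\d{4}) (\d{2}:\d{2}:\d{2})\s+(\S)\s+(.*)$")

ERROR_TEXT = "Post job file processing error"


def parse_tracejob(text):
    """Return a list of (date, time, source, message) tuples.

    Assumption: a non-blank line that does not start with a timestamp
    is a wrapped continuation and is appended to the previous message.
    Lines before the first record (such as "Job: ...") are skipped.
    """
    records = []
    for line in text.splitlines():
        match = RECORD.match(line)
        if match:
            records.append(list(match.groups()))
        elif records and line.strip():
            records[-1][3] += " " + line.strip()
    return [tuple(record) for record in records]


def post_job_errors(records):
    """Keep only the records whose message contains the error text."""
    return [record for record in records if ERROR_TEXT in record[3]]


# The last few lines of the trace above, including the wrapped record.
SAMPLE = """\
09/19/2005 14:12:27  S    Exit_status=0 resources_used.cput=00:00:00
resources_used.mem=4096kb resources_used.vmem=30952kb
                           resources_used.walltime=00:01:41
09/19/2005 14:12:49  S    Post job file processing error
09/19/2005 14:12:49  S    dequeuing from para, state EXITING
"""

if __name__ == "__main__":
    for date, time, source, message in post_job_errors(parse_tracejob(SAMPLE)):
        print(date, time, message)
```

Feeding it the saved output of several tracejob runs would show when the
error was logged for each job, which could then be compared with the
pbs_mom log on the mother superior for the same time.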



Julie
--
---------------------------------------------------------------
Dr Julie Harold: University of East Anglia, Norwich, NR4 7TJ
       Environmental Sciences: Unix Support Officer
IT and Computing Service: High Performance Computing Consultant
                  phone 01603 59 2385/3121
                 email  j.m.harold at uea.ac.uk
       for env unix/linux support please mail envcs.unix at uea



_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers

