[Mauiusers] completed jobs still shown in queue

Bisbal, Prentice PBisbal at LexPharma.com
Wed Mar 1 12:36:01 MST 2006


No - I don't have any epilogue scripts configured. The script I was running was very simple:

$ more pbs_test.sh 
#!/bin/bash
echo "Hello from $(uname -n)"
sleep 20
printenv | egrep "PBS_NODENUM|PBS_VNODENUM|PBS_TASKNUM|PBS_O_HOST" | sort
echo " "
exit 0


Prentice 



-----Original Message-----
From: Matney Sr, Kenneth D. [mailto:matneykdsr at ornl.gov]
Sent: Wed 3/1/2006 2:16 PM
To: Bisbal, Prentice
Subject: RE: [Mauiusers] completed jobs still shown in queue
 
Is it possible that MOM was running an epilog on behalf of
the job in this time interval?  For example, an epilog that
removes scratch areas that are NFS mounted to all of
your compute nodes might cause a delay between when
PBS records an exit status for the job and the job is marked
complete at the server.
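
For illustration, an epilogue of the kind described above might look roughly like the sketch below. The scratch location (SCRATCH_ROOT) and the one-directory-per-job layout are assumptions made for the example, not details from this thread; the only thing relied on is that pbs_mom passes the job ID to the epilogue as its first argument. If the directory lives on a slow or heavily loaded NFS mount, the rm can take a while, and the job would not be reported complete until the script returns.

```shell
#!/bin/bash
# Sketch of a scratch-cleaning epilogue. SCRATCH_ROOT and the per-job
# directory layout are assumptions, not taken from this thread.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch}"

# Remove the scratch directory belonging to one job ID, if it exists.
clean_scratch() {
    local job_id="$1"
    # With no job ID, do nothing rather than point rm at the root itself.
    [ -n "$job_id" ] || return 0
    rm -rf -- "${SCRATCH_ROOT:?}/${job_id}"
}

# pbs_mom passes the job ID as the first argument to the epilogue.
clean_scratch "$1"
```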
 
Just curious.  -- Ken Matney, Sr.

________________________________

From: mauiusers-bounces at supercluster.org
[mailto:mauiusers-bounces at supercluster.org] On Behalf Of Bisbal,
Prentice
Sent: Wednesday, March 01, 2006 1:30 PM
To: Stewart.Samuels at sanofi-aventis.com; mauiusers at supercluster.org
Subject: RE: [Mauiusers] completed jobs still shown in queue



qdel didn't work for me - something about the job being in an invalid
state for that operation.

All the jobs involved were on a system that was very loaded (8 cpus, all
at 99% usage). I suspect the heavy loading of the system caused delays
in communication, which in turn caused some sort of message timeout.

Prentice



-----Original Message-----
From: Stewart.Samuels at sanofi-aventis.com
[mailto:Stewart.Samuels at sanofi-aventis.com]
Sent: Wed 3/1/2006 12:45 PM
To: Bisbal, Prentice; mauiusers at supercluster.org
Subject: RE: [Mauiusers] completed jobs still shown in queue

We see the same behavior periodically.  We are running torque-1.2.0p1 and
maui-3.2.6p11.  Not only is this an annoyance, but it also prevents maui
from scheduling jobs on those nodes.  Most of the time you can qdel
them.

                Stewart

-----Original Message-----
From: mauiusers-bounces at supercluster.org
[mailto:mauiusers-bounces at supercluster.org]On Behalf Of Bisbal, Prentice
Sent: Wednesday, March 01, 2006 10:03 AM
To: mauiusers at supercluster.org
Subject: [Mauiusers] completed jobs still shown in queue



I have 4 simple jobs stuck in my queue. The jobs ran to completion, but
they are still shown as being in the queue:


$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

3183                pxxxxxx    Running     1    00:46:01  Wed Mar  1 09:44:58
3184                pxxxxxx    Running     1    00:46:04  Wed Mar  1 09:45:01
3185                pxxxxxx    Running     1    00:46:04  Wed Mar  1 09:45:01
3186                pxxxxxx    Running     1    00:46:04  Wed Mar  1 09:45:01

     4 Active Jobs       4 of   22 Processors Active (18.18%)
                         1 of    7 Nodes Active      (14.29%)

A tracejob shows that these jobs completed and exited w/o any errors:

$  tracejob 3186

Job: 3186.hw-emperor.lexpharma.com

03/01/2006 09:43:38  S    enqueuing into batch, state 1 hop 1
03/01/2006 09:43:38  S    Job Queued at request of
                          pxxxxxx at hw-underdog.xxxxxxxxx.com owner =
                          pxxxxxx at hw-underdog.xxxxxxxxx.com, job name =
                          PBS_TEST.87, queue = batch
03/01/2006 09:45:02  S    Job Modified at request of
                          maui at hw-emperor.lexpharma.com
03/01/2006 09:45:02  S    Job Run at request of
                          maui at hw-emperor.xxxxxxxxxx.com
03/01/2006 09:45:33  S    Exit_status=0 resources_used.cpupercent=0
                          resources_used.cput=00:00:00
                          resources_used.mem=5408kb
                          resources_used.vmem=9280kb
                          resources_used.walltime=00:00:30
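
The manual check above (showq says Running, tracejob already records an exit status) could be scripted along these lines. This is only a sketch: the helper names are made up for the example, and it assumes showq and tracejob print output in the same layout as shown in this message.

```shell
#!/bin/bash
# Print the IDs of jobs that showq lists as Running.
# Reads showq output on stdin; assumes STATE is the third column.
running_job_ids() {
    awk '$3 == "Running" { print $1 }'
}

# Succeed if the tracejob output on stdin records an exit status.
has_exit_status() {
    grep -q 'Exit_status='
}

# Hypothetical usage on the server host:
#   for id in $(showq | running_job_ids); do
#       tracejob "$id" 2>/dev/null | has_exit_status && echo "$id looks stale"
#   done
```

Any job ID this prints is one the scheduler still counts as active even though the server log already shows it exiting.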

Any idea why these jobs are still shown in the queue? What is the best
way to get rid of them?

Prentice





