[Mauiusers] completed jobs still shown in queue
Bisbal, Prentice
PBisbal at LexPharma.com
Wed Mar 1 12:36:01 MST 2006
No - I don't have any epilogue scripts configured. The script I was running was very simple:
$ more pbs_test.sh
#!/bin/bash
echo "Hello from $(uname -n)"
sleep 20
printenv | egrep "PBS_NODENUM|PBS_VNODENUM|PBS_TASKNUM|PBS_O_HOST" | sort
echo " "
exit 0
Prentice
-----Original Message-----
From: Matney Sr, Kenneth D. [mailto:matneykdsr at ornl.gov]
Sent: Wed 3/1/2006 2:16 PM
To: Bisbal, Prentice
Subject: RE: [Mauiusers] completed jobs still shown in queue
Is it possible that MOM was running an epilog on behalf of
the job in this time interval? For example, an epilog that
removes scratch areas that are NFS mounted to all of
your compute nodes might cause a delay between when
PBS records an exit status for the job and the job is marked
complete at the server.
Just curious. -- Ken Matney, Sr.
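One way to rule the epilog theory in or out is to look for epilogue scripts in the pbs_mom private directory on each compute node. The following is only a sketch: the function name list_epilogues is made up for illustration, the default path /var/spool/torque/mom_priv is an assumption about a common TORQUE layout, and the file names epilogue and epilogue.user are the ones the TORQUE documentation describes (older releases may honor only the first).

```shell
#!/bin/bash
# Sketch: print any executable epilogue scripts found in a mom_priv directory.
# ASSUMPTION: the default directory below is a typical TORQUE location;
# pass the real mom_priv path as the first argument if yours differs.
list_epilogues() {
    local dir="${1:-/var/spool/torque/mom_priv}"
    local f
    for f in "$dir"/epilogue "$dir"/epilogue.user; do
        # Only report regular files that are executable.
        if [ -f "$f" ] && [ -x "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}
```

Running list_epilogues on a compute node and getting no output would suggest that no epilogue is configured there, at least under the path assumed above.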
________________________________
From: mauiusers-bounces at supercluster.org
[mailto:mauiusers-bounces at supercluster.org] On Behalf Of Bisbal,
Prentice
Sent: Wednesday, March 01, 2006 1:30 PM
To: Stewart.Samuels at sanofi-aventis.com; mauiusers at supercluster.org
Subject: RE: [Mauiusers] completed jobs still shown in queue
qdel didn't work for me - something about the job being in an invalid
state for that operation.
All the jobs involved were on a system that was very loaded (8 cpus, all
at 99% usage). I suspect the heavy loading of the system caused delays
in communication, which in turn caused some sort of message timeout.
Prentice
-----Original Message-----
From: Stewart.Samuels at sanofi-aventis.com
[mailto:Stewart.Samuels at sanofi-aventis.com]
Sent: Wed 3/1/2006 12:45 PM
To: Bisbal, Prentice; mauiusers at supercluster.org
Subject: RE: [Mauiusers] completed jobs still shown in queue
We see the same behavior periodically. We are running torque-1.2.0p1 and
maui-3.2.6p11. Not only is this an annoyance, but it also prevents maui
from scheduling jobs on those nodes. Most of the time you can qdel
them.
Stewart
-----Original Message-----
From: mauiusers-bounces at supercluster.org
[mailto:mauiusers-bounces at supercluster.org]On Behalf Of Bisbal, Prentice
Sent: Wednesday, March 01, 2006 10:03 AM
To: mauiusers at supercluster.org
Subject: [Mauiusers] completed jobs still shown in queue
I have 4 simple jobs stuck in my queue. The jobs ran to completion, but
they are still shown as being in the queue:
$ showq
ACTIVE JOBS--------------------
JOBNAME  USERNAME  STATE    PROC  REMAINING  STARTTIME
3183     pxxxxxx   Running  1     00:46:01   Wed Mar 1 09:44:58
3184     pxxxxxx   Running  1     00:46:04   Wed Mar 1 09:45:01
3185     pxxxxxx   Running  1     00:46:04   Wed Mar 1 09:45:01
3186     pxxxxxx   Running  1     00:46:04   Wed Mar 1 09:45:01
4 Active Jobs    4 of 22 Processors Active (18.18%)
                 1 of 7 Nodes Active (14.29%)
A tracejob shows that these jobs completed and exited w/o any errors:
$ tracejob 3186
Job: 3186.hw-emperor.lexpharma.com
03/01/2006 09:43:38  S  enqueuing into batch, state 1 hop 1
03/01/2006 09:43:38  S  Job Queued at request of pxxxxxx at hw-underdog.xxxxxxxxx.com
                        owner = pxxxxxx at hw-underdog.xxxxxxxxx.com,
                        job name = PBS_TEST.87, queue = batch
03/01/2006 09:45:02  S  Job Modified at request of maui at hw-emperor.lexpharma.com
03/01/2006 09:45:02  S  Job Run at request of maui at hw-emperor.xxxxxxxxxx.com
03/01/2006 09:45:33  S  Exit_status=0 resources_used.cpupercent=0
                        resources_used.cput=00:00:00
                        resources_used.mem=5408kb
                        resources_used.vmem=9280kb
                        resources_used.walltime=00:00:30
Any idea why these jobs are still shown in the queue? What is the best
way to get rid of them?
Prentice
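The manual cross-check above (showq still lists a job as Running while tracejob already records an Exit_status) could be scripted roughly as follows. This is a sketch only: the helper names running_jobs, has_exit_status and find_stale_jobs are made up for illustration, and the parsing assumes the showq column layout and the tracejob "Exit_status=" record exactly as they appear in the output quoted above.

```shell
#!/bin/bash
# Sketch: find jobs that showq reports as Running although tracejob
# already shows an exit status for them.

# Read showq-style text on stdin and print the IDs of Running jobs.
# ASSUMPTION: columns are JOBNAME USERNAME STATE ..., as in the output above.
running_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $3 == "Running" { print $1 }'
}

# Read tracejob-style text on stdin; succeed if an exit status was recorded.
has_exit_status() {
    grep -q 'Exit_status='
}

# Hypothetical driver: needs showq and tracejob on PATH, so it only
# makes sense on a host where both commands work.
find_stale_jobs() {
    local job
    for job in $(showq | running_jobs); do
        if tracejob "$job" 2>/dev/null | has_exit_status; then
            echo "$job"
        fi
    done
}
```

Calling find_stale_jobs on the server host would print one job ID per line for each job in this state; it only reports the jobs and does not try to remove them.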