[torqueusers] jobs being held in substate 22 JOB_SUBSTATE_DEPNHOLD
garrick at clusterresources.com
Wed Oct 25 10:28:42 MDT 2006
On Tue, Oct 24, 2006 at 11:49:46AM -0500, Marc Schraffenberger alleged:
> I have a large number of jobs that are being held because of
> dependencies (at least that is what I gather from the job substate)
> but I don't see why since the execution time has passed and there are
> only beforeany dependencies. I was wondering if anyone could help
> clarify this for me.
What version of TORQUE is this? We fixed some bugs a long time ago with
failed jobs not properly releasing their deps.
> Here are some details on a particular job (some other jobs depend on
> this one, but through the "afterany" type):
> Job Id: 495325.localhost
> Job_Name = t1073
> Job_Owner = cdrone at localhost
> job_state = H
> queue = mediumpriority
> server = localhost
> Checkpoint = u
> ctime = Wed Sep 20 01:17:00 2006
> depend =
> beforeany:495511.localhost at localhost:495655.localhost at localhost:49
> 5823.localhost at localhost:496046.localhost at localhost:497005.localhost at lo
> calhost:497086.localhost at localhost:497256.localhost at localhost:497351.lo
> st:517616.localhost at localhost:517668.localhost at localhost:517806.localho
> st at localhost:518008.localhost at localhost:518104.localhost at localhost:5182
> 69.localhost at localhost:518459.localhost at localhost:519822.localhost at loca
> lhost:519957.localhost at localhost
> Error_Path = localhost://t1073.e495325
> Hold_Types = u
> Join_Path = n
> Keep_Files = n
> Mail_Points = a
> mtime = Tue Sep 26 01:07:45 2006
> Output_Path = localhost://t1073.o495325
> Priority = 0
> qtime = Wed Sep 20 01:17:00 2006
> Rerunable = True
> Resource_List.db_free = 1
> Resource_List.mem = 319mb
> Resource_List.nice = 0
> substate = 22
> Variable_List = PBS_O_HOME=/root,PBS_O_LOGNAME=root,
> PBS_ARGUMENTS=-d3 -P --distribution 4 --accountid 180 --update,
> euser = cdrone
> egroup = cdrone
> queue_rank = 452584
> queue_type = E
> comment = Not Running: Strict fifo order
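To see how many jobs are in the same situation, the `qstat -f` output above can be filtered mechanically. What follows is only a minimal sketch, not part of TORQUE: the function names (`parse_qstat_full`, `dependency_holds`, `split_depend`) are made up for illustration, the sample text is a shortened copy of the job above, and it assumes that the archive's " at " stands for "@" and that wrapped attribute values continue on the following lines.

```python
import re

# Shortened, hand-made sample in the shape of the `qstat -f` output above.
SAMPLE = """\
Job Id: 495325.localhost
    Job_Name = t1073
    job_state = H
    depend =
        beforeany:495511.localhost@localhost:495655.loc
        alhost@localhost
    Hold_Types = u
    substate = 22
"""

# An attribute line looks like "name = value"; the value may be empty.
ATTR_RE = re.compile(r"^([A-Za-z_][\w.]*) =\s*(.*)$")


def parse_qstat_full(text):
    """Parse `qstat -f`-style text into {job_id: {attribute: value}}."""
    jobs = {}
    attrs = None
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Job Id:"):
            attrs = {}
            jobs[line.split(":", 1)[1].strip()] = attrs
            key = None
            continue
        if attrs is None:
            continue  # text before the first job record
        match = ATTR_RE.match(line)
        if match:
            key = match.group(1)
            attrs[key] = match.group(2)
        elif key is not None:
            # Assumption: a line that is not "name = value" continues the
            # previous value; pieces are joined without a separator.
            attrs[key] += line
    return jobs


def dependency_holds(jobs, substate="22"):
    """Return {job_id: depend string} for held jobs in the given substate."""
    return {
        job_id: attrs.get("depend", "")
        for job_id, attrs in jobs.items()
        if attrs.get("job_state") == "H" and attrs.get("substate") == substate
    }


def split_depend(value):
    """Split "type:job:job,type:job" into {type: [job, ...]}."""
    result = {}
    for clause in value.split(","):
        if not clause:
            continue
        kind, *job_ids = clause.split(":")
        result.setdefault(kind, []).extend(job_ids)
    return result


if __name__ == "__main__":
    for job_id, depend in dependency_holds(parse_qstat_full(SAMPLE)).items():
        print(job_id, split_depend(depend))
```

Feeding it the saved output of `qstat -f` instead of `SAMPLE` would list every job held in substate 22 together with the jobs named in its depend attribute, which makes it easier to check whether those jobs still exist.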
> Job: 495325.localhost
> 09/20/2006 01:17:00 S enqueuing into mediumpriority, state 3 hop 1
> 09/20/2006 01:17:00 S Job Queued at request of cdrone at localhost,
> owner = cdrone at localhost, job name = t1073, queue = mediumpriority
> 09/20/2006 01:17:00 S Dependency request for job rejected by
> 09/20/2006 01:17:00 A queue=mediumpriority
> 09/20/2006 01:17:27 S Job Modified at request of Scheduler at localhost
> 09/20/2006 01:18:02 S Dependency on job 492090.localhost released.
> 09/20/2006 01:18:04 S Dependency on job 491837.localhost released.
> 09/20/2006 05:53:32 S Dependency on job 493304.localhost released.
> 09/20/2006 05:53:33 S Dependency on job 493171.localhost released.
> 09/20/2006 05:53:33 S Dependency on job 493021.localhost released.
> 09/20/2006 05:53:33 S Dependency on job 492983.localhost released.
> 09/20/2006 07:38:16 S Dependency on job 493513.localhost released.
> 09/20/2006 07:38:16 S Dependency on job 493376.localhost released.
> 09/20/2006 10:01:47 S Dependency on job 493902.localhost released.
> 09/20/2006 10:01:48 S Dependency on job 493782.localhost released.
> 09/20/2006 10:01:48 S Dependency on job 493614.localhost released.
> 09/20/2006 14:28:23 S Dependency on job 494601.localhost released.
> 09/20/2006 14:28:23 S Dependency on job 494428.localhost released.
> 09/20/2006 14:28:24 S Dependency on job 494180.localhost released.
> 09/20/2006 14:28:24 S Dependency on job 494061.localhost released.
> 09/20/2006 19:27:41 S Dependency on job 495254.localhost released.
> 09/20/2006 19:27:42 S Dependency on job 495042.localhost released.
> 09/20/2006 19:27:42 S Dependency on job 494975.localhost released.
> 09/20/2006 19:27:42 S Dependency on job 494818.localhost released.
> 09/20/2006 19:27:42 S Dependency on job 494703.localhost released.