[torqueusers] jobs being held in substate 22 JOB_SUBSTATE_DEPNHOLD

Garrick Staples garrick at clusterresources.com
Wed Oct 25 10:28:42 MDT 2006


On Tue, Oct 24, 2006 at 11:49:46AM -0500, Marc Schraffenberger alleged:
> I have a large number of jobs that are being held because of
> dependencies (at least that is what I gather from the job substate)
> but I don't see why since the execution time has passed and there are
> only beforeany dependencies. I was wondering if anyone could help
> clarify this for me.

What version of TORQUE is this?  We fixed some bugs a long time ago with
failed jobs not properly releasing their deps.

 
> Here are some details on a particular job (some other jobs have
> dependencies on this one but have it in the "afterany" type):
> 
> Job Id: 495325.localhost
>    Job_Name = t1073
>    Job_Owner = cdrone at localhost
>    job_state = H
>    queue = mediumpriority
>    server = localhost
>    Checkpoint = u
>    ctime = Wed Sep 20 01:17:00 2006
>    depend = 
>    beforeany:495511.localhost at localhost:495655.localhost at localhost:49
>        5823.localhost at localhost:496046.localhost at localhost:497005.localhost at lo
>        calhost:497086.localhost at localhost:497256.localhost at localhost:497351.lo
>        .......
>        st:517616.localhost at localhost:517668.localhost at localhost:517806.localho
>        st at localhost:518008.localhost at localhost:518104.localhost at localhost:5182
>        69.localhost at localhost:518459.localhost at localhost:519822.localhost at loca
>        lhost:519957.localhost at localhost
>    Error_Path = localhost://t1073.e495325
>    Hold_Types = u
>    Join_Path = n
>    Keep_Files = n
>    Mail_Points = a
>    mtime = Tue Sep 26 01:07:45 2006
>    Output_Path = localhost://t1073.o495325
>    Priority = 0
>    qtime = Wed Sep 20 01:17:00 2006
>    Rerunable = True
>    Resource_List.db_free = 1
>    Resource_List.mem = 319mb
>    Resource_List.nice = 0
>    substate = 22
>    Variable_List = PBS_O_HOME=/root,PBS_O_LOGNAME=root,
>        PBS_O_PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bi
>        n,PBS_O_MAIL=/var/mail/root,PBS_O_SHELL=/bin/bash,PBS_O_HOST=localhost,
>        PBS_O_WORKDIR=/,
>        PBS_ARGUMENTS=-d3 -P --distribution 4 --accountid 180 --update,
>        PBS_FILENAME=/usr/local/tsa/bidmgr/sebidmgr.sh,PBS_RETRIES=0,
>        PBS_O_QUEUE=mediumpriority
>    euser = cdrone
>    egroup = cdrone
>    queue_rank = 452584
>    queue_type = E
>    comment = Not Running: Strict fifo order
> 
> 
> 
> Job: 495325.localhost
> 
> 09/20/2006 01:17:00  S    enqueuing into mediumpriority, state 3 hop 1
> 09/20/2006 01:17:00  S    Job Queued at request of cdrone at localhost, owner = cdrone at localhost, job name = t1073, queue = mediumpriority
> 09/20/2006 01:17:00  S    Dependency request for job rejected by 491698.localhost
> 09/20/2006 01:17:00  A    queue=mediumpriority
> 09/20/2006 01:17:27  S    Job Modified at request of Scheduler at localhost
> 09/20/2006 01:18:02  S    Dependency on job 492090.localhost released.
> 09/20/2006 01:18:04  S    Dependency on job 491837.localhost released.
> 09/20/2006 05:53:32  S    Dependency on job 493304.localhost released.
> 09/20/2006 05:53:33  S    Dependency on job 493171.localhost released.
> 09/20/2006 05:53:33  S    Dependency on job 493021.localhost released.
> 09/20/2006 05:53:33  S    Dependency on job 492983.localhost released.
> 09/20/2006 07:38:16  S    Dependency on job 493513.localhost released.
> 09/20/2006 07:38:16  S    Dependency on job 493376.localhost released.
> 09/20/2006 10:01:47  S    Dependency on job 493902.localhost released.
> 09/20/2006 10:01:48  S    Dependency on job 493782.localhost released.
> 09/20/2006 10:01:48  S    Dependency on job 493614.localhost released.
> 09/20/2006 14:28:23  S    Dependency on job 494601.localhost released.
> 09/20/2006 14:28:23  S    Dependency on job 494428.localhost released.
> 09/20/2006 14:28:24  S    Dependency on job 494180.localhost released.
> 09/20/2006 14:28:24  S    Dependency on job 494061.localhost released.
> 09/20/2006 19:27:41  S    Dependency on job 495254.localhost released.
> 09/20/2006 19:27:42  S    Dependency on job 495042.localhost released.
> 09/20/2006 19:27:42  S    Dependency on job 494975.localhost released.
> 09/20/2006 19:27:42  S    Dependency on job 494818.localhost released.
> 09/20/2006 19:27:42  S    Dependency on job 494703.localhost released.
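
To see how many jobs are stuck this way, something like the following may help. It is only a rough sketch, not part of TORQUE: it assumes you have saved the output of "qstat -f" as text, and that the " at " in the listing above is the archive's rendering of "@". The function names and the SAMPLE text are made up for illustration. SAMPLE is a shortened version of the job above, and the state values for the second job are invented.

```python
import re

# Substate number for JOB_SUBSTATE_DEPNHOLD, as named in the subject line.
DEPNHOLD = "22"

# Shortened sample in the shape of the `qstat -f` listing quoted above.
# The attributes of the second job are invented for illustration.
SAMPLE = """\
Job Id: 495325.localhost
    Job_Name = t1073
    job_state = H
    depend =
    beforeany:495511.localhost@localhost:49
        5655.localhost@localhost
    substate = 22
Job Id: 495511.localhost
    job_state = Q
    substate = 10
"""


def parse_qstat_full(text):
    """Parse `qstat -f` style text into {job_id: {attribute: value}}."""
    jobs = {}
    attrs = None  # attributes of the job currently being read
    key = None    # last attribute seen, so wrapped lines can be appended
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            attrs = jobs.setdefault(line.split(":", 1)[1].strip(), {})
            key = None
            continue
        if attrs is None or not line.strip():
            continue
        match = re.match(r"\s+([\w.]+) = ?(.*)$", line)
        if match:
            key = match.group(1)
            attrs[key] = match.group(2).strip()
        elif key is not None:
            # Wrapped continuation line: the listing breaks values
            # mid-token, so join without adding a separator.
            attrs[key] += line.strip()
    return jobs


def dependency_held(jobs):
    """Return {job_id: depend string} for held jobs in substate 22."""
    return {
        job_id: attrs.get("depend", "")
        for job_id, attrs in jobs.items()
        if attrs.get("job_state") == "H" and attrs.get("substate") == DEPNHOLD
    }


if __name__ == "__main__":
    for job_id, depend in dependency_held(parse_qstat_full(SAMPLE)).items():
        print(job_id, depend)
```

Feeding it the real "qstat -f" text instead of SAMPLE should give a list of the held jobs and the dependency strings that are still attached to them, which can then be compared against the tracejob output.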

