[torqueusers] Dependencies being ignored from some submit hosts.

John Hanks griznog at gmail.com
Wed Feb 20 19:39:45 MST 2008


Jobs submitted on submitA do not have dependencies listed by qstat -f.

Jobs submitted on hostA have this line in qstat -f output:

depend = afterany:169.hostA@hostA

All jobs, from either host, correctly show "-W depend=..." in their submit args.
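The symptom is that jobs whose submit args contain "-W depend=" still lack a `depend` attribute. That comparison can be automated. Below is a minimal Python sketch that assumes the usual `qstat -f` layout: a "Job Id:" line starts each job, attributes are indented "name = value" lines, and wrapped values continue on tab-indented lines. That layout is an assumption, and the function names are hypothetical.

```python
def parse_qstat_f(text: str) -> dict[str, dict[str, str]]:
    """Parse `qstat -f` output into {job_id: {attribute: value}}.

    Assumed layout: "Job Id: <id>" starts a job, attributes are indented
    "name = value" lines, and wrapped values continue on tab-indented lines.
    """
    jobs: dict[str, dict[str, str]] = {}
    current = None   # attribute dict of the job currently being read
    last_key = None  # attribute that a continuation line extends
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            job_id = line.split(":", 1)[1].strip()
            current = jobs.setdefault(job_id, {})
            last_key = None
        elif current is None or not line.strip():
            continue  # text before the first job, or a blank separator
        elif line.startswith("\t") and last_key is not None:
            current[last_key] += line.strip()  # join a wrapped value
        elif " = " in line:
            key, value = line.strip().split(" = ", 1)
            current[key] = value
            last_key = key
    return jobs


def jobs_missing_depend(jobs: dict[str, dict[str, str]]) -> list[str]:
    """IDs of jobs whose submit_args request -W depend= but have no depend attribute."""
    return [
        job_id
        for job_id, attrs in jobs.items()
        if "depend=" in attrs.get("submit_args", "") and "depend" not in attrs
    ]
```

To use it, capture the text that `qstat -f` prints and pass it to `parse_qstat_f`. Any job ID that `jobs_missing_depend` returns had a dependency requested at submission that the server never recorded.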

Torque logs these lines when the job asking for dependencies is submitted
from submitA:

02/20/2008 19:31:58;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=RegisterDependency, from @hostA.hpc.usu.edu
02/20/2008 19:31:58;0008;PBS_Server;Job;167.hostA;Job Queued at request of A00017456@submitA, owner = A00017456@submitA, job name = job.sh, queue = uinta
02/20/2008 19:31:58;0008;PBS_Server;Job;167.hostA;Dependency request for job rejected by 166.hostA.hpc.usu.edu
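These server log lines share a semicolon-separated layout: timestamp, event code, daemon, object type, object name, and message. Below is a minimal Python sketch that splits records and extracts the rejected dependency requests. It assumes every pbs_server record follows this layout, which is inferred only from the lines quoted here. The field and function names are hypothetical labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerLogRecord:
    """One pbs_server log line; field names are labels chosen for this sketch."""
    timestamp: str
    event_code: str
    daemon: str
    object_type: str
    object_name: str
    message: str


def parse_log_line(line: str) -> Optional[ServerLogRecord]:
    # Split on the first five semicolons only, so a message that itself
    # contains ';' stays intact in the last field.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None  # not a record in the assumed layout
    return ServerLogRecord(*parts)


def rejected_dependencies(lines) -> list[tuple[str, str]]:
    """(job, message) pairs for 'Dependency request for job rejected' records."""
    hits = []
    for line in lines:
        rec = parse_log_line(line)
        if (
            rec is not None
            and rec.object_type == "Job"
            and rec.message.startswith("Dependency request for job rejected")
        ):
            hits.append((rec.object_name, rec.message))
    return hits
```

Passing the lines of a server log file to `rejected_dependencies` lists each dependent job whose request was refused, along with the job ID the server says rejected it.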


Thanks,

jbh

On Wed, Feb 20, 2008 at 3:29 PM, Garrick Staples <garrick at usc.edu> wrote:

> On Wed, Feb 20, 2008 at 03:11:46PM -0700, John Hanks alleged:
> > Hello,
> >
> > I have a test setup, torque 2.2.1 and moab 5.2.1 running on a host,
> > call it hostA, and a submit host called submitA which only has the
> > torque clients (qsub, qstat, etc.).  I can successfully submit jobs
> > from submitA to hostA with qsub, but get odd behavior when using -W
> > depend=afterany:JOBID. For example:
> >
> > as a user on hostA I can do
> >
> > $ qsub job.sh
> > 165.hostA
> > $ qsub -W depend=afterany:165 job.sh
> > 166.hostA
> >
> > Looking at job 166 with checkjob, I can see it correctly handles the
> > dependency:
> >
> > NOTE:  job cannot run  (job has hold in place)
> > NOTE:  job cannot run  (dependency 165 jobsuccessfulcomplete not met)
> > BLOCK MSG: non-idle state 'Hold' (recorded at last scheduling iteration)
> >
> > however, if I do the same thing from submitA
> >
> > $ qsub job.sh
> > 167.hostA
> > $ qsub -W depend=afterany:167 job.sh
> > 168.hostA
> >
> > checkjob on the dependent job says:
> >
> > NOTE:  job cannot run  (job has hold in place)
> > BLOCK MSG: non-idle state 'Hold' (recorded at last scheduling iteration)
> >
> > and treats this as a plain hold, so the job never runs until I
> > release it manually with releasehold.
> >
> > I have server_name on both hostA and submitA set to point to hostA and
> > torque has
> >
> > set server submit_hosts = submitA
> >
> > in its configuration. What do I need to do to have dependencies
> > handled correctly from any submit host?
>
> 'checkjob' is a maui program and doesn't really say what is going on
> within torque.
>
> Does 'qstat -f' show that the deps are correctly set up within torque?
>
> --
> Garrick Staples, GNU/Linux HPCC SysAdmin
> University of Southern California
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>