[torqueusers] Dependencies being ignored from some submit hosts.

John Hanks griznog at gmail.com
Wed Feb 20 19:44:24 MST 2008


Looking at more log messages, I see this:

02/20/2008 19:32:01;0080;PBS_Server;Job;167.hostA.hpc.usu.edu;Unknown Job Id

Is this because the submitA host is adding .hpc.usu.edu to the job name? If
so, where is it picking this up from?
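
To illustrate the mismatch I suspect, here is a minimal shell sketch. The
short_jobid function is a hypothetical helper, not a Torque command, and the
two IDs are copied from the log lines in this thread. It only shows that the
two strings differ unless the domain suffix is stripped; it does not show how
pbs_server itself compares job IDs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): reduce a job ID to
# "<sequence>.<short host>" by keeping only the first two dot-separated fields.
short_jobid() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Job IDs as they appear in the log lines quoted in this thread.
queued="167.hostA"
rejected="167.hostA.hpc.usu.edu"

# The literal strings are different.
if [ "$queued" != "$rejected" ]; then
    echo "literal IDs differ: $queued vs $rejected"
fi

# With the domain suffix dropped, they name the same job.
if [ "$(short_jobid "$queued")" = "$(short_jobid "$rejected")" ]; then
    echo "same job once the domain suffix is dropped"
fi
```

If the server does a literal comparison, that would be consistent with the
"Unknown Job Id" message above, but that is an assumption on my part.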

jbh


On Wed, Feb 20, 2008 at 7:39 PM, John Hanks <griznog at gmail.com> wrote:

> Jobs submitted on submitA do not have dependencies listed by qstat -f.
>
> Jobs submitted on hostA have this line in qstat -f output:
>
> depend = afterany:169.hostA@hostA
>
> All jobs correctly display the submit args with "-W depend=..."
>
> Torque logs these lines when the job asking for dependencies is submitted
> from submitA:
>
> 02/20/2008 19:31:58;0080;PBS_Server;Req;req_reject;Reject reply
> code=15001(Unknown Job Id), aux=0, type=RegisterDependency, from @
> hostA.hpc.usu.edu
> 02/20/2008 19:31:58;0008;PBS_Server;Job;167.hostA;Job Queued at request of
> A00017456@submitA, owner = A00017456@submitA, job name = job.sh, queue =
> uinta
> 02/20/2008 19:31:58;0008;PBS_Server;Job;167.hostA;Dependency request for
> job rejected by 166.hostA.hpc.usu.edu
>
>
> Thanks,
>
> jbh
>
> On Wed, Feb 20, 2008 at 3:29 PM, Garrick Staples <garrick at usc.edu> wrote:
>
> > On Wed, Feb 20, 2008 at 03:11:46PM -0700, John Hanks alleged:
> > > Hello,
> > >
> > > I have a test setup, torque 2.2.1 and moab 5.2.1 running on a host,
> > > call it hostA, and a submit host called submitA which only has the
> > > torque clients (qsub, qstat, etc.).  I can successfully submit jobs
> > > from submitA to hostA with qsub, but get odd behavior when using -W
> > > depend=afterany:JOBID. For example,
> > >
> > > as a user on hostA I can do
> > >
> > > $ qsub job.sh
> > > hostA.165
> > > $ qsub -W depend=afterany:165 job.sh
> > > hostA.166
> > >
> > > Then look at job 166 with checkjob and see it correctly handles the
> > > dependency:
> > >
> > > NOTE:  job cannot run  (job has hold in place)
> > > NOTE:  job cannot run  (dependency 165 jobsuccessfulcomplete not met)
> > > BLOCK MSG: non-idle state 'Hold' (recorded at last scheduling
> > > iteration)
> > >
> > > however, if I do the same thing from submitA
> > >
> > > $ qsub job.sh
> > > hostA.167
> > > $ qsub -W depend=afterany:167 job.sh
> > > hostA.168
> > >
> > > Then look at the job with checkjob it says:
> > >
> > > NOTE:  job cannot run  (job has hold in place)
> > > BLOCK MSG: non-idle state 'Hold' (recorded at last scheduling
> > > iteration)
> > >
> > > and treats this as a hold, so that the job never runs until I do a
> > > manual releasehold to release the hold.
> > >
> > > I have server_name on both hostA and submitA set to point to hostA and
> > > torque has
> > >
> > > set server submit_hosts = submitA
> > >
> > > in its configuration. What do I need to do to have dependencies
> > > handled correctly from any submit host?
> >
> > 'checkjob' is a maui program and doesn't really say what is going on
> > within torque.
> >
> > Does 'qstat -f' show that the deps are correctly set up within torque?
> >
> > --
> > Garrick Staples, GNU/Linux HPCC SysAdmin
> > University of Southern California
> >
> > Please avoid sending me Word or PowerPoint attachments.
> > See http://www.gnu.org/philosophy/no-word-attachments.html
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
> >
> >
>