[Mauiusers] torque/maui integration - cannot set hostlist

John Kitchin jkitchin at andrew.cmu.edu
Tue Dec 23 13:53:27 MST 2008


I uninstalled the branch version and installed torque 2.3.5, and then
everything was fine.  qrun had worked just fine with the branch version.

thanks for the tips on mom_priv/config

j


> Date: Mon, 22 Dec 2008 21:05:07 -0800
> From: Garrick Staples <garrick at usc.edu>
> Subject: Re: [Mauiusers] torque/maui integration - cannot set hostlist error
> To: mauiusers at supercluster.org
> Message-ID: <20081223050507.GT3820 at polop.usc.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> On Sun, Dec 21, 2008 at 08:32:08PM -0500, John Kitchin alleged:
> > Hi everyone,
> >
> > I am in the process of replacing PBSPro on our cluster with Torque/Maui. I
> > have installed the latest versions of Torque and Maui, and Torque appears to
> > run fine on its own and runs jobs. The installations seem to have gone well
> > according to the directions and tests. I have not been able to get maui to
> > schedule jobs though (after stopping pbs_sched and starting maui as user
> > jtest), they just remain in the queue in a deferred state.
> >
> > our basic setup is a login/submit node where pbs_server and maui run called
> > beowulf (beowulf.cheme.cmu.edu is the full name), with the execute nodes on
> > an internal network.
> >
> > Typical output of checkjob on a deferred job is:
> >
> > job is deferred.  Reason:  RMFailure  (job cannot be started - cannot set hostlist)
> > Holds:    Defer  (hold reason:  RMFailure)
> > PE:  1.00  StartPriority:  2
> > cannot select job 52 for partition DEFAULT (job hold active)
> >
> > the torque log indicates an error connecting to MOM:
> > 12/21/2008 18:04:32;0008;PBS_Server;Job;52.beowulf;Job Modified at request of jtest at beowulf
> > 12/21/2008 18:04:32;0001;PBS_Server;Req;;Server could not connect to MOM
> > 12/21/2008 18:04:32;0080;PBS_Server;Req;req_reject;Reject reply code=15070(Server could not connect to MOM), aux=0, type=ModifyJob, from jtest at beowulf
> > 12/21/2008 18:05:16;0002;PBS_Server;Svr;PBS_Server;Torque Server Version = 2.4.0b1, loglevel = 0
>
> This means that something is wrong between pbs_server and pbs_mom.  I don't
> think this has anything to do with maui.
>
> Test with 'qrun'.  That is a torque command that will attempt to start the
> job.  If that also fails, then you really know it isn't maui.
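
For anyone following along, the qrun test suggested above could be wrapped roughly like this. It is only a sketch: try_qrun is a made-up helper name, job 52 is the job from the log above, and it assumes you run it on the server host as a user allowed to use qrun.

```shell
# Sketch only: ask pbs_server to start a queued job directly, bypassing
# the scheduler, and report which side of the setup to look at next.
# try_qrun is a hypothetical helper name, not part of Torque.
try_qrun() {
    if qrun "$1"; then
        # The server reached the MOM and started the job.
        echo "job $1 started: pbs_server can reach pbs_mom"
    else
        # qrun itself failed, so the scheduler is not the cause.
        echo "job $1 not started: check the pbs_server to pbs_mom connection"
        return 1
    fi
}

# Usage:
#   try_qrun 52
```

If the job starts this way, the server-to-MOM path is working and the scheduler side deserves a closer look; if it does not, maui is not the cause.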
>
> Also, you are running trunk.  You should really start with the latest 2.1.x or
> 2.3.6 (releasing soon).
>
>
> > on the nodes, the mom config files contain
> > matsim (jtest) ~ > ssh c1n10 'cat /var/spool/torque/mom_priv/config'
> > $clienthost beowulf
> > $restricted *.cheme.cmu.edu
>
> $clienthost is ancient.  You want to use $pbsserver.
>
> And why use $restricted?  That disables security.
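
A config along the lines suggested above might be drafted like this. This is a sketch under assumptions: the scratch file name mom_config.new is made up, beowulf is the server name from the original message, and the $restricted line is simply dropped rather than replaced.

```shell
# Sketch only: write a candidate mom_priv/config to a scratch file for review.
# mom_config.new is a hypothetical file name; beowulf is the pbs_server host
# named in the original message.
# The quoted 'EOF' keeps the shell from expanding $pbsserver as a variable.
cat > ./mom_config.new <<'EOF'
$pbsserver beowulf
EOF

# Show the result before copying it anywhere.
cat ./mom_config.new
```

After review, the file would be copied to /var/spool/torque/mom_priv/config on each node, and pbs_mom restarted there so it picks up the change.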
>
> --
> Garrick Staples, GNU/Linux HPCC SysAdmin
> University of Southern California
>
> See the Dishonor Roll at http://www.californiansagainsthate.com/
>
>