[Mauiusers] torque/maui integration - cannot set hostlist
jkitchin at andrew.cmu.edu
Tue Dec 23 13:53:27 MST 2008
I uninstalled the branch version and installed torque 2.3.5 and then
everything was fine. qrun worked just fine with the branch version.
Thanks for the tips on mom_priv/config.
> Date: Mon, 22 Dec 2008 21:05:07 -0800
> From: Garrick Staples <garrick at usc.edu>
> Subject: Re: [Mauiusers] torque/maui integration - cannot set hostlist
> To: mauiusers at supercluster.org
> Message-ID: <20081223050507.GT3820 at polop.usc.edu>
> On Sun, Dec 21, 2008 at 08:32:08PM -0500, John Kitchin alleged:
> > Hi everyone,
> > I am in the process of replacing PBSPro on our cluster with Torque/Maui. I
> > have installed the latest versions of Torque and Maui, and Torque appears to
> > run fine on its own and runs jobs. The installations seem to have gone
> > according to the directions and tests. I have not been able to get maui to
> > schedule jobs though (after stopping pbs_sched and starting maui as user
> > jtest); they just remain in the queue in a deferred state.
> > Our basic setup is a login/submit node where pbs_server and maui run on
> > beowulf (beowulf.cheme.cmu.edu is the full name), with the execute nodes on
> > an internal network.
> > Typical output of checkjob on a deferred job is:
> > job is deferred. Reason: RMFailure (job cannot be started - cannot set
> > hostlist)
> > Holds: Defer (hold reason: RMFailure)
> > PE: 1.00 StartPriority: 2
> > cannot select job 52 for partition DEFAULT (job hold active)
> > the torque log indicates an error connecting to MOM:
> > 12/21/2008 18:04:32;0008;PBS_Server;Job;52.beowulf;Job Modified at
> > of jtest at beowulf
> > 12/21/2008 18:04:32;0001;PBS_Server;Req;;Server could not connect to MOM
> > 12/21/2008 18:04:32;0080;PBS_Server;Req;req_reject;Reject reply
> > code=15070(Server could not connect to MOM), aux=0, type=ModifyJob, from
> > jtest at beowulf
> > 12/21/2008 18:05:16;0002;PBS_Server;Svr;PBS_Server;Torque Server Version
> > 2.4.0b1, loglevel = 0
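A rough sketch for pulling the relevant entries out of a pbs_server log, assuming the semicolon-separated layout shown in the excerpt above. The function name mom_connect_errors is hypothetical, and the log file location is passed in by the caller because it depends on the installation.

```shell
#!/bin/sh
# Hypothetical helper: list "could not connect to MOM" entries from a
# pbs_server log file whose fields are separated by ';'.
mom_connect_errors() {
    log_file="$1"
    # Print the timestamp (first field) and the message (last field)
    # of every line that mentions the MOM connection failure.
    awk -F';' '/could not connect to MOM/ { print $1 ": " $NF }' "$log_file"
}
```

Called as `mom_connect_errors /path/to/server_log`, it prints one line per matching entry, which makes it easy to see whether the failures line up with the times Maui tried to start the job.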
> This means that something is wrong between pbs_server and pbs_mom. I don't
> think this has anything to do with maui.
> Test with 'qrun'. That is a torque command that will attempt to start the
> job. If that also fails, then you really know it isn't maui.
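The qrun test suggested above could be wrapped roughly as follows. This is only a sketch: try_qrun is a hypothetical helper name, the job id is whatever qstat reports for the deferred job, and it assumes the caller has sufficient Torque privileges to run qrun.

```shell
#!/bin/sh
# Hypothetical helper: ask pbs_server to start a job directly, bypassing Maui.
try_qrun() {
    job_id="$1"
    # Bail out early if the Torque client tools are not installed here.
    if ! command -v qrun >/dev/null 2>&1; then
        echo "qrun not found in PATH" >&2
        return 127
    fi
    if qrun "$job_id"; then
        echo "qrun accepted $job_id: pbs_server could reach the MOM"
    else
        echo "qrun failed for $job_id: look at pbs_server/pbs_mom, not Maui" >&2
        return 1
    fi
}
```

For the job in this thread the call would be `try_qrun 52.beowulf`. If it fails in the same way, the scheduler is not the cause.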
> Also, you are running trunk. You should really start with the latest 2.1.x or
> 2.3.6 (releasing soon).
> > on the nodes, the mom config files contain
> > matsim (jtest) ~ > ssh c1n10 'cat /var/spool/torque/mom_priv/config'
> > $clienthost beowulf
> > $restricted *.cheme.cmu.edu
> $clienthost is ancient. You want to use $pbsserver.
> And why use $restricted? That disables security.
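A minimal sketch of the config change described above, writing a mom_priv/config that names the server with $pbsserver and drops $restricted. The helper name write_mom_config is hypothetical, and the target path and server name are arguments so the function can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: write a one-line pbs_mom config naming the server.
write_mom_config() {
    config_path="$1"
    server_host="$2"
    # The single-quoted format string keeps "$pbsserver" literal,
    # so the shell does not try to expand it as a variable.
    printf '$pbsserver %s\n' "$server_host" > "$config_path"
}
```

On a node from this thread that would be `write_mom_config /var/spool/torque/mom_priv/config beowulf`, run as a user allowed to write there. pbs_mom presumably has to be restarted or told to reread its config before the change takes effect.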
> Garrick Staples, GNU/Linux HPCC SysAdmin
> University of Southern California
> See the Dishonor Roll at http://www.californiansagainsthate.com/