[Mauiusers] torque/maui integration - cannot set hostlist error
garrick at usc.edu
Mon Dec 22 22:05:07 MST 2008
On Sun, Dec 21, 2008 at 08:32:08PM -0500, John Kitchin alleged:
> Hi everyone,
> I am in the process of replacing PBSPro on our cluster with Torque/Maui. I
> have installed the latest versions of Torque and Maui, and Torque appears to
> run fine on its own and runs jobs. The installations seem to have gone well
> according to the directions and tests. I have not been able to get maui to
> schedule jobs though (after stopping pbs_sched and starting maui as user
> jtest), they just remain in the queue in a deferred state.
> our basic setup is a login/submit node where pbs_server and maui run called
> beowulf (beowulf.cheme.cmu.edu is the full name), with the execute nodes on
> an internal network.
> Typical output of checkjob on a deferred job is:
> job is deferred. Reason: RMFailure (job cannot be started - cannot set
> Holds: Defer (hold reason: RMFailure)
> PE: 1.00 StartPriority: 2
> cannot select job 52 for partition DEFAULT (job hold active)
> the torque log indicates an error connecting to MOM:
> 12/21/2008 18:04:32;0008;PBS_Server;Job;52.beowulf;Job Modified at request
> of jtest at beowulf
> 12/21/2008 18:04:32;0001;PBS_Server;Req;;Server could not connect to MOM
> 12/21/2008 18:04:32;0080;PBS_Server;Req;req_reject;Reject reply
> code=15070(Server could not connect to MOM), aux=0, type=ModifyJob, from
> jtest at beowulf
> 12/21/2008 18:05:16;0002;PBS_Server;Svr;PBS_Server;Torque Server Version =
> 2.4.0b1, loglevel = 0
This means that something is wrong between pbs_server and pbs_mom. I don't
think this has anything to do with maui.
Test with 'qrun'. That is a torque command that asks pbs_server to start the
job directly, bypassing the scheduler. If that also fails, then you know for
certain it isn't maui.
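A rough sketch of that check, assuming it runs on the server host as a TORQUE manager or operator (qrun needs that privilege). The helper name try_qrun is made up for illustration and is not part of torque:

```shell
# try_qrun is a hypothetical helper, not a TORQUE command.
# It asks pbs_server to start the given job via qrun and reports
# which side of the torque/maui split to look at next.
try_qrun() {
    jobid="$1"
    if qrun "$jobid"; then
        echo "qrun accepted job $jobid: server-to-MOM path works, look at maui next"
    else
        echo "qrun failed for job $jobid: fix pbs_server/pbs_mom before touching maui"
    fi
}
```

For the job in your log that would be `try_qrun 52`, followed by `qstat -f 52` to see where the job ended up.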
Also, you are running trunk (your log shows 2.4.0b1). You should really start
with the latest 2.1.x or 2.3.6 (releasing soon).
> on the nodes, the mom config files contain
> matsim (jtest) ~ > ssh c1n10 'cat /var/spool/torque/mom_priv/config'
> $clienthost beowulf
> $restricted *.cheme.cmu.edu
$clienthost is ancient. You want to use $pbsserver.
And why use $restricted? That disables security.
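Something like the following is what I have in mind for the mom config on each node. This is only a sketch: it assumes the name beowulf resolves from the execute nodes on your internal network, and it simply leaves $restricted out.

```
# /var/spool/torque/mom_priv/config on each execute node
# (assumes "beowulf" resolves to the pbs_server host from the node)
$pbsserver beowulf
```

pbs_mom reads this file at startup, so restart it on the node after editing.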
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
See the Dishonor Roll at http://www.californiansagainsthate.com/