[torqueusers] mom rejecting job?
Lippert, Kenneth B.
Kenneth.Lippert at alcoa.com
Tue Dec 13 14:04:23 MST 2005
Hello again.
Back on HP-UX. I gave up trying to get the HP-UX client to work with
the Linux server, so I made one of the HP-UX machines a server and
set up the HP-UX machines as a separate cluster from the Linux one.
Things are progressing. I can now submit a job from any of the
machines, but if I request that it run anywhere other than the server,
the job stays queued forever and maui's "checkjob" reports the following:
job is deferred. Reason: RMFailure (cannot start job - RMFailure, rc:
15041, msg 'execution server rejected request MSG=sendfailed, STARTING')
I have a separate queue for each machine, which I tie to that
machine by setting
"resources_default.neednodes=local_machine_node_name"
in the queue definition.
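For reference, a queue set up this way looks roughly like the following in qmgr. This is a sketch rather than a copy of my actual configuration: "hpnode1" is a hypothetical placeholder for both the queue name and the node name listed in the server's nodes file.

```shell
# Sketch of one per-machine queue; "hpnode1" is a hypothetical placeholder
# for the queue name and for the matching node name in the nodes file.
qmgr -c "create queue hpnode1"
qmgr -c "set queue hpnode1 queue_type = Execution"
# Default node request for jobs in this queue, pinning them to one machine
qmgr -c "set queue hpnode1 resources_default.neednodes = hpnode1"
qmgr -c "set queue hpnode1 enabled = True"
qmgr -c "set queue hpnode1 started = True"
```

Running qmgr -c "print queue hpnode1" shows the resulting definition.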
Thanks for any pointers, sorry to be a pain.
-k