[torqueusers] Time-shared nodes not recognised
David Johnson (MetOcean Solutions)
d.johnson at metocean.co.nz
Thu Apr 13 16:50:01 MDT 2006
I am trying to set up a small cluster of single-processor machines. I want all nodes configured as time-shared, so that I can run long jobs but also occasionally push jobs through quickly across the whole cluster.
I have Torque/Maui set up. Everything works fine if the nodes are configured as 'cluster' nodes.
However, when I add the ':ts' suffix to the nodes in the nodes config file, jobs no longer start.
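For illustration, the change amounts to something like the following in the server's nodes file (typically server_priv/nodes under the Torque home directory). This is only a sketch: the host names node01 and node02 are placeholders, not the real ones.

```
# Sketch only: node01/node02 are placeholder host names.

# 'cluster' nodes (jobs start normally):
node01
node02

# the same nodes marked time-shared with the ':ts' suffix (jobs are deferred):
node01:ts
node02:ts
```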
pbsnodes -a reports everything correctly: all nodes are shown as 'time-shared' and in the 'free' state.
However, jobs in the queue do not start and are deferred with this error:

    job is deferred. Reason: RMFailure (cannot start job - RM failure, rc: 15062, msg: 'Unknown node ')
    Holds: Defer (hold reason: RMFailure)

The only difference between the working and failing configurations is the addition of the ':ts' suffix.
Any ideas?