[torqueusers] Time-shared nodes not recognised

David Johnson (MetOcean Solutions) d.johnson at metocean.co.nz
Thu Apr 13 16:50:01 MDT 2006


I am trying to set up a small cluster of single-processor machines. I want all of them configured as time-shared, so that I can run long jobs but also occasionally push jobs through quickly across the whole cluster.

I have Torque/Maui set up. Everything works fine if the nodes are configured as 'cluster' nodes.
However, when I add the ':ts' suffix to the node names in the nodes config file, the setup stops working.
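
For illustration, the time-shared entries in the nodes file (server_priv/nodes under the Torque home directory) would look roughly like the sketch below. The hostnames node01 to node03 are hypothetical placeholders, not my actual machine names.

```
node01:ts
node02:ts
node03:ts
```

Without the ':ts' suffix (for example, a plain "node01" line), the same hosts are treated as cluster nodes.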

pbsnodes -a reports everything correctly, with all nodes shown as 'time-shared' and in the 'free' state.

However, jobs in the queue do not start, and the following error is reported:

job is deferred.  Reason:  RMFailure  (cannot start job - RM failure, rc: 15062, msg: 'Unknown node ')
Holds:    Defer  (hold reason:  RMFailure)

The only difference between the working and failing configurations is the addition of the ':ts' suffix.

Any ideas?
