[torqueusers] 15059 errors and deferred jobs

dlapine at ncsa.uiuc.edu
Fri Aug 11 15:41:38 MDT 2006


I'm getting some large jobs deferred on a 900-node cluster running torque-2.1.1.
Checking the logs shows error 15059: "sister node unable to communicate".

Checking the node in question, I see no indications (logs, error messages,
etc.) that the node has any issues communicating.

Why is torque reporting a failure to communicate, and what is it basing
this report on?


