[torqueusers] Torque 4.1.2 does not accept hostname with '-'
mej at lbl.gov
Fri Oct 19 16:34:22 MDT 2012
On Thursday, 18 October 2012, at 09:03:03 (+0800),
Clotho Tsang wrote:
> The following problem is found at Torque 4.1.2, but not 4.1.0.
> At RHEL6, if the headnode hostname consists of char "-",
> jobs will keep running but not stop, checkjob shows message
> "cannot start job - RM failure, rc: 15033, msg: 'End of File' "
> The problem is not found if the hostname has no "-".
We are seeing the same issue at our site. (Our master node's name
ends in "-00".) We have a ticket open with Adaptive for this, but so
far it has proved very elusive.
Looking at the code, the only place that really sticks out to me where
'-' is handled specially (at least in terms of hostnames) has to do
with NUMA. NUMA nodes appear to be named using a hyphen followed by
one or more digits.
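To make the pattern I have in mind concrete, here is a rough sketch in Python. It is not TORQUE code, and I have not confirmed that the server parses names this way. The regex and the helper name split_numa_style_suffix are my own assumptions, used only to show which hostnames would look like a NUMA-style "name-<digits>" entry under that naming scheme.

```python
import re

# Assumed pattern (not taken from the TORQUE source): a hyphen followed by
# one or more ASCII digits at the very end of the name.
NUMA_STYLE_SUFFIX = re.compile(r"-([0-9]+)$")


def split_numa_style_suffix(hostname):
    """Return (base, index) if hostname ends in '-<digits>', else (hostname, None)."""
    match = NUMA_STYLE_SUFFIX.search(hostname)
    if match is None:
        return hostname, None
    # Everything before the final hyphen is the base name; the digits are the index.
    return hostname[:match.start()], int(match.group(1))


if __name__ == "__main__":
    # Hypothetical hostnames, chosen to cover hyphen+digits, hyphen only, digits only.
    for name in ("master-00", "head-node", "node12", "cluster-a-3"):
        print(name, split_numa_style_suffix(name))
```

If something like this is applied to every hostname rather than only to NUMA nodes, a name such as "master-00" would be split into "master" and index 0, while "head-node" would pass through untouched.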
I noticed that your hostname also had a hyphen followed by a digit.
Have you by any chance tried a hostname with hyphens but no numbers in it?
Have you had any luck tracking down the issue in the code? I've been
looking at it, but I don't see anything jumping out at me.
Michael Jennings <mej at lbl.gov>
Senior HPC Systems Engineer
High-Performance Computing Services
Lawrence Berkeley National Laboratory
Bldg 50B-3209E W: 510-495-2687
MS 050B-3209 F: 510-486-8615