[torqueusers] strange problem
H.Schulz at fz-rossendorf.de
Tue Feb 15 00:04:45 MST 2005
I have the following problem, which arises in the combination of TORQUE
and MAUI.
In my epilogue script I have to use the argument $6, which holds all the
information about the -l parameters of the qsub command.
In most cases the output of this variable contains the list of the nodes
where the job was running, but sometimes it contains only the number of
the requested nodes.
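
To make the two cases concrete, here is a minimal sketch of how the
value of $6 could be checked. It assumes $6 is a comma-separated list of
key=value pairs and that the node information sits under the key
"neednodes" (the attribute named in the maui warning). The function
names and the sample strings are hypothetical, not taken from a real job.

```python
def parse_resource_list(arg6):
    """Split a 'key=value,key=value' string into a dict.

    Assumption: this is the layout of the epilogue's sixth argument.
    """
    resources = {}
    for item in arg6.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments that have no '='
            resources[key.strip()] = value.strip()
    return resources


def classify_neednodes(arg6):
    """Return 'hostlist', 'count' or 'missing' for the neednodes entry."""
    value = parse_resource_list(arg6).get("neednodes")
    if value is None:
        return "missing"
    # Node specs are joined with '+'; properties such as ppn follow a ':'.
    first_tokens = [spec.split(":")[0] for spec in value.split("+")]
    # A purely numeric spec is a node count; anything else is a host name.
    if all(token.isdigit() for token in first_tokens):
        return "count"
    return "hostlist"


# Hypothetical sample values, for illustration only.
print(classify_neednodes("neednodes=node01+node02,nodes=2,walltime=01:00:00"))
print(classify_neednodes("neednodes=2,nodes=2,walltime=01:00:00"))
```

With a check like this the epilogue could at least log which of the two
forms it received for a given job.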
I checked the maui log and found something strange: whenever the
output of $6 is correct (meaning it contains the list of the nodes,
which is what I consider correct), there is a warning message in the
maui logfile that looks like this:
WARNING: cannot set job '<job_id>.<hostname>' attr
'Resource_List:neednodes' to '<number_of_req_nodes>' (rc: 15070 'Server
could not connect to MOM')
It seems to me that maui is trying to change the value of $6 to the
number instead of the list of the requested nodes. Whenever $6 contains
only this number, there is of course no warning message in the log file.
By the way: the frequency of wrong outputs of $6 varies with the value
of RMPOLLINTERVAL in maui.cfg. The default value of 30 sec. leads to an
error rate of 30%. Other values produce error rates between 0% and 100%.
In particular, 29 sec. seems to be the right interval for no errors.
And now I have discovered that this problem does not arise between MAUI
and OpenPBS. That is, there is no error with any RMPOLLINTERVAL value,
but there is always the warning message in the maui log file.
Now my question is: which of the two is the correct behavior? Is there a
way to force one of these two behaviors in TORQUE/MAUI?