[torqueusers] strange problem
garrick at usc.edu
Tue Feb 15 00:45:22 MST 2005
Can you test this with the latest torque-1.2.0p1 snapshot? These kinds of
problems with pbs_server blocking for long periods (which prevents maui from
setting neednodes) should be fixed.
On Tue, Feb 15, 2005 at 08:04:45AM +0100, Schulz, Henrik alleged:
> I have the following problem which arises in the combination of TORQUE
> and MAUI:
> In my epilogue script I use the variable $6, which contains all the
> information from the -l parameters of the qsub command.
> In most cases the output of this variable contains the list of the nodes
> where the job was running, but sometimes it contains only the number of
> the requested nodes.
> I checked the maui log and found something strange: whenever $6 is
> correct (i.e. it contains the list of nodes, which is what I consider
> correct), the maui log contains a warning message like this:
> WARNING: cannot set job '<job_id>.<hostname>' attr
> 'Resource_List:neednodes' to '<number_of_req_nodes>' (rc: 15070 'Server
> could not connect to MOM')
> It seems to me that maui is trying to change the value of $6 to the
> number of requested nodes instead of the list of nodes. Whenever $6
> contains only this number, there is of course no warning message in the
> log file.
> By the way: how often $6 is wrong depends on the RMPOLLINTERVAL value
> in maui.cfg. The default value of 30 sec. leads to an error rate of 30%.
> Other values produce error rates between 0% and 100%; in particular,
> 29 sec. seems to be an interval that produces no errors.
> I have also discovered that this problem does not occur with MAUI and
> OpenPBS: there is no error with any RMPOLLINTERVAL value, but the
> warning message always appears in the maui log file.
> Now my question is: which behavior is the correct one? Is there a way
> to force one of these two behaviors in TORQUE/MAUI?
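Since the question hinges on whether $6 carries a host list or only a node count, a small check in the epilogue can at least record which form arrived. This is a minimal Python sketch under stated assumptions: it assumes $6 is a comma-separated list of name=value pairs (e.g. neednodes=node01+node02,nodes=2,walltime=01:00:00), with neednodes holding either hostnames joined by '+' or a count such as 2 or 2:ppn=2. The helper names parse_resource_list and classify_node_spec are hypothetical. It only reports what it sees; it does not fix the pbs_server/maui timing problem itself.

```python
from __future__ import annotations

import sys


def parse_resource_list(arg: str) -> dict[str, str]:
    """Split a resource list such as
    'neednodes=node01+node02,nodes=2,walltime=01:00:00' into a dict.
    ASSUMPTION: entries are comma-separated name=value pairs."""
    resources: dict[str, str] = {}
    for item in arg.split(","):
        name, sep, value = item.partition("=")
        if sep:  # skip empty or malformed entries without '='
            resources[name.strip()] = value.strip()
    return resources


def classify_node_spec(spec: str) -> tuple[str, list[str]]:
    """Classify a neednodes value (hypothetical helper).

    Returns ("count", [...]) when every '+'-separated element starts with
    a plain number (e.g. '2' or '2:ppn=2'), ("hosts", [...]) otherwise,
    and ("missing", []) for an empty value. The list holds the part of
    each element before the first ':'."""
    heads = [part.split(":", 1)[0] for part in spec.split("+") if part]
    if not heads:
        return "missing", []
    if all(head.isdigit() for head in heads):
        return "count", heads
    return "hosts", heads


def main(argv: list[str]) -> None:
    # $6 in a shell epilogue corresponds to argv[6] here (argv[0] is the script).
    resources = parse_resource_list(argv[6] if len(argv) > 6 else "")
    kind, values = classify_node_spec(resources.get("neednodes", ""))
    if kind == "hosts":
        print("job nodes:", " ".join(values))
    elif kind == "count":
        print("neednodes holds only a node count:", "+".join(values), file=sys.stderr)
    else:
        print("no neednodes entry in the resource list", file=sys.stderr)


if __name__ == "__main__":
    # Reporting only: the script always exits with status 0.
    main(sys.argv)
```

An existing shell epilogue could hand its arguments through unchanged, e.g. `python3 check_nodes.py "$@"` (the script name is hypothetical), so that argv[6] lines up with $6.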
Garrick Staples, Linux/HPCC Administrator
University of Southern California