Bug 51 - Fix for r3466 commit
: Fix for r3466 commit
Status: RESOLVED FIXED
Product: TORQUE
pbs_server
: 2.4.x
: PC Linux
: P5 blocker
Assigned To: Glen
:
:
:
  Show dependency treegraph
 
Reported: 2010-02-22 04:09 MST by Simon Toth
Modified: 2010-03-29 15:43 MDT (History)
2 users (show)

See Also:


Attachments
bug fix (1.30 KB, patch)
2010-02-22 04:09 MST, Simon Toth
Details | Diff
Patch to correct these issues (2.23 KB, patch)
2010-02-23 11:09 MST, David Beer
Details | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description Simon Toth 2010-02-22 04:09:04 MST
r3466 commit completely breaks the server (on default setup), here is a quick
bug fix.

The commit itself should be probably fixed, but that would would require a lot
of work. Specifically when taking into account foreign jobs (have to be handled
specifically), job moving (custom job id has to be translated into one, that is
recognized globally), etc....

Here is a quick patch for now.
Comment 1 Simon Toth 2010-02-22 04:09:34 MST
Created an attachment (id=23) [details]
bug fix
Comment 2 David Beer 2010-02-22 11:35:17 MST
When you say completely breaks the server on default setup, what do you mean? I
am able to checkout a clean branch, run the setup, make a nodes file, and run
jobs. 

I appreciate the optimizations in your patch, and I am planning on checking in
a version of it today (I'll post the patch here and explain any differences)

As far as the limitations of this feature, we are aware of them. The idea was
to create this feature and document its restrictions instead of making it
completely robust. This is the idea because we only know of one user that wants
to use it. If it turns out there are others that want this feature and it
becomes more widespread, we can definitely make it more robust and capable.
Comment 3 David Beer 2010-02-22 13:52:48 MST
Simon,

There's one consideration I want to talk to you about before commiting a patch:

Your patch changes things so that the get_correct_jobname function isn't called
if the settings haven't been changed. My only concern is that if a user sets
this, runs a job, and then unsets it, that job may hang in a running state
forever because the server won't always recognize the job's name in the obit.

For me, this is potentially acceptable, it just needs to be considered. We can
instruct people that these settings cannot be changed while jobs are queued or
running. I don't think this would inconvenience anyone much as these settings
shouldn't need to be changed very often, and since most users (I'm guessing)
aren't going to use this feature, and this would affect people that aren't
using it less.

At the same time, this isn't going to add much time to job lookup, since job
lookup has to scroll through all of the jobs known to that server, and this
just adds a function call and some checking. I'm open to both possibilities.
Please let me know your thoughts.
Comment 4 Simon Toth 2010-02-22 21:00:21 MST
> When you say completely breaks the server on default setup, what do you mean? I
> am able to checkout a clean branch, run the setup, make a nodes file, and run
> jobs. 

Well, I can't run any jobs. By default (nothing is set) the Job ID stored with
the job is the full hostname one: jobid.servername.hostname but when nothing is
set the get_correct_jobname returns jobid.servername (string comparison then
simply fails). It may work if the machine you are testing it on does not have a
FQDN.

> At the same time, this isn't going to add much time to job lookup, since job
> lookup has to scroll through all of the jobs known to that server, and this
> just adds a function call and some checking. I'm open to both possibilities.
> Please let me know your thoughts.

Torque is not CPU bound, the performance limitations are elsewhere.

Well, I have a big problem with this feature, because apart from the fact, that
I can't run any jobs on the server by default it isn't compatible with many
other features I implemented into Torque in the last year.

If the feature would be part of Trunk, or if it would be optional (configure
flag), then I would be much more happy.
Comment 5 Glen 2010-02-23 06:56:54 MST
putting this new feature in 2.4-fixes makes me nervous.  It feels like it still
has some kinks to be worked out.  I would feel a lot better if it went into
trunk, and it could be backported to 2.4 once it is super solid (if the
community does not have any reservations).  I know that this was due to a
request for a specific user. They can use a patched 2.4 or a 2.5 snapshot to
get the feature, but I think we should wait to push this out to everyone using
a production release.  At the very least it should be #ifdef'd out by default.
Comment 6 David Beer 2010-02-23 11:09:37 MST
Created an attachment (id=24) [details]
Patch to correct these issues

This patch will make it so that the function to find the correct jobname is
never called if the variables are not set. 

The difference between this fix and the original is that it accounts for both
parameters, not just display_job_server_suffix. 

In the future, we will be more strict about what things can be placed in fix
branches.
Comment 7 David Beer 2010-03-29 15:43:24 MDT
I believe this issue is now taken care of.