Bugzilla – Bug 105
Job dependencies broken when job_suffix_alias is used
Last modified: 2011-02-17 05:22:14 MST
Currently we see strange behaviour when a user tries to submit a job with a
dependency: the job is held forever when the
job_suffix_alias option is enabled (display_job_server = False).
I also tested it on a cluster without job_suffix_alias, and the jobs there
run as expected.
We are using TORQUE 2.4.11.
Has anyone looked at this issue?
When you set these parameters, what was the status of your cluster? Setting
these parameters while you have queued jobs can cause TORQUE to not be able to
find jobs, since the older jobs weren't named the same way. TORQUE tries to
recover from this, but there are issues that it can't address. (That's why this
feature isn't documented on our website; we were trying to force people to ask
us questions about it before using it.)
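To make the naming problem concrete, here is a toy Python model, not TORQUE's implementation. It assumes a job ID is built as `<seq>[.<server>][.<alias>]` and that a dependency is satisfied only when that exact ID string is found among completed jobs with exit status 0. The functions `make_job_id` and `dependency_satisfied`, and the names `server01` and `alias`, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def make_job_id(seq: int, server: str, suffix_alias: Optional[str],
                show_server: bool) -> str:
    """Hypothetical job-ID format: <seq>[.<server>][.<alias>]."""
    parts = [str(seq)]
    if show_server:
        parts.append(server)
    if suffix_alias:
        parts.append(suffix_alias)
    return ".".join(parts)


def dependency_satisfied(dep_id: str, completed: Dict[str, int]) -> bool:
    """afterok-style check (assumed): the exact ID must exist with exit status 0.

    An ID that is missing from the table never matches, so in this model
    the dependent job would stay held indefinitely.
    """
    return completed.get(dep_id) == 0


if __name__ == "__main__":
    # Job 14 finished after the alias was enabled, so it is stored as "14.alias".
    completed = {make_job_id(14, "server01", "alias", show_server=False): 0}
    # A dependency recorded under the old naming scheme refers to "14.server01".
    old_ref = make_job_id(14, "server01", None, show_server=True)
    new_ref = make_job_id(14, "server01", "alias", show_server=False)
    print(old_ref, dependency_satisfied(old_ref, completed))  # 14.server01 False
    print(new_ref, dependency_satisfied(new_ref, completed))  # 14.alias True
```

In this model, any reference to a job under its pre-alias name can never be resolved, which mirrors the "TORQUE can't find jobs" symptom described above.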
(In reply to comment #3)
> When you set these parameters, what was the status of your cluster?
We started with a fresh installation of TORQUE 2.4.11.
> Setting these parameters while you have queued jobs can cause TORQUE to not be able to
> find jobs, since the older jobs weren't named the same way. TORQUE tries to
> recover from this, but there are issues that it can't address. (That's why this
> feature isn't documented on our website; we were trying to force people to ask
> us questions about it before using it.)
We noticed this behaviour of job_suffix_alias during testing. It may be best
to document it, because it changes how TORQUE behaves when jobs are
queued.
David, I just read your comment on this and I am a bit confused. Most of the time we
get questions about "can you test this feature of pbs_server?". We saw this new
feature in the changelog and it was exactly what I wanted, e.g. to switch to a
new server. If you want to prevent people from using it, do not
mention it in the changelog.
So we did some testing and it seems to work; the only thing we did not
test was dependencies.
Now we have encountered a bug in this new feature and we get your comment. That is
inspiring for us and for the people reporting problems ;-)
Is the answer that dependencies do not work with this parameter and we should not
use it on our cluster? Or is this a bug that will be fixed?