Bugzilla – Bug 105
Job dependencies broken when job_suffix_alias is used
Last modified: 2011-02-17 05:22:14 MST
Hello *, Currently we see strange behaviour when a user tries to submit a job with a dependency. The job is held forever when the job_suffix_alias option is enabled ( display_job_server = False ). I also tested it on a cluster without job_suffix_alias, and the jobs there run as expected.
We are using torque 2.4.11
Has anyone looked at this issue?
Dennis, When you set these parameters, what was the status of your cluster? Setting these parameters while you have queued jobs can cause TORQUE to not be able to find jobs, since the older jobs weren't named the same way. TORQUE tries to recover from this, but there are issues that it can't address. (That's why this feature isn't documented on our website; we were trying to force people to ask us questions about it before using it.) David
David,

(In reply to comment #3)
> Dennis,
>
> When you set these parameters, what was the status of your cluster?

We started with a fresh installation of Torque 2.4.11

> Setting
> these parameters while you have queued jobs can cause TORQUE to not be able to
> find jobs, since the older jobs weren't named the same way. TORQUE tries to
> recover from this, but there are issues that it can't address. (That's why this
> feature isn't documented on our website, we were trying to force people to ask
> us questions about it before using it.)

We noticed the behaviour of the job_suffix_alias when we tested it. Perhaps it's best to document this because it changes the behaviour of Torque when there are jobs queued.

>
> David

Regards,
Dennis
David, I just read your comment on this and I am a bit confused. Most of the time we get questions like "can you test this feature of pbs_server?". We saw this new feature in the Changelog and it was exactly what I wanted, e.g. to switch to a new server. If you want to prevent people from using it, do not mention it in the Changelog. So we did some testing and it seemed to work; the only thing we did not test was dependencies. Now we encounter a bug in this new feature and we get your comment. That is inspiring for us and the people reporting problems ;-) Is the answer that dependencies do not work with this parameter and that we should not use it on our cluster? Or is this a bug that will be fixed? Regards