[torqueusers] Torque deletes all jobs of user on same node
rjacobi at email.arizona.edu
Sat Nov 19 18:08:13 MST 2011
We've recently run into a curious problem with Torque. When a user deletes one of his jobs using "qdel jobid", and the job to be deleted spans more than one processor on a node, all other jobs of the same user on that node are canceled as well. If the deleted job runs on only one processor, the user's other jobs on the node are not affected and keep running.
It therefore seems to me that whenever the pbs_mom on the node has to delete a job from more than one processor, it somehow indiscriminately tries to delete jobs from all processors, and other users' jobs are probably unaffected only because of the lack of privileges over other users' processes.
At this point I have no clue how to diagnose this further or how to solve it. I've googled the problem but couldn't find anything, so I hope you have an idea.
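To make the pattern concrete, here is a minimal Python sketch of the behaviour as we observe it. It is not Torque code and says nothing about what pbs_mom does internally; it only models the symptom. The job ids, user names, node names, and both function names are made up for illustration, and the exec_host strings are assumed to use the "node/slot+node/slot" form.

```python
from collections import Counter


def slots_per_node(exec_host):
    """Count processor slots per node in an exec_host string.

    Assumption: the string looks like 'n01/0+n01/1+n02/0'.
    """
    return Counter(entry.split("/")[0] for entry in exec_host.split("+") if entry)


def jobs_at_risk(jobs, deleted_id):
    """Return ids of jobs that, per the observed symptom, die with deleted_id.

    jobs maps a job id to an (owner, exec_host) pair.
    """
    owner, exec_host = jobs[deleted_id]
    # Nodes on which the deleted job holds more than one processor.
    multi = {node for node, n in slots_per_node(exec_host).items() if n > 1}
    at_risk = []
    for job_id, (other_owner, other_host) in jobs.items():
        # Jobs of other users survive in our observations.
        if job_id == deleted_id or other_owner != owner:
            continue
        # Same user, and the job shares a node where the deleted job was multi-processor.
        if multi & set(slots_per_node(other_host)):
            at_risk.append(job_id)
    return sorted(at_risk)


# Hypothetical example data, not taken from our cluster.
jobs = {
    "101": ("alice", "n01/0+n01/1"),
    "102": ("alice", "n01/2"),
    "103": ("bob", "n01/3"),
    "104": ("alice", "n02/0"),
}

print(jobs_at_risk(jobs, "101"))  # ['102']
print(jobs_at_risk(jobs, "102"))  # []
```

Deleting the two-processor job 101 takes job 102 of the same user on the same node with it, while deleting the single-processor job 102 affects nothing else.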
University of Arizona
Department of Aerospace & Mechanical Engineering
1130 N. Mountain Ave.
Tucson, AZ, 85721-0119
tel: +1 (520) 621 4369
mail: rjacobi at email.arizona.edu
The less time you spend on algebra in life, the more time you have to be a happy person. (Kerschen)
Doubt is not a pleasant condition, but certainty is absurd. (Voltaire)
All great truths begin as blasphemies. (Shaw)
Thinking is something that follows difficulties and is preceded by action. (Brecht)