Bugzilla – Bug 139
Negative value in 'Que' when using qstat
Last modified: 2012-05-23 01:49:05 MDT
New issue with our recently upgraded Torque server. Here is my 'qstat' output:

  Queue            Memory CPU Time Walltime Node  Run Que Lm  State
  ---------------- ------ -------- -------- ----  --- --- --  -----
  verylong           --   48:00:00    --     --     0   0 20   D S
  hpq                --   24:00:00    --     --     0   0 --   E R
  ops                --   24:00:00    --     --     0   0 --   E R
  short              --   04:00:00    --     --     0   0 10   E R
  long               --   24:00:00    --     --     5  -5 --   E R
  amd                --   24:00:00    --     --     2  -1 --   E R
  tvac               --   24:00:00    --     --     0   0 --   E R
                                                ----- -----

What I am trying to figure out is why 'qstat' shows negative numbers in the 'Que' field. Is this some new "feature"? I don't remember this happening on our previous installation. The only way I can remove the negative 'Que' values is by restarting pbs_server.

Thanks!
Paul
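For anyone who wants to check for this condition automatically, here is a minimal sketch that scans 'qstat -q' style text for rows with a negative 'Que' value. It is not part of Torque; the function name negative_que_rows is made up for illustration, and it assumes the column layout shown in this report (queue name, four limit columns, Run, Que, Lm, then two state letters).

```python
def negative_que_rows(qstat_text):
    """Return (queue, run, que) tuples for rows whose Que column is negative.

    Assumes the 'qstat -q' layout quoted in this report: a data row splits
    into exactly 10 whitespace-separated fields, with Run at index 5 and
    Que at index 6.
    """
    flagged = []
    for line in qstat_text.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue  # separator, totals, or blank line
        try:
            run, que = int(fields[5]), int(fields[6])
        except ValueError:
            continue  # the header line also has 10 fields, but no numbers
        if que < 0:
            flagged.append((fields[0], run, que))
    return flagged


# Rows copied from the report above.
SAMPLE = """\
Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
short              --   04:00:00    --     --     0   0 10   E R
long               --   24:00:00    --     --     5  -5 --   E R
amd                --   24:00:00    --     --     2  -1 --   E R
"""

for queue, run, que in negative_que_rows(SAMPLE):
    print(f"{queue}: Run={run} Que={que}")
```

The field-count check is deliberately strict, so a layout change in another Torque version would make the function return nothing rather than misread a column.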
I'm getting the same issue with torque version 3.0.3. Paul, which version are you using?
Quick follow up. I upgraded to 3.0.4 and "qstat -q" now gives the correct output:

However, the problem persists with "qmgr -c 'list server' | grep state_count":

  state_count = Transit:0 Queued:-48 Held:100 Waiting:0 Running:48 Exiting:0
After a second look, even "qstat -q" returns the wrong output:

  $ qstat -q
  Queue            Memory CPU Time Walltime Node  Run Que Lm  State
  ---------------- ------ -------- -------- ----  --- --- --  -----
  kraken.q           --      --       --     --    47  -1 --   E R
                                                ----- -----
                                                   47    -1

  $ qmgr -c 'list server' | grep state_count
  state_count = Transit:0 Queued:-101 Held:100 Waiting:0 Running:47 Exiting:0
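The state_count line is easy to check from a script. Below is a rough sketch that parses the "Name:number" pairs and reports any counter below zero. The function names are invented for this example, and read_state_count_line assumes qmgr is on PATH and that the caller is allowed to query the server.

```python
import re
import subprocess


def parse_state_count(line):
    """Turn 'state_count = Transit:0 Queued:-48 ...' into a dict of ints."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(-?\d+)", line)}


def negative_states(line):
    """Return only the counters that are below zero."""
    return {name: n for name, n in parse_state_count(line).items() if n < 0}


def read_state_count_line():
    """Run the qmgr command quoted in this thread and return its state_count line.

    Assumption: qmgr is on PATH. Returns None if no such line is printed.
    """
    out = subprocess.run(["qmgr", "-c", "list server"],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if "state_count" in line:
            return line.strip()
    return None


# Line copied from the comment above.
sample = "state_count = Transit:0 Queued:-101 Held:100 Waiting:0 Running:47 Exiting:0"
print(negative_states(sample))  # {'Queued': -101}
```

A cron job could call negative_states(read_state_count_line()) and send a notice when the result is non-empty, which at least tells you when a pbs_server restart is due.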
Hello, I am also seeing this problem with 2.5.10. In our case, not only is a negative number listed in Que, but 2 non-existent jobs are listed in Run. I am not sure if these two issues are related, though.

qstat as a privileged user lists zero jobs, but qstat -q shows:

  Queue            Memory CPU Time Walltime Node  Run Que Lm  State
  ---------------- ------ -------- -------- ----  --- --- --  -----
  batch              --      --       --     --     2  -4 --   E R
                                                ----- -----
                                                    2    -4

And qmgr -c 'list server' | grep state_count shows:

  state_count = Transit:0 Queued:0 Held:-4 Waiting:0 Running:2 Exiting:0

The only way to clear these erroneous numbers is to restart pbs_server. Has this issue been resolved in 3.0.4 or 4.0.0?
> Has this issue been resolved in 3.0.4 or 4.0.0?

The bug is still present in 3.0.4:

  % sudo pbs_server --version
  version: 3.0.4
  % qmgr -c 'l s' | grep state_count
  state_count = Transit:0 Queued:-22737 Held:22590 Waiting:0 Running:139
Same problem on my 2.5.9 installation: "qterm -t quick" and "pbs_server -t hot" help, but only for a short time.
This bug also occurs in 4.0.2 when I submit array jobs.
(In reply to comment #7)
> This bug also occurs in 4.0.2.
> when I submit Array jobs.

Now confirmed on 2.5.11. However, it's not as bad as on 2.5.9, because from time to time 2.5.11 logs the following:

  05/18/2012 19:49:10;0001;PBS_Server;Svr;PBS_Server;Job state counts incorrect, server 0: 0 -17 15 0 2 0 ; queue auto 0 (completed: 0): 0 0 0 0 0 0 ...

and so the negative values don't linger for long, especially if the queues are quite busy. It is still annoying, though...

P.S. Cheers to Riken ;-)
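To see how often the server notices the problem, the log can be searched for the "Job state counts incorrect" message. The sketch below pulls out the integers that follow "server N:" on such a line. It does not assign state names to them, since the thread does not document the order; the function name server_counts is hypothetical, and the pattern is based only on the single log line quoted above.

```python
import re

# Matches the server part of the message quoted in this thread and captures
# the run of integers after "server N:". The queue part is ignored.
_COUNTS = re.compile(r"Job state counts incorrect, server -?\d+:((?:\s+-?\d+)+)")


def server_counts(log_line):
    """Return the list of server counters from a matching log line, else None."""
    match = _COUNTS.search(log_line)
    if match is None:
        return None
    return [int(token) for token in match.group(1).split()]


# Log line copied from the comment above (truncated where the comment truncates it).
line = ("05/18/2012 19:49:10;0001;PBS_Server;Svr;PBS_Server;"
        "Job state counts incorrect, server 0: 0 -17 15 0 2 0 ; "
        "queue auto 0 (completed: 0): 0 0 0 0 0 0 ...")

counts = server_counts(line)
print(counts, "negative" if any(n < 0 for n in counts) else "ok")
```

Running this over each line of the server log gives a quick count of how many times the counters drifted, without having to catch the negative values in qstat before they are corrected.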