[torqueusers] Torque not honoring max_user_queuable : Two commands to check

Coyle, James J [ITACD] jjc at iastate.edu
Fri Feb 3 10:36:05 MST 2012


Ti Leggett,

I'd suggest checking two commands to confirm 
that there is a problem: 

1) A really simple check:
Make sure your count is correct.

   Run:

  qstat -u linpyl | awk '$3 == "batch" {print}' | wc -l

to see if this exceeds 500.

  The command that you displayed would count any job line containing
"batch" (for example, jobs named batchjob) for any user whose name
includes linpyl as part of the name.
So user linpylon could be adding to the total, or
linpyl could have jobs named batchjob in two different queues.
I encountered these issues because I have users who have similar names
and I have users who use the same name for every job.

  The command above should avoid these issues to get a reliable count.
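
  As a rough sketch of the same idea, the filter can be wrapped in a
small shell function that also matches the user name exactly. The
function name count_user_queue_jobs is made up for illustration, and
it assumes the qstat -u column order of job ID, user name, queue,
job name; check your own qstat output before relying on it.

```shell
# Hypothetical helper: count rows on stdin whose 2nd field is exactly
# the given user and whose 3rd field is exactly the given queue.
# Assumes qstat -u style columns: Job ID, Username, Queue, Jobname, ...
count_user_queue_jobs() {
    awk -v user="$1" -v queue="$2" \
        '$2 == user && $3 == queue { n++ } END { print n + 0 }'
}

# Usage on a live system (not run here):
#   qstat -u linpyl | count_user_queue_jobs linpyl batch
```

  Header lines and jobs in other queues fall out because neither
field matches exactly.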

2) Did a torque admin change the max_user_queuable
  before/after these jobs were submitted?

  Check the pbs_server logs to see if max_user_queuable
was changed after these jobs were submitted.
   I am a torque admin, so I could get around max_user_queuable by changing it
and changing it back, as could any other torque admin, and as could
someone who has root privileges (knows the root password or has sudo capability).
The evidence should be in the logs, though.

  grep max_user_queuable /var/spool/torque/server_logs/2012*

should answer this question.
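
  If several log files match, it helps to see which file and line
each hit came from. A minimal sketch, assuming the server logs live
in the directory used above and that your grep supports -H; the
function name find_limit_changes is made up for illustration:

```shell
# Hypothetical helper: print every server-log line that mentions
# max_user_queuable, prefixed with its file name and line number.
# $1 = log directory (defaults to the path used in this message)
# $2 = file-name prefix to search (defaults to 2012)
find_limit_changes() {
    log_dir="${1:-/var/spool/torque/server_logs}"
    prefix="${2:-2012}"
    grep -Hn -- 'max_user_queuable' "$log_dir"/"$prefix"*
}

# Usage (not run here):
#   find_limit_changes /var/spool/torque/server_logs 2012
```

  No output means no log file with that prefix mentions the limit.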

  I have two backup admins, and a user could call them and ask them to
raise the count temporarily.  If you see evidence of this, I'd ask the
other torque admins first.

 James Coyle, PhD
 High Performance Computing Group     
 115 Durham Center            
 Iowa State Univ.           phone: (515)-294-2099
 Ames, Iowa 50011           web: http://jjc.public.iastate.edu/

>-----Original Message-----
>From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>bounces at supercluster.org] On Behalf Of Ti Leggett
>Sent: Friday, February 03, 2012 9:27 AM
>To: Torque Users Mailing List
>Subject: [torqueusers] Torque not honoring max_user_queuable
>
>We've set queue limits that don't seem to be honored:
>
>sdb:~ # qstat | grep linpyl | grep batch | wc
>    945    5670   82215
>
>sdb:~ # qmgr -c "print queue batch"
>#
># Create queues and set their attributes.
>#
>#
># Create and define queue batch
>#
>create queue batch
>set queue batch queue_type = Execution
>set queue batch max_user_queuable = 500
>set queue batch resources_min.mppwidth = 1
>set queue batch resources_default.mppwidth = 24
>set queue batch resources_default.walltime = 00:10:00
>set queue batch acl_group_enable = False
>set queue batch resources_available.nodes = 726
>set queue batch enabled = True
>set queue batch started = True
>
>How would it be possible for a user to have 945 jobs in the queue
>when the limit should be 500?

