[torqueusers] Torque not honoring max_user_queuable : Two commands to check
Coyle, James J [ITACD]
jjc at iastate.edu
Fri Feb 3 10:36:05 MST 2012
Ti Leggett,
I'd suggest checking two commands to confirm
that there is a problem:
1) Really simple issue:
Make sure your count is correct.
Run:
qstat -u linpyl | awk '$3 == "batch" {print}' | wc -l
and see whether the result exceeds 500.
The command that you displayed would also count jobs named batchjob
submitted by any user whose name includes linpyl as part of the name
(so user linpylon could be adding to the total, or
linpyl could have jobs named batchjob in a different queue).
I encountered these issues because I have users who have similar names
and I have users who use the same name for every job.
The command above should avoid these issues to get a reliable count.
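As a minimal sketch of the same check, the filter can also match the user column exactly. The helper name count_jobs is hypothetical, and the sketch assumes the qstat -u layout in which column 2 is the user name and column 3 is the queue name:

```shell
# Count jobs for an exact user/queue pair from `qstat -u`-style output on stdin.
# Assumption: column 2 is the user name and column 3 is the queue name.
# count_jobs is a hypothetical helper name, not part of Torque.
count_jobs() {
    awk -v user="$1" -v queue="$2" \
        '$2 == user && $3 == queue { n++ } END { print n + 0 }'
}

# Usage on a live system (not run here):
#   qstat -u linpyl | count_jobs linpyl batch
```

Because both columns are compared with ==, header lines, user linpylon, and jobs merely named batchjob are not counted.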
2) Did a torque admin change the max_user_queuable
before/after these jobs were submitted?
Check the pbs_server logs to see if max_user_queuable
was changed after these jobs were submitted.
I am a torque admin, so I could get around max_user_queuable by changing it
and changing it back, as could any other torque admin, and as could
someone who has root privileges (knows root password or has sudo capability).
The evidence should be in the logs then, though.
grep max_user_queuable /var/spool/torque/server_logs/2012*
should get the answer to this question.
I have two backups, and a user could call them to ask them to
raise the count temporarily. If you see evidence of this, I'd ask the
other torque admins first.
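A small wrapper around that grep, as a sketch only: the helper name limit_changes is hypothetical, and the default directory and file-name prefix are taken from the command above and may differ on your installation. It prints each matching line prefixed with the name of the log file it came from, so you can see on which day the limit was touched:

```shell
# List every server-log line that mentions max_user_queuable,
# prefixed with the name of the log file that contains it.
# limit_changes is a hypothetical helper; both defaults are assumptions
# copied from the grep command in this message.
limit_changes() {
    logdir="${1:-/var/spool/torque/server_logs}"
    prefix="${2:-2012}"
    grep -H max_user_queuable "$logdir"/"$prefix"*
}

# Usage on the server host (not run here):
#   limit_changes /var/spool/torque/server_logs 2012
```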
James Coyle, PhD
High Performance Computing Group
115 Durham Center
Iowa State Univ. phone: (515)-294-2099
Ames, Iowa 50011 web: http://jjc.public.iastate.edu/
>-----Original Message-----
>From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
>bounces at supercluster.org] On Behalf Of Ti Leggett
>Sent: Friday, February 03, 2012 9:27 AM
>To: Torque Users Mailing List
>Subject: [torqueusers] Torque not honoring max_user_queuable
>
>We've set queue limits that don't seem to be honored:
>
>sdb:~ # qstat | grep linpyl | grep batch | wc
> 945 5670 82215
>
>sdb:~ # qmgr -c "print queue batch"
>#
># Create queues and set their attributes.
>#
>#
># Create and define queue batch
>#
>create queue batch
>set queue batch queue_type = Execution
>set queue batch max_user_queuable = 500
>set queue batch resources_min.mppwidth = 1
>set queue batch resources_default.mppwidth = 24
>set queue batch resources_default.walltime = 00:10:00
>set queue batch acl_group_enable = False
>set queue batch resources_available.nodes = 726
>set queue batch enabled = True
>set queue batch started = True
>
>How would it be possible for a user to have 945 jobs in the queue
>when the limit should be 500?