[torqueusers] Why so many on one host?

skip at pobox.com
Sun Jan 17 09:24:02 MST 2010

I just qsub'd nearly 2400 jobs.  About 80 are running at the moment.  When I run

    qstat -f | egrep exec_host | sort | uniq -c

I see that one host (which has only a single core) was assigned a huge
number of jobs:

     204     exec_host = cutter.wacker/0
       1     exec_host = huron.wacker/0
       1     exec_host = huron.wacker/1
       1     exec_host = huron.wacker/2
       1     exec_host = huron.wacker/3
       2     exec_host = hurt.wacker/0
       2     exec_host = hurt.wacker/1
       1     exec_host = ruth.wacker/0
       1     exec_host = udesktop116.wacker/0

I looked at cutter and found that its root partition was full.  I marked
the node offline.  Is there some way to force Maui to reassign jobs away
from that machine?  Why was that one machine apparently assigned so many
jobs in the first place?
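
The tally above is per slot (host/N), so a multi-core host such as huron
is spread over several lines.  A minimal sketch of a per-host tally follows.
The function name jobs_per_host is hypothetical, and it assumes each
exec_host line looks like the ones shown above; a job spanning several
slots would be counted once, under the first host listed.

```shell
#!/bin/sh
# Hypothetical helper: count jobs per host from `qstat -f` output read on
# stdin, ignoring the /N slot suffix.  Assumes lines of the form
#     exec_host = cutter.wacker/0
jobs_per_host() {
    awk '/exec_host = / {
             host = $3               # e.g. "cutter.wacker/0"
             sub(/\/.*/, "", host)   # drop the slot suffix and anything after it
             count[host]++
         }
         END { for (h in count) print count[h], h }' | sort -rn
}

# Usage (needs a working Torque client):
#     qstat -f | jobs_per_host
```

For the listing above this would report 204 jobs for cutter.wacker and 4
each for huron.wacker and hurt.wacker.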


Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/

