[torqueusers] jobs stuck
cluster.bill at alinto.com
Mon Sep 18 02:46:56 MDT 2006
I came back to work on Monday and found every job stuck.
CPU usage is at 0% on all nodes.
show_pbs_res.py shows me:
Total nodes : 2
Nodes with 4 CPU
1 node with -8 CPU free
1 node with 0 CPU free
I had 6 jobs running (sort of), each of them asking for 2 CPUs!
All the jobs were located on the same node.
Where can I look to understand what happened this weekend?
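
For reference, the -8 figure matches simple arithmetic: 6 jobs asking for 2 CPUs each is 12 CPUs requested on one 4-CPU node. The sketch below only illustrates that sum. It assumes show_pbs_res.py reports "free" as total CPUs minus requested CPUs, which I have not verified, and free_cpus is a hypothetical helper name, not something from the script.

```python
def free_cpus(cpus_per_node, jobs_on_node, cpus_per_job):
    """Return CPUs left on a node after subtracting what its jobs request.

    Assumption: "free" is total minus requested, so the result goes
    negative when the node is oversubscribed.
    """
    return cpus_per_node - jobs_on_node * cpus_per_job


# 4-CPU node carrying all 6 jobs, each asking for 2 CPUs
print(free_cpus(4, 6, 2))  # -8
```

If that assumption holds, the node was oversubscribed by 8 CPUs, i.e. three times its capacity.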
I ran qdel on these 6 jobs.
One job started (requesting 6 CPUs).
I have many more jobs asking for 2 CPUs each, but they don't start.
qstat -f on these jobs shows:
comment = Not Running: Draining system to allow starving job to run
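
To check this comment on many queued jobs at once, something like the following could pull the comment field out of saved qstat -f output. This is a rough sketch: job_comments is a hypothetical helper, the job IDs in the sample are made up for illustration, and it assumes each job block starts with a "Job Id:" line and that the comment fits on a single line (wrapped continuation lines are not handled).

```python
def job_comments(qstat_text):
    """Map each job ID to its 'comment' value in `qstat -f` style text.

    Assumptions: every job block begins with 'Job Id: <id>' and the
    comment attribute is not wrapped across lines.
    """
    comments = {}
    job_id = None
    for line in qstat_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Job Id:"):
            job_id = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("comment =") and job_id is not None:
            comments[job_id] = stripped.split("=", 1)[1].strip()
    return comments


# Illustrative sample only; the job IDs are hypothetical.
sample = """Job Id: 101.example
    job_state = Q
    comment = Not Running: Draining system to allow starving job to run
Job Id: 102.example
    job_state = R
"""

for jid, text in job_comments(sample).items():
    print(jid, "->", text)
```

Jobs without a comment attribute are simply left out of the result.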
What happened to my cluster?
Where should I start looking?