[torqueusers] usage of a cluster
timlee126 at yahoo.com
Mon Feb 1 14:37:52 MST 2010
1. I noticed that none of the nodes in our cluster has any swap space. Is that normal for most clusters?
I am running my job on a node. What will happen to it if the node's memory is used up? Is killing the job my only option?
In fact, the node eventually ran out of memory and stopped responding. I emailed the administrator, and after a while I found that the node had been rebooted without affecting the other nodes. I feel lucky that my job did not bring down the whole cluster. Is using up a node's memory the kind of behaviour administrators dislike from users?
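One hedged option, assuming a Linux node and a job written in (or wrapped by) Python, is for the job to cap its own address space with RLIMIT_AS. This is the same limit the shell builtin "ulimit -v" sets. An allocation beyond the cap then raises MemoryError inside the job instead of pushing the whole node out of memory. The function names and the 2 GiB / 4 GiB figures are hypothetical illustrative values. RLIMIT_AS counts virtual memory, so leave headroom above the expected resident size. Whether Torque itself enforces a requested limit such as "-l mem=..." depends on site configuration, so this is a sketch, not a replacement for asking the administrator.

```python
import resource


def cap_address_space(max_bytes: int) -> None:
    """Lower this process's soft RLIMIT_AS so allocations beyond max_bytes fail."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # An unprivileged process may not set its soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY and max_bytes > hard:
        max_bytes = hard
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))


def try_allocate(n_bytes: int) -> bool:
    """Return True if a buffer of n_bytes could be allocated, False on MemoryError."""
    try:
        buf = bytearray(n_bytes)
    except MemoryError:
        return False
    del buf
    return True


if __name__ == "__main__":
    cap_address_space(2 * 1024**3)  # hypothetical 2 GiB cap for this job
    if not try_allocate(4 * 1024**3):
        # Here a real job could checkpoint its state and exit cleanly.
        print("allocation refused; the node itself is unaffected")
```

The design choice is to fail inside the job, where it can be caught and handled, rather than letting the kernel run the node out of memory.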
2. My jobs are submitted through Torque. Will Torque keep newly submitted jobs waiting in the queue if there are not enough resources to run them?
Or does each user still have to check the cluster's usage before deciding to submit new jobs? If so, how exactly?
With "qstat" I can see which jobs are running, and with "qstat -q" I can see how many jobs are running in each queue.
But how can I find the usage percentage of all nodes, cores, and memory, so that I get the big picture and can decide whether I had better wait for more resources to become available instead of submitting my jobs now?
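One possible approach is to summarize the output of Torque's "pbsnodes -a". That command prints each node's name on an unindented line, followed by indented "key = value" lines such as state, np, jobs, and a status line from pbs_mom that includes availmem and totmem. The sketch below is a minimal parser under these assumptions: the Torque 2.x jobs format, with one "core/jobid" entry per busy core, and memory values in kb. The SAMPLE text and all function names are made up for illustration; on a real cluster the text could come from subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True).stdout. Note that Torque's availmem and totmem include swap, which on nodes without swap equals physical memory.

```python
import re
from dataclasses import dataclass

# Hypothetical sample in the assumed "pbsnodes -a" layout (not real cluster data).
SAMPLE = """\
node01
     state = free
     np = 8
     ntype = cluster
     jobs = 0/101.head, 1/101.head, 2/101.head, 3/101.head
     status = rectime=1265060000,varattr=,jobs=101.head,state=free,loadave=4.00,ncpus=8,physmem=16000000kb,availmem=4000000kb,totmem=16000000kb,nusers=1

node02
     state = free
     np = 8
     ntype = cluster
     status = rectime=1265060000,varattr=,jobs=,state=free,loadave=0.00,ncpus=8,physmem=16000000kb,availmem=16000000kb,totmem=16000000kb,nusers=0
"""


@dataclass
class NodeUsage:
    name: str
    state: str
    cores: int
    busy_cores: int
    totmem_kb: int
    availmem_kb: int


def _kb(value: str) -> int:
    # Assumes values such as "16000000kb"; other units are not handled.
    return int(re.sub(r"kb$", "", value))


def _to_usage(raw: dict) -> NodeUsage:
    # The status attribute is a comma-separated list of key=value pairs.
    status = dict(item.split("=", 1) for item in raw.get("status", "").split(",") if "=" in item)
    # Assumed Torque 2.x format: one "core/jobid" entry per busy core.
    busy = len([entry for entry in raw.get("jobs", "").split(",") if "/" in entry])
    return NodeUsage(
        name=raw["name"],
        state=raw.get("state", "unknown"),
        cores=int(raw.get("np", 0)),
        busy_cores=busy,
        totmem_kb=_kb(status.get("totmem", "0kb")),
        availmem_kb=_kb(status.get("availmem", "0kb")),
    )


def parse_pbsnodes(text: str) -> list:
    """Split the text into per-node records and convert each to NodeUsage."""
    nodes = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():  # an unindented line starts a new node
            current = {"name": line.strip()}
            nodes.append(current)
        elif current is not None:
            key, _, value = line.strip().partition(" = ")
            current[key] = value
    return [_to_usage(n) for n in nodes]


def summarize(nodes: list) -> tuple:
    """Return (busy cores, total cores, fraction of memory in use)."""
    cores = sum(n.cores for n in nodes)
    busy = sum(n.busy_cores for n in nodes)
    total = sum(n.totmem_kb for n in nodes)
    avail = sum(n.availmem_kb for n in nodes)
    return busy, cores, (1 - avail / total) if total else 0.0


if __name__ == "__main__":
    busy, cores, mem_used = summarize(parse_pbsnodes(SAMPLE))
    print(f"cores busy: {busy}/{cores}, memory in use: {mem_used:.0%}")
```

Summing over all nodes gives the cluster-wide picture asked about above. Printing per-node NodeUsage records instead would show which individual nodes are close to running out of memory.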