[torqueusers] newbie questions
jamesz at yahoo-inc.com
Fri Jun 6 09:59:14 MDT 2008
Regarding the first question, it is really strange. I can kill other
jobs with 'qdel', but I could not kill this specific one (it is not a
parallel job). 'qstat -a' still shows the job as running:
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
60.vibes18.data.corp jamesz   batch    f.sh        79060     1  -- --     --    R --
I had already killed the process manually on the compute node.
From: Brock Palen [mailto:brockp at umich.edu]
Sent: Friday, June 06, 2008 7:40 AM
To: Qiong Zhang
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] newbie questions
No problem, see below:
Center for Advanced Computing
brockp at umich.edu
On Jun 5, 2008, at 8:06 PM, Qiong Zhang wrote:
1) How to really kill a submitted job?
It looks like qdel does not always work.
Strange, it always works for us. If you have many users, one option is
what we do: we add a script to the epilogue and prologue that kills any
processes owned by users who, according to qstat, do not have valid jobs
on that node. qdel almost always works, though. If you're running
parallel code, be sure to use an mpirun/mpiexec that uses tm. This makes
sure that torque keeps track of the processes.
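The cleanup idea described above could be sketched roughly as follows. This is only an illustration, not the actual script used at umich: the function name orphan_users and the way the two user lists are gathered are assumptions, and a real script would also need to exclude root and other system accounts.

```shell
#!/bin/sh
# Hypothetical helper for an epilogue/prologue cleanup script.
# orphan_users ALLOWED CANDIDATES
#   ALLOWED:    space-separated users that still have a valid job on this node
#   CANDIDATES: space-separated users that own processes on this node
# Prints every candidate that is not in ALLOWED, one per line.
orphan_users() {
    allowed=" $1 "
    for user in $2; do
        case "$allowed" in
            *" $user "*) ;;              # user still has a job here: leave alone
            *) printf '%s\n' "$user" ;;  # no job on this node: cleanup candidate
        esac
    done
}

# Possible wiring (assumption: both lists are collected in a site-specific way,
# e.g. from qstat output and from ps, with system accounts filtered out first):
#   for u in $(orphan_users "$users_with_jobs" "$users_with_procs"); do
#       pkill -9 -u "$u"
#   done
```

Keeping the comparison in a small function that only prints names makes it easy to dry-run the script before letting it kill anything.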
I tried the following options:
- kill -9 the process on the running node directly
- qsig -s 9
Nothing worked. The node is still taken and cannot run any other jobs.
When I run 'pbsnodes', the state of the node is still 'job-exclusive'.
2) How to run multiple jobs on a single-CPU machine?
Is there a way I can submit a job that uses a fraction of a CPU, like
0.25 or 0.5?
You can, but you almost never want to. The simplest way is to tell
torque that the node has more CPUs than it really does.
I think this will work:
qmgr -c 's n NODENAME np=6'
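To relate this to the "0.25 CPU" question: letting each job use a quarter of a CPU amounts to advertising four slots per real core. A minimal sketch of that arithmetic follows. The helper name np_for_share is hypothetical, and 'set node' is assumed here to be the long form of the abbreviated 's n' above. The command is only printed, not executed.

```shell
#!/bin/sh
# np_for_share CORES JOBS_PER_CORE
# Prints the np value to advertise so that JOBS_PER_CORE jobs share each core.
np_for_share() {
    echo $(( $1 * $2 ))
}

node=NODENAME                 # placeholder, as in the command above
np=$(np_for_share 1 4)        # 1 real core, 4 jobs per core (0.25 CPU each)

# Print the command for review instead of running it.
echo "qmgr -c 'set node $node np=$np'"
```

Note that torque only counts slots here. It does not limit each job to its share of the CPU, so the jobs simply compete for the core.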
3) How to set up a special resource that is attached to the cluster
as a whole instead of to an individual node?
This is a "generic resource"; we use these for software licenses all the
time. We enforce them from Moab, though. I think torque also has a
similar facility built in.
So if a job requires this type of resource and currently the resource is
already used by another job, the job will be queued.