[torqueusers] newbie questions
Qiong Zhang
jamesz at yahoo-inc.com
Fri Jun 6 09:59:14 MDT 2008
Thanks, Brock!
For the first question, it is really strange. I can actually kill other
jobs with 'qdel', but I could not kill this specific one (not a parallel
job). When I run 'qstat -a', it shows the job as still running:
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
60.vibes18.data.corp jamesz   batch    f.sh        79060     1  --     --    -- R    --
Actually, I already manually killed the process on the compute node.
James
________________________________
From: Brock Palen [mailto:brockp at umich.edu]
Sent: Friday, June 06, 2008 7:40 AM
To: Qiong Zhang
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] newbie questions
No problem, see below:
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
On Jun 5, 2008, at 8:06 PM, Qiong Zhang wrote:
1) How do I really kill a submitted job?
It looks like qdel does not always work.
Strange, it always works for us. If you have many users, one option is
what we do: we add a script to the epilogue and prologue that kills any
processes owned by users who, according to qstat, have no valid jobs on
that system. qdel almost always works, though. If you're running
parallel code, be sure to use an mpirun/mpiexec that uses tm (the torque
task manager interface). This makes sure that torque keeps track of the
code.
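
A minimal sketch of the kind of cleanup described above, not the actual
script used at any site. The function name find_stray_pids, the
"PID UID USER" input format, and the UID cutoff of 1000 are all
assumptions made for illustration. How the list of users with valid
jobs is produced from qstat is site-specific and is not shown here.

```sh
# Sketch only: print the PIDs of processes whose owner has no valid job.
#   stdin: lines of "PID UID USER" (for example from: ps -eo pid=,uid=,user=)
#   $1:    file listing one allowed username per line (users with valid jobs)
#   $2:    lowest UID treated as a regular user (assumed default: 1000)
find_stray_pids() {
    allowed_file=$1
    min_uid=${2:-1000}
    awk -v allowed_file="$allowed_file" -v min_uid="$min_uid" '
        BEGIN {
            # Load the allowed users; an empty or missing file allows nobody.
            while ((getline u < allowed_file) > 0) allowed[u] = 1
        }
        # Skip system accounts, then report anyone not on the allowed list.
        $2 + 0 >= min_uid + 0 && !($3 in allowed) { print $1 }
    '
}
```

The function only prints PIDs. An epilogue could log that list first
and pass it to kill once the output looks right.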
I tried the following options:
- qdel
- kill -9 on the process, directly on the node running it
- qsig -s 9
Nothing worked. The node is still occupied and cannot run any other jobs.
When I run 'pbsnodes', the state of the node is still 'job-exclusive'.
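
The node-state check mentioned above can be scripted. This is a rough
sketch that assumes pbsnodes prints a "state = <value>" line for the
node; node_state and node_is_stuck are made-up helper names.

```sh
# Sketch: read `pbsnodes NODENAME` output on stdin and print the node state.
# Assumes an indented "state = <value>" line in that output.
node_state() {
    awk '$1 == "state" && $2 == "=" { print $3; exit }'
}

# Succeeds only if the node still reports job-exclusive.
node_is_stuck() {
    [ "$(node_state)" = "job-exclusive" ]
}
```

For example: pbsnodes NODENAME | node_is_stuck && echo "still job-exclusive"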
2) How do I run multiple jobs on a single-CPU machine?
Is there a way to submit a job that uses a fraction of a CPU, such as
0.25 or 0.5?
You can, but you almost never want to. The simplest way is to tell
torque that the node has more CPUs than it really does. I think this
will work:
qmgr -c 's n NODENAME np=6'
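
Here 's n' is short for 'set node'. Since np is a whole number, a
fractional request such as 0.25 of a CPU becomes a slot count instead:
np=4 on a single-CPU machine gives four slots, each roughly a quarter of
the CPU. A small wrapper, with set_node_np as a made-up name, might look
like this:

```sh
# Hypothetical helper: advertise NP processor slots on NODE via qmgr.
set_node_np() {
    node=$1
    np=$2
    # np must be a whole number; reject empty or fractional values like 0.25.
    case $np in
        ''|*[!0-9]*) echo "np must be a whole number" >&2; return 1 ;;
    esac
    qmgr -c "set node $node np=$np"
}
```

For example, set_node_np NODENAME 4 would run: qmgr -c 'set node NODENAME np=4'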
3) How do I set up a special resource that is attached to the cluster
rather than to an individual node?
This is a "generic resource"; we use these for software licenses all the
time. We enforce them from Moab, though. I think torque also has a
similar facility built in.
So if a job requires this type of resource and the resource is currently
in use by another job, the job will be queued.
Thanks,
James
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers