[torqueusers] newbie questions

Qiong Zhang jamesz at yahoo-inc.com
Fri Jun 6 09:59:14 MDT 2008

Thanks, Brock!


As for the first question, it is really strange. I can kill other
jobs with 'qdel'.


But I could not kill this specific one (it is not a parallel job). When I
run qstat -a, it still shows the job as running:


Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
60.vibes18.data.corp jamesz   batch    f.sh        79060     1  --     --    -- R    --


I had actually already killed the process manually on the compute node.




From: Brock Palen [mailto:brockp at umich.edu] 
Sent: Friday, June 06, 2008 7:40 AM
To: Qiong Zhang
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] newbie questions


No problem, see below:

Brock Palen


Center for Advanced Computing

brockp at umich.edu



On Jun 5, 2008, at 8:06 PM, Qiong Zhang wrote:

	1) How to really kill a submitted job?

	It looks like qdel does not always work.

Strange, it always works for us.  If you have many users, it can help to
do what we do: add a script to the epilogue and prologue that kills any
processes owned by users who, according to qstat, have no valid jobs on
that node.  qdel almost always works, though.  If you're running parallel
code, be sure to use an mpirun/mpiexec that uses tm (the Torque task
manager API).  This makes sure that Torque keeps track of the processes.
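
The cleanup idea above can be pictured as a set difference: take the
owners of processes on a node, subtract the users who still have jobs
there, and what is left are candidates for killing. The sketch below is
hypothetical (it is not the script in use at umich); the function name
list_stray_users and the way the two lists are gathered are assumptions,
and it only prints names rather than killing anything.

```shell
#!/bin/sh
# list_stray_users ACTIVE_USERS PROCESS_OWNERS
#   ACTIVE_USERS   : whitespace-separated users who still have a job on this node
#   PROCESS_OWNERS : whitespace-separated owners of processes seen on this node
# Prints one line per process owner that is not in ACTIVE_USERS.
list_stray_users() {
    # Normalise newlines/tabs to spaces and pad so whole names can be matched.
    active=" $(printf '%s' "$1" | tr '\n\t' '  ') "
    for u in $2; do
        case "$active" in
            *" $u "*) ;;        # user still has a job here: leave alone
            *) echo "$u" ;;     # no job on this node: cleanup candidate
        esac
    done
}

# Possible wiring in an epilogue (assumption, left commented out on purpose):
#   owners=$(ps -eo user= | sort -u)     # every process owner on the node
#   active=...                           # site-specific: derive from qstat
#   list_stray_users "$active" "$owners"
# System accounts (root, daemons) would have to be filtered out before
# anything is actually killed.
```

Printing the candidates first, and only adding the kill step once the
list looks right, keeps a mistake in the qstat parsing from taking down
legitimate jobs.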


I tried the following options:

- qdel

- kill -9 the process on the running node directly

- qsig -s 9 


Nothing worked. The node is still taken and cannot run any other jobs.

When I run 'pbsnodes', the state of the node is still 'job-exclusive'.
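
For a job that survives a plain qdel, one further step that is commonly
tried is qdel -p, which purges the job record from the server without
waiting for the node to confirm. The sketch below only prints the
commands in escalating order so they can be reviewed before anything is
run; the helper name escalation_steps is made up for illustration, and
whether a purge is appropriate depends on the site, since it can leave
stray processes behind on the node.

```shell
#!/bin/sh
# escalation_steps JOBID
# Prints, without running them, the commands one might try in order.
escalation_steps() {
    jobid="$1"
    if [ -z "$jobid" ]; then
        echo "usage: escalation_steps JOBID" >&2
        return 1
    fi
    echo "qdel $jobid"        # normal delete request
    echo "qsig -s 9 $jobid"   # ask for signal 9 (SIGKILL) to be sent to the job
    echo "qdel -p $jobid"     # last resort: purge the job from the server
}
```

For example, escalation_steps 60.vibes18.data.corp lists the three
commands for the stuck job shown earlier. After a purge, pbsnodes should
be checked again to see whether the node has left 'job-exclusive'.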


2) How can I run multiple jobs on a single-CPU machine?

Is there a way to submit a job that uses a fraction of a CPU, like 0.25
or 0.5?

You can, but you almost never want to.  The simplest way is to tell
Torque that the node has more CPUs than it really does:


I think this will work:


qmgr -c 's n NODENAME np=6'
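
The abbreviated command above should be equivalent to the long form
qmgr -c 'set node NODENAME np = 6'. As a rough guide, a job that should
get about 0.25 of a CPU corresponds to advertising 4 slots per real CPU.
The helper below is a sketch under that assumption; np_command is a
made-up name, and it only prints the qmgr command rather than running it.

```shell
#!/bin/sh
# np_command NODE REAL_CPUS JOBS_PER_CPU
# Prints the qmgr command that would advertise REAL_CPUS * JOBS_PER_CPU slots.
np_command() {
    node="$1"; cpus="$2"; per_cpu="$3"
    # Reject anything that is not a plain whole number.
    for n in "$cpus" "$per_cpu"; do
        case "$n" in
            ''|*[!0-9]*) echo "counts must be whole numbers" >&2; return 1 ;;
        esac
    done
    np=$((cpus * per_cpu))
    echo "qmgr -c 'set node $node np = $np'"
}
```

For a single-CPU node that should accept four jobs at once,
np_command NODENAME 1 4 prints the command with np = 4. As far as I know
this does not cap each job at a quarter of the CPU; the jobs simply
share the processor, which is part of why it is rarely what you want.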


3) How can I set up a special resource that is attached to the cluster
as a whole instead of to an individual node?


This is a "generic resource"; we use these for software licenses all the
time.  We enforce them from Moab, though.  I think Torque also has a
similar facility built in.

So if a job requires this type of resource and the resource is currently
in use by another job, the job will be queued.







