[torqueusers] newbie questions

Qiong Zhang jamesz at yahoo-inc.com
Fri Jun 6 17:09:17 MDT 2008


qdel -p 60 actually worked!! Thank you!

 

James

 

________________________________

From: Brock Palen [mailto:brockp at umich.edu] 
Sent: Friday, June 06, 2008 10:52 AM
To: Qiong Zhang
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] newbie questions

 

No clue why it's not working. I would check the MOM logs. You can also
force-purge a job (this may leave things behind on the node):

 

qdel -p 60

 

Only admins can do this, though ("managers" in TORQUE-speak).
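
For reference, adding someone as a manager is normally done from the server
host with qmgr, something along these lines; the user and host names are
placeholders, not anything from this thread:

qmgr -c "set server managers += USERNAME@SERVERHOST"

qmgr itself has to be run as root or as an existing manager.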

 

Still, it should not do that.


Brock Palen

www.umich.edu/~brockp

Center for Advanced Computing

brockp at umich.edu

(734)936-1985

On Jun 6, 2008, at 11:59 AM, Qiong Zhang wrote:

Thanks, Brock!

 

For the first question, it is really strange. I actually can kill other
jobs with 'qdel'.

 

But I could not kill this specific one (not a parallel job). When I run
'qstat -a', it shows the job as still running:

 

Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
60.vibes18.data.corp jamesz   batch    f.sh        79060     1  --    -- --  R   --

 

Actually, I already manually killed the process on the compute node.

 

James

________________________________

From: Brock Palen [mailto:brockp at umich.edu] 
Sent: Friday, June 06, 2008 7:40 AM
To: Qiong Zhang
Cc: torqueusers at supercluster.org
Subject: Re: [torqueusers] newbie questions

 

No problem, see below:


Brock Palen

www.umich.edu/~brockp

Center for Advanced Computing

brockp at umich.edu

(734)936-1985

On Jun 5, 2008, at 8:06 PM, Qiong Zhang wrote:

	1) How to really kill a submitted job?

	Looks like qdel does not always work.

Strange, it always works for us. If you have many users, we add a
script to the epilogue and prologue that kills any processes owned by
users who, according to qstat, do not have valid jobs on that system.
qdel almost always works, though. If you're running parallel code, be
sure to use an mpirun/mpiexec that uses TM; this makes sure that TORQUE
keeps track of all the processes the job starts.
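
For what it's worth, here is a minimal sketch of that kind of cleanup, run
from the epilogue; the UID cutoff and the qstat parsing are assumptions you
would want to adapt and test before letting it anywhere near kill -9:

#!/bin/sh
# Rough sketch: for every ordinary user with processes on this node,
# check whether qstat still reports a running job for them; if not,
# kill their leftover processes. UID >= 1000 is assumed to mean
# "ordinary user"; adjust for your site.
for user in $(ps -eo user= | sort -u); do
    uid=$(id -u "$user" 2>/dev/null) || continue
    [ "$uid" -ge 1000 ] || continue          # skip system accounts
    if ! qstat -u "$user" 2>/dev/null | grep -q " R "; then
        pkill -9 -u "$user"                  # no running job left
    fi
done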

I tried the following options:

- qdel

- kill -9 the process on the running node directly

- qsig -s 9

 

Nothing worked. The node is still taken and cannot run any other jobs.

When I run 'pbsnodes', the state of the node is still 'job-exclusive'.

 

2) How to run multiple jobs on a single-CPU machine?

Is there a way I can submit a job using a partial CPU, like 0.25, 0.5, etc.?

You can, but you almost never want to. The simplest way is to tell TORQUE
that the node has more CPUs than it really does:

 

I think this will work

 

qmgr -c "set node NODENAME np=6"
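
An alternative I've seen used is to put the count in the server's nodes file
and restart pbs_server so it rereads it; the path below is the common default
and may differ on your install:

# $TORQUE_HOME/server_priv/nodes (often /var/spool/torque/server_priv/nodes)
NODENAME np=6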

3) How to set up a special resource which is attached to the cluster
instead of to an individual node?

 

This is a "generic resource"; we use this for software licenses all the
time. We enforce them from Moab, though. I think TORQUE also has a
similar facility built in.
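
Purely as a sketch of the Moab side (the resource name "matlab", the token
count, and the script name are made-up examples, not anything from this
thread), a floating license is typically declared as a cluster-wide generic
resource and then requested per job:

# moab.cfg: two cluster-wide license tokens under a hypothetical name
NODECFG[GLOBAL] GRES=matlab:2

# submission: ask for one token alongside the usual resources
msub -l nodes=1,gres=matlab job.sh

A job that asks for a token while all of them are checked out simply stays
queued until one is released.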

So if a job requires this type of resource and the resource is currently in
use by another job, the job will be queued.

 

Thanks,

James

_______________________________________________

torqueusers mailing list

torqueusers at supercluster.org

http://www.supercluster.org/mailman/listinfo/torqueusers

 

 
