[torqueusers] newbie questions
brockp at umich.edu
Fri Jun 6 11:51:34 MDT 2008
No clue why it's not working; I would check the mom logs. You can also
force-purge a job (this may leave things around on the node):
qdel -p 60
Only admins (managers in torque speak) can do this, though.
Still, it should not behave that way.
Brock Palen
Center for Advanced Computing
brockp at umich.edu
On Jun 6, 2008, at 11:59 AM, Qiong Zhang wrote:
> Thanks! Brock.
> For the first question, it is really strange. I actually can kill
> other jobs with 'qdel', but I could not kill this specific one (not a
> parallel job). When I use qstat -a, it displays the job as still
> running.
> Job ID               Username Queue    Jobname    SessID   NDS TSK Memory  Time S  Time
> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
> 60.vibes18.data.corp jamesz   batch    f.sh        79060     1  --     --    -- R    --
> Actually, I already manually killed the process on the compute node.
> From: Brock Palen [mailto:brockp at umich.edu]
> Sent: Friday, June 06, 2008 7:40 AM
> To: Qiong Zhang
> Cc: torqueusers at supercluster.org
> Subject: Re: [torqueusers] newbie questions
> No problem see below:
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> On Jun 5, 2008, at 8:06 PM, Qiong Zhang wrote:
>> 1) How to really kill a submitted job?
>> It looks like qdel does not always work.
> Strange, it always works for us. If you have many users, we add a
> script to the epilogue and prologue that kills any processes owned by
> users who, according to qstat, have no valid jobs on that node.
> qdel almost always works, though. If you're running parallel code, be
> sure to use an mpirun/mpiexec built with tm support. This makes sure
> that torque keeps track of the processes.
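The epilogue/prologue cleanup described above boils down to comparing two user lists: who owns processes on the node versus who actually has jobs there. A minimal sketch of that comparison (the `orphan_users` helper and its inputs are hypothetical; a real epilogue would build `valid` from qstat output and `procs` from ps, then clean up each reported user):

```shell
#!/bin/sh
# Sketch of the orphan-process cleanup idea for a torque epilogue script.
# orphan_users: given the users that qstat says have valid jobs on this
# node, and the users currently owning processes here, print the users
# whose processes are candidates for cleanup.
orphan_users() {
    valid="$1"    # space-separated users with valid jobs on this node
    procs="$2"    # space-separated users owning processes on this node
    for u in $procs; do
        case " $valid " in
            *" $u "*) ;;        # user still has a job here: leave alone
            *) echo "$u" ;;     # no valid job: candidate for pkill -u
        esac
    done
}
```

In a real epilogue you would also exclude system accounts (root, daemon users) before running `pkill -u` on each reported user.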
> I tried the following options:
> - qdel
> - kill -9 the process on the running node directly
> - qsig -s 9
> Nothing worked. The node is still taken and cannot run any other jobs.
> When I run 'pbsnodes', the state of the node is still 'job-exclusive'.
> 2) How to run multiple jobs on a single cpu machine?
> Is there a way I can submit a job using partial cpu, like 0.25 or 0.5?
> You can, but you almost never want to. The simplest way is to tell
> torque that the node has more cpus than it really does. I think this
> will work:
> qmgr -c 's n NODENAME np=6'
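For reference, that abbreviated qmgr command can also be written out in full, and the same setting can live in TORQUE's nodes file (NODENAME is a placeholder; the nodes-file path varies by install, and pbs_server reads it at startup):

```shell
# Long form of the abbreviated qmgr command above:
qmgr -c 'set node NODENAME np = 6'

# Equivalent entry in the nodes file read by pbs_server at startup
# (commonly $TORQUE_HOME/server_priv/nodes; restart pbs_server after editing):
#   NODENAME np=6
```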
> 3) How to set up a special resource which is attached to the whole
> cluster instead of an individual node?
> This is a "generic resource"; we use this for software licenses all
> the time. We enforce them from Moab, though. I think torque also
> has a similar facility built in.
> So if a job requires this type of resource and the resource is
> already in use by another job, the job will be queued.
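One common way to model a cluster-wide license count in Moab is a floating generic resource on the pseudo-node GLOBAL. A hedged sketch (the resource name "matlab" and the count are made up; exact syntax depends on the Moab version):

```text
# moab.cfg: define 4 floating "matlab" licenses available cluster-wide
NODECFG[GLOBAL]  GRES=matlab:4
```

Jobs would then request the resource at submission (for example with a gres request such as `-l gres=matlab`), and the scheduler queues further jobs once all four are in use.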
> torqueusers mailing list
> torqueusers at supercluster.org