[torqueusers] newbie questions

Brock Palen brockp at umich.edu
Fri Jun 6 11:51:34 MDT 2008


No clue why it's not working.  I would check the mom logs on the  
compute node.  You can also force-purge a job (this may leave things  
behind on the node):

qdel -p 60

Only admins can do this though (managers in torque speak).

Still, it should not do that.
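
If it helps, the mom logs live under the TORQUE spool on the compute  
node (the path below assumes a default install; log files are named by  
date), and manager status can be checked or granted from the server  
host (the address here is just a placeholder):

less /var/spool/torque/mom_logs/20080606
qmgr -c 'print server' | grep managers
qmgr -c 'set server managers += admin@headnode.example.com'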

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On Jun 6, 2008, at 11:59 AM, Qiong Zhang wrote:

> Thanks, Brock!
>
> For the first question, it is really strange. I actually can kill  
> other jobs with 'qdel'.
>
> But I could not kill this specific one (not a parallel job). When I  
> use qstat -a, it shows the job as still running:
>
> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
> 60.vibes18.data.corp jamesz   batch    f.sh        79060     1  --     --    -- R    --
>
> Actually I already manually killed the process on the compute node.
>
> James
> From: Brock Palen [mailto:brockp at umich.edu]
> Sent: Friday, June 06, 2008 7:40 AM
> To: Qiong Zhang
> Cc: torqueusers at supercluster.org
> Subject: Re: [torqueusers] newbie questions
>
> No problem, see below:
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
>
> On Jun 5, 2008, at 8:06 PM, Qiong Zhang wrote:
>> 1) How to really kill a submitted job?
>> Looks like qdel does not always work.
> Strange, it always works for us.  If you have many users, we add a  
> script to the prologue and epilogue that kills any processes owned  
> by users who, according to qstat, have no valid jobs on that node  
> (see the sketch below).  qdel almost always works, though.  If  
> you're running parallel code, be sure to use an mpirun/mpiexec that  
> uses TM, so that TORQUE keeps track of everything the job starts.
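>
> A very rough sketch of that idea (this is not our exact script; the  
> UID cutoff and the assumption that qstat works from the compute node  
> are just for illustration):
>
> #!/bin/sh
> # Epilogue sketch: kill processes of users who no longer have a
> # running job according to qstat.  Runs as root on the compute node.
> for user in $(ps -eo user= | sort -u); do
>     uid=$(id -u "$user" 2>/dev/null) || continue
>     [ "$uid" -lt 1000 ] && continue           # skip system accounts
>     if ! qstat -u "$user" 2>/dev/null | grep -q " R "; then
>         pkill -9 -u "$user"                   # no running job: clean up
>     fi
> done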
>
>
>
> I tried the following options:
> - qdel
> - kill -9 on the process on the running node directly
> - qsig -s 9
>
> Nothing worked. The node is still taken and cannot run any other  
> jobs.
> When I run 'pbsnodes', the state of the node is still 'job-exclusive'.
>
> 2) How to run multiple jobs on a single-CPU machine?
> Is there a way I can submit a job using a partial CPU, like 0.25,  
> 0.5, etc.?
> You can, but you almost never want to.  The simplest way is to tell  
> TORQUE that the node has more CPUs than it really does:
>
> I think this will work:
>
> qmgr -c 'set node NODENAME np=6'
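>
> Equivalently you can list the node in the TORQUE nodes file and  
> restart pbs_server (NODENAME is a placeholder; the path assumes a  
> default install):
>
> # $TORQUE_HOME/server_priv/nodes, e.g. /var/spool/torque/server_priv/nodes
> NODENAME np=6
>
> Jobs then request a share of the node as usual:
>
> qsub -l nodes=1:ppn=1 myjob.sh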
>
>
>
> 3) How to set up a special resource which is attached to the  
> cluster instead of individual node?
>
> This is a "generic resource".  We use these for software licenses  
> all the time, though we enforce them from Moab.  I think TORQUE  
> also has a similar facility built in.
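>
> Roughly, the Moab side looks like this (the resource name and count  
> are made up; check the Moab docs for the exact syntax in your  
> version):
>
> # moab.cfg: a floating/global generic resource shared by the cluster
> NODECFG[GLOBAL] GRES=mysoftware:2
>
> # jobs that need it request it at submit time, e.g.
> msub -l gres=mysoftware job.sh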
>
> So if a job requires this type of resource and the resource is  
> currently in use by another job, the job will be queued.
>
> Thanks,
> James
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
