[torqueusers] Slot limit issues (still)

Ken Nielson knielson at adaptivecomputing.com
Tue Sep 24 10:14:01 MDT 2013


On Tue, Sep 24, 2013 at 9:30 AM, Andrus, Brian Contractor
<bdandrus at nps.edu> wrote:

> OK, this one is still going on with the same array job.
>
> I have many array jobs (same parent job) that have gone into a 'blocked'
> status because they couldn't start in a timely manner
> (DEFERTIME/DEFERCOUNT). Not unusual for a sizeable array job with slot
> limits (set server max_slot_limit = 512).
>
> So I want to start some of these jobs. The user has NO jobs currently
> running (there ARE other jobs running, but only 5 of them are array jobs,
> and those belong to a different user).
>
> I am trying with job 20139590[1561].
>
> Here is what I try/get:
>
> [root@cluster ~]# qrls 20139590[1561]
> [root@cluster ~]# qrun 20139590[1561]
> qrun: Invalid request MSG=Cannot run job. Array slot limit is 512 and
> there are already 512 jobs running
> 20139590[1561].cluster
> [root@cluster ~]# qrerun 20139590[1561]
> qrerun: Request invalid for state of job MSG=job 20139590[1561].cluster
> is in a bad state 20139590[1561].cluster
>
> I have tried restarting pbs_server and looked at the output of pbsnodes to
> see if any pieces of this job are still floating around, but there are none.
> I also checked each node for anything belonging to that job/user; nothing
> there either.
>
> Any ideas what is going on here and/or how to get these jobs running?
>
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
>
Brian,

I see you are running qrls on the job before qrun, so these jobs are on
hold before they run. Correct?

Regards


-- 
Ken Nielson
+1 801.717.3700 office +1 801.717.3738 fax
1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
www.adaptivecomputing.com