[torqueusers] qsub permission error (15007)

Garrick Staples garrick at usc.edu
Tue Oct 26 13:05:10 MDT 2004


On Tue, Oct 26, 2004 at 08:08:23AM -0400, Dwight Kelly alleged:
> On Tue, 26 Oct 2004, John Wagner wrote:
> 
> >Just a thought. Have you tried setting the QSUBSLEEP option to a non-zero
> >value in the torque.cfg file? Maybe this will help if you are submitting a
> >series of jobs via a script.
> 
> qsub is already really slow submitting jobs. I'd hate to slow it down
> even more. Is the networking code fragile?

Yes, the whole thing is rather fragile.  The original openpbs code was
completely shatter-prone.  The current torque code is much better, but isn't
"there" yet.

I'd guess you are having problems because pbs_server blocks while
starting/stopping jobs.  If pbs_server blocks for too long, then clients can
time out trying to authenticate (specifically, running pbs_iff times out).
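
For anyone submitting a series of jobs from a script, one client-side
workaround is to pace the submissions and retry the ones that fail.  A
minimal Python sketch, under the assumption that a failed submission shows up
as a non-zero exit status from qsub.  The names submit_all, pause and retries
are hypothetical and not part of TORQUE; only the plain "qsub <script>"
invocation is.

```python
import subprocess
import time


def submit_all(scripts, submit, pause=1.0, retries=3, sleep=time.sleep):
    """Submit each script, pausing between submissions and retrying failures.

    `submit` is any callable that takes a script path and returns an exit
    status (0 for success).  Returns the scripts that never succeeded.
    """
    failed = []
    for script in scripts:
        for attempt in range(retries):
            if submit(script) == 0:
                break
            # Wait a little longer after each failed attempt.
            sleep(pause * (attempt + 1))
        else:
            # Every attempt failed.
            failed.append(script)
        # Give pbs_server a moment before the next submission.
        sleep(pause)
    return failed


def qsub(script):
    """Run qsub on one script and return its exit status."""
    return subprocess.run(["qsub", script]).returncode
```

Calling submit_all(["a.sh", "b.sh"], qsub) would submit both scripts in turn.
This does not fix the blocking in pbs_server; it only makes a script more
tolerant of it.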

Also, I've long suspected a bug in the client libs that fails to reap failed
pbs_iff processes.  This is evident in longer-running clients like 'qsub -I'
and pbstop.  If one pbs_iff times out, it stays around as a zombie.
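
The general Unix behaviour behind this can be sketched in a few lines of
Python.  This is not the TORQUE client library code, only an illustration
under the assumption of a Unix-like system: a child that has exited stays in
the process table as a zombie until its parent collects it with waitpid().
The names spawn_helper and reap are hypothetical.

```python
import os
import time


def spawn_helper(exit_code=1):
    """Fork a child that exits at once, standing in for a failed helper."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately without running the parent's cleanup.
        os._exit(exit_code)
    return pid


def reap(pid, timeout=5.0):
    """Poll waitpid() until the child is collected; return its exit code.

    Returns None if the child has not exited within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return os.WEXITSTATUS(status)
        time.sleep(0.01)
    return None
```

Between spawn_helper() returning and reap() succeeding, the exited child is
a zombie.  A long-running parent that never makes the waitpid() call
accumulates one such entry per failed helper.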

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California