[torqueusers] qsub permission error (15007)
Garrick Staples
garrick at usc.edu
Tue Oct 26 13:05:10 MDT 2004
On Tue, Oct 26, 2004 at 08:08:23AM -0400, Dwight Kelly alleged:
> On Tue, 26 Oct 2004, John Wagner wrote:
>
> >Just a thought. Have you tried setting the QSUBSLEEP option to a non-zero
> >value in the torque.cfg file? Maybe this will help if you are submitting a
> >series of jobs via a script.
>
> qsub is already really slow submitting jobs. I'd hate to slow it down
> even more. Is the networking code fragile?
Yes, the whole thing is rather fragile. The original openpbs code was
completely shatter-prone. The current torque code is much better, but isn't
"there" yet.
I'd guess you are having problems because pbs_server blocks while
starting/stopping jobs. If pbs_server blocks for too long, then clients can
time out trying to authenticate (specifically, running pbs_iff times out).
Also, I've long suspected a bug in the client libs: they fail to reap failed
pbs_iff processes. This is evident in longer running clients like 'qsub -I'
and pbstop. If one pbs_iff times out, it stays around as a zombie.
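
To illustrate what "reaping" means here, a minimal Python sketch follows. It is
not the TORQUE client library code (which is C) and does not call pbs_iff; the
function name run_helper and the sleeping child process are hypothetical
stand-ins for a client launching a helper that hangs. The point is only that a
parent which kills a timed-out child must still wait on it, or the child stays
in the process table as a zombie.

```python
import subprocess
import sys


def run_helper(cmd, timeout_s):
    """Run a short-lived helper; return its exit code, or None on timeout.

    Hypothetical stand-in for a client invoking an authentication helper.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # stop the hung helper
        proc.wait()  # reap it so it does not linger as a zombie
        return None


if __name__ == "__main__":
    # A sleeping Python child stands in for a helper that never answers.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_helper(hung, 0.5))
```

Without the second wait() call, a long-running parent (the 'qsub -I' or pbstop
case) would keep the dead child listed as defunct until the parent itself
exits.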
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California