[torqueusers] qsub: Bad UID for job execution

Garrick Staples garrick at usc.edu
Tue Apr 4 12:07:46 MDT 2006


On Tue, Apr 04, 2006 at 01:43:16PM -0400, Doug Renfrew alleged:
> So I thought that using the
> 
> ALLOWCOMPUTEHOSTSUBMIT true
> 
> in my torque.cfg file meant that I did not have to have a hosts.equiv

I thought so too.  I'll double check what's going on here.


> or even have any of the 'r' tools or services (rlogin, rlogind, rcp,
> etc.) installed or running. Is this incorrect? Do I need to have
> rlogind running on the server? Will just having a hosts.equiv file
> with the nodes in it work without turning on rlogind?

No, you won't need any 'r' services running.  pbs_server directly calls
ruserok() which reads hosts.equiv.  It is not talking to any services or
running any 'r' commands.
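
If you want to sanity-check the file itself, here is a minimal Python sketch of
the kind of lookup involved. It is only an approximation and not what
pbs_server actually runs: the real ruserok() also handles '+'/'-' entries,
netgroups and per-user .rhosts files, all of which this sketch ignores.
Treating lines that start with '#' as comments is an assumption, and the
function names and example.org hostnames are made up for illustration.

```python
def trusted_hosts(equiv_text):
    """Return the plain hostnames listed in hosts.equiv-style text."""
    hosts = set()
    for raw in equiv_text.splitlines():
        line = raw.strip()
        # Skip blank lines and (assumed) comment lines.
        if not line or line.startswith("#"):
            continue
        # Only the first field (the host) is considered in this sketch.
        entry = line.split()[0]
        # Ignore '+'/'-' wildcard, negation and netgroup entries.
        if entry[0] in "+-":
            continue
        hosts.add(entry.lower())
    return hosts


def submit_host_listed(equiv_text, submit_host):
    """True if submit_host appears as a plain entry in the text."""
    return submit_host.lower() in trusted_hosts(equiv_text)


# Hypothetical contents of /etc/hosts.equiv on the pbs_server host.
sample = """\
node01.example.org
node02.example.org
"""

print(submit_host_listed(sample, "node01.example.org"))  # True
print(submit_host_listed(sample, "node03.example.org"))  # False
```

To run it against a real file, pass open("/etc/hosts.equiv").read() in place of
sample, along with the submit host's name as the server sees it.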


> On 3/31/06, Torsten Rohlfing <rohlfing at ieee.org> wrote:
> >
> > Hi everyone!
> >
> > I guess I have to ask about this myself now, because I have had the same
> > problem (I should say symptom) with one of my machines for a long time,
> > and none of the proposed solutions works for me.
> >
> > Here's what I have: 12 machines in the cluster, 1 of them server running
> > torque-2.0.0p7 (problem has persisted since 1.2.something) and the other
> > 11 compute nodes. All machines are using the same NIS server. All
> > compute nodes are in the server's /etc/hosts.equiv. The server is in all
> > nodes' hosts.equiv, just for good measure. I just set the
> > ALLOWCOMPUTEHOSTSUBMIT flag in torque.cfg also, since I wasn't aware of
> > that one before (yes, I restarted the pbs_server process).
> >
> > Now here's the funny thing - all my compute nodes (and the server, which
> > is also a compute node) can submit jobs EXCEPT one of the compute nodes.
> > The only difference I can remotely think of between that compute node
> > and all the others is that this one used to be the server in a torque
> > test installation before I got the actual server. Yet, I have checked
> > all config files many times, and all the compute nodes have essentially
> > (except for number of CPUs, max loads, etc) the same configs, including
> > the one that cannot submit jobs.
> >
> > So I have to ask - is there any not-so-well-known and straightforward
> > cause of this problem? Or is there a more fundamental solution - like
> > tell the server to allow submission from anywhere? All my machines are
> > on a private network, so I really don't care much about restricting
> > submissions.
> >
> > Thanks for your help!
> >   Torsten
> > > On Thu, Mar 30, 2006 at 05:29:57PM -0500 or thereabouts, Doug Renfrew wrote:
> > > > Hi Gang,
> > > >
> > > > I am having trouble setting things up so that users can submit jobs
> > > > from the compute hosts. I have a pretty simple setup. Machine 1 is
> > > > acting as the PBS server, the NFS server, and the NIS server. Machine
> > > > 2-16 are acting as PBS clients, NFS clients, and NIS clients. In the
> > > > torque.cfg file on the PBS server I have set ALLOWCOMPUTEHOSTSUBMIT
> > > > true. Users can log in to any of the machines and use qstat, qmgr,
> > > > pbsnodes, etc but qsub fails with the error below.
> > > >
> > > > qsub: Bad UID for job execution
> > >
> > >
> > > Hi Doug,
> > >
> > >  A well-known one, I am sure you are pleased to hear.
> > >
> > >  Add your pbs clients to /etc/hosts.equiv on your pbs_server
> > >  host or use the newer
> > >
> > >  ALLOWCOMPUTEHOSTSUBMIT
> > >
> > >  as defined here.
> > >
> > >  http://www.clusterresources.com/products/torque/docs20/a.ktorquecfg.shtml
> > > >
> > > > Users can submit jobs from machine 1 but not from machines 2-16. Since
> > > > we are using NIS the user ids are the same no matter which machine the
> > > > user is logged into. Can anyone give me any advice on how to figure
> > > > out what is going on.
> > > >
> > > > Doug
> > > > --
> > > > ---------------------------------------------
> > > > P. Douglas Renfrew
> > > > Graduate Student
> > > > Molecular and Cellular Biophysics Program
> > > > Dept. Biochemistry and Biophysics
> > > > Unv. of North Carolina at Chapel Hill
> > > > ---------------------------------------------
> > > > _______________________________________________
> > > > torqueusers mailing list
> > > > torqueusers at supercluster.org
> > > > http://www.supercluster.org/mailman/listinfo/torqueusers
> > >
> > > --
> > > Steve Traylen
> > > s.traylen at rl.ac.uk
> > > http://www.gridpp.ac.uk/
> >
> >
> > --
> > Torsten Rohlfing, PhD          SRI International, Neuroscience Program
> >   Research Scientist             333 Ravenswood Ave, Menlo Park, CA 94025
> >    Phone: ++1 (650) 859-3379      Fax: ++1 (650) 859-2743
> >     torsten at synapse.sri.com        http://www.stanford.edu/~rohlfing/
> >
> >       "Though this be madness, yet there is a method in't"
> >
> 
> 
> --
> ---------------------------------------------
> P. Douglas Renfrew
> Graduate Student
> Molecular and Cellular Biophysics Program
> Dept. Biochemistry and Biophysics
> Unv. of North Carolina at Chapel Hill
> cell: (919)618-0700
> ---------------------------------------------

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California