[torqueusers] qrerun fails due to Unauthorized Request

David Beer dbeer at adaptivecomputing.com
Mon Nov 21 15:52:42 MST 2011



----- Original Message -----
> Dear David,
> 
> No, my regular account is not a manager, but root is. I had expected
> qrerun to work for a user's own jobs, and I had expected it to work
> from a submit client, not only on the torque server. I cannot make
> all ~200 regular users manager (imagine one of them doing "qdel all"
> on his runaway jobs), and I don't want to give them access to the
> server. My conclusion at the moment is that qrerun cannot be used to
> restart a matlab job at a later time in case the licences run out.
> 

This may not interest you, but there are schedulers (such as Moab) that manage licences for you and would prevent this case. 

> Is there an overview somewhere on what commands the regular user can
> use? The man pages don't provide this information, and the error
> message is not very informative.
> 

I don't know of an existing overview that explains which commands a user can run at each permission level, but that is a good request and something that can be done. As for the error message, it seems appropriate to me: Unauthorized Request is the correct message for a command that a user doesn't have permission to run.

> The qrerun command and many others are installed by the
> torque-clients package in the bin directory; would it not be more
> appropriate to install it in sbin and only install it with the
> torque-server package?
> 

AFAIK, the intent (this was decided long before I began working on TORQUE, possibly even before Adaptive/CR did) is that the client commands go in bin and the commands that can never be run by users go in sbin. qrun is also in the bin directory, even though it too cannot be run unless you are a manager.

Cheers,

David

> best regards,
> Robert
> 
> 
> 
> On 18 Nov 2011, at 17:59, David Beer wrote:
> 
> > Are the super user and/or your user on that box managers on
> > pbs_server? You would need manager privileges to qrerun a job.
> > 
> > David
> > 
> > ----- Original Message -----
> >> Dear torque users,
> >> 
> >> I am trying to use qrerun in a shell script to deal with the
> >> (potential) limit in available MATLAB licenses. Let me briefly
> >> outline the idea before explaining the problem.
> >> 
> >> I have a shell script that starts MATLAB with the "-r <filename>"
> >> option for a MATLAB script. In case there is no license available,
> >> MATLAB returns immediately with a descriptive error about the
> >> license failure. I would like to catch that error and if it
> >> happens,
> >> issue "qalter -h u JOBID" and "qrerun JOBID" to reschedule the job
> >> for execution at a later time. Note that I am aware of the ability
> >> to configure floating resources in Moab, but I am using Maui.
> >> Furthermore, the floating resources for the Matlab license don't
> >> optimally represent the license requirements for scheduling
> >> multiple
> >> jobs by the same user on a multicore machine. Hence I prefer to
> >> use
> >> qrerun instead of making the license a managed resource.
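
[A minimal sketch of the wrapper idea described above, for illustration only. The function names, the MATLAB/QALTER/QRERUN override variables, and the "License Manager Error" text used to detect a license failure are assumptions, not taken from the original script. PBS_JOBID is the job id that TORQUE sets in the job's environment. As the replies in this thread note, qrerun requires manager privileges, so the requeue step is expected to fail with "Unauthorized Request" for a regular user.]

```shell
#!/bin/sh
# Hypothetical wrapper: run a MATLAB script and, if MATLAB reports a
# license failure, place a user hold on the job and ask for a rerun.
# All names below are illustrative assumptions.

MATLAB="${MATLAB:-matlab}"
QALTER="${QALTER:-qalter}"
QRERUN="${QRERUN:-qrerun}"
# Assumed error text; adjust to whatever your MATLAB version prints.
LICENSE_PATTERN="${LICENSE_PATTERN:-License Manager Error}"

# Return 0 if the given log file contains the license error text.
is_license_failure() {
    grep -q "$LICENSE_PATTERN" "$1"
}

# $1 = MATLAB script name, $2 = job id (defaults to $PBS_JOBID).
run_or_requeue() {
    script="$1"
    jobid="${2:-${PBS_JOBID:-}}"
    log="$(mktemp)" || return 1

    # Capture MATLAB's output so it can be inspected afterwards.
    "$MATLAB" -nodisplay -r "$script" >"$log" 2>&1
    rc=$?
    cat "$log"

    if is_license_failure "$log"; then
        rm -f "$log"
        # Needs manager privileges on pbs_server; fails otherwise.
        "$QALTER" -h u "$jobid" && "$QRERUN" "$jobid"
        return $?
    fi

    rm -f "$log"
    return "$rc"
}
```

[Inside a job script this would be called as `run_or_requeue myscript`, where `myscript` stands for the MATLAB script name. The command variables can be overridden to point at stubs, which makes the logic testable without MATLAB or TORQUE installed.]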
> >> 
> >> The problem I run into can be summarized in the following snippet
> >> from the command line. I schedule a simple job that subsequently
> >> starts running on one of the execution hosts:
> >> 
> >> roboos@mentat001> echo sleep 1000 | qsub
> >> 45254.dccn-l014.dccn.nl
> >> 
> >> Then I try to use qrerun, first as regular user then as super user
> >> (which I normally would not do of course):
> >> 
> >> roboos@mentat001> qrerun 45254
> >> qrerun: Unauthorized Request  45254.dccn-l014.dccn.nl
> >> roboos@mentat001> sudo qrerun 45254
> >> qrerun: Unauthorized Request  MSG=operation not permitted
> >> 45254.dccn-l014.dccn.nl
> >> 
> >> So as root/administrative user I am also not allowed to do it from
> >> the client machine. I am able to log in directly on the torque
> >> server, where as regular user I am also not allowed to qrerun. It
> >> is
> >> not a general failure of qrerun, since the root user on the
> >> torque server is allowed to use it:
> >> 
> >> roboos@mentat001> ssh torque
> >> roboos@torque> qrerun 45254
> >> qrerun: Unauthorized Request  45254.dccn-l014.dccn.nl
> >> roboos@torque> sudo qrerun 45254
> >> 
> >> after which the job is correctly requeued and starts over again.
> >> 
> >> To provide some info from the log files: as regular user I get the
> >> following message in /var/spool/torque/server_logs
> >> 
> >> 11/16/2011 09:36:55;0080;PBS_Server;Req;req_reject;Reject reply
> >> code=15018(Request invalid for state of job), aux=0,
> >> type=RerunJob,
> >> from roboos@mentat001.dccn.nl
> >> 
> >> and as root on the torque server I get
> >> 
> >> 11/16/2011 09:38:12;0080;PBS_Server;Req;req_reject;Reject reply
> >> code=15018(Request invalid for state of job), aux=0,
> >> type=RerunJob,
> >> from root@dccn-l014.dccn.nl
> >> 
> >> The log message is basically the same. In the log message on the
> >> execution host I cannot find anything that pertains to the failed
> >> qrerun request.
> >> 
> >> Does anyone have an idea on what might be the problem for the
> >> regular
> >> user not being allowed to restart the job? I tried the same thing
> >> on
> >> a different torque cluster (not managed by me) that I have access
> >> to, and also there it failed.
> >> 
> >> 
> >> best regards,
> >> Robert
> >> 
> >> 
> >> 
> >> -----------------------------------------------------------
> >> Robert Oostenveld, PhD
> >> Senior Researcher & MEG Physicist
> >> Donders Institute for Brain, Cognition and Behaviour
> >> Centre for Cognitive Neuroimaging
> >> Radboud University Nijmegen
> >> tel.: +31 (0)24 3619695
> >> e-mail: r.oostenveld at donders.ru.nl
> >> web: http://www.ru.nl/neuroimaging
> >> skype: r.oostenveld
> >> -----------------------------------------------------------
> >> 
> >> 
> >> 
> >> 
> >> _______________________________________________
> >> torqueusers mailing list
> >> torqueusers at supercluster.org
> >> http://www.supercluster.org/mailman/listinfo/torqueusers
> >> 
> > 
> > --
> > David Beer
> > Direct Line: 801-717-3386 | Fax: 801-717-3738
> >     Adaptive Computing
> >     1712 S East Bay Blvd, Suite 300
> >     Provo, UT 84606
> > 
> 

-- 
David Beer 
Direct Line: 801-717-3386 | Fax: 801-717-3738
     Adaptive Computing
     1712 S East Bay Blvd, Suite 300
     Provo, UT 84606


