[torqueusers] [Mauiusers] Maui-Torque integration problems

Jim Kusznir jkusznir at gmail.com
Fri Dec 11 13:42:09 MST 2009


Thanks for the suggestion.  My server_priv was 750; I changed it to
755, but that made no difference.

All the other permissions were the same, except that I did not have the
nodes and node_status files yet (still trying to make basic queue
management work correctly).
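
For a side-by-side comparison with the listing quoted below, the modes under server_priv can be printed by a short script instead of being read off by eye. This is only a sketch: list_modes is a made-up helper name, and the directory passed to it has to be the actual server_priv path of the install, which this thread does not state.

```python
import os
import stat


def list_modes(directory):
    """Return {entry_name: mode string such as 'drwxr-x---'} for a directory."""
    modes = {}
    for name in sorted(os.listdir(directory)):
        # lstat so that a symlink is reported as a symlink, not as its target
        info = os.lstat(os.path.join(directory, name))
        modes[name] = stat.filemode(info.st_mode)
    return modes
```

Calling list_modes() on the server_priv directory (as root, since most entries are not world-readable) gives a dictionary that can be compared entry by entry against a known-good install.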

My test was to run qmgr as kusznir and try:

    set server managers += maui at isp-curran.isp.wsu.edu

The response was:

    qmgr obj= svr=default: Unauthorized Request
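
To rule out a subtle mismatch between the managers list and the identity the server rejects, the output of qmgr -c 'p s' can be checked mechanically. A minimal sketch, assuming that the archive's " at " stands for "@" and that entries are compared literally (wildcard entries such as the @* tried elsewhere in this thread are not handled). parse_managers and is_manager are hypothetical helper names.

```python
def parse_managers(print_server_output):
    """Collect user@host entries from 'set server managers' lines."""
    managers = []
    for line in print_server_output.splitlines():
        line = line.strip()
        if not line.startswith("set server managers"):
            continue
        # Covers both 'managers = x' and 'managers += x'.
        _, _, value = line.partition("=")
        managers.extend(v.strip() for v in value.split(",") if v.strip())
    return managers


def is_manager(user_at_host, managers):
    """Literal string comparison only; no wildcard support."""
    return user_at_host in managers


sample = """\
set server acl_hosts = isp-curran
set server managers = kusznir@isp-curran.isp.wsu.edu
set server managers += maui@isp-curran.isp.wsu.edu
set server managers += root@isp-curran.isp.wsu.edu
"""
managers = parse_managers(sample)
print(is_manager("maui@isp-curran.isp.wsu.edu", managers))
```

If this reports a match while the server still rejects the request, the list contents are probably not the problem and the cause lies elsewhere.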

--Jim

On Fri, Dec 11, 2009 at 10:26 AM, Scott L. Hamilton
<hamilton.mst at gmail.com> wrote:
> Jim,
>
> I would suggest looking at the file permissions on the server_priv folder.  I
> had file permissions messed up on a torque install once and only root had
> access to the files, which made the server fail in several strange ways.
> Here is the permission tree on my installation for comparison.
>
> [root at nic-p1 server_priv]# ls -al
> total 156
> drwxr-xr-x 12 root root  4096 Nov 10 11:03 .
> drwxr-xr-x 18 root root  4096 May 21  2009 ..
> drwxr-xr-x  2 root root 36864 Dec 11 00:37 accounting
> drwxr-x---  2 root root  4096 May 16  2008 acl_groups
> drwxr-x---  2 root root  4096 May 16  2008 acl_hosts
> drwxr-x---  2 root root  4096 Nov 10 11:03 acl_svr
> drwxr-x---  2 root root  4096 May 16  2008 acl_users
> drwxr-x---  2 root root  4096 May 16  2008 arrays
> drwxr-x---  2 root root  4096 May 16  2008 disallowed_types
> drwxr-x---  2 root root  4096 May 16  2008 hostlist
> drwxr-x---  2 root root 61440 Dec 11 12:02 jobs
> -rw-r--r--  1 root root  2925 Nov  9 16:05 nodes
> -rw-r--r--  1 root root    15 Dec  7 13:45 node_status
> drwxr-x---  2 root root  4096 May 16  2008 queues
> -rw-------  1 root root  1902 Dec 11 10:56 serverdb
> -rw-------  1 root root     6 Nov 10 11:03 server.lock
> -rw-------  1 root root     0 Sep  5  2008 tracking
>
> I don't know if it will fix your issue or not, but it couldn't hurt to try
> it.
>
> Scott
>
>
> Jim Kusznir wrote:
>>
>> Unfortunately, all that has been done already.
>>
>> The more I play with it, the more it seems that torque is hard-coded
>> to only accept root.  Right now, root isn't even in the managers or
>> operators list (only my user account), and after restarting (not
>> running with -t create), still only root has permissions to do
>> anything.  It doesn't seem to matter what is in the managers or
>> operators list, only root can do anything (even if root isn't in the
>> list, which is not the behavior I've seen elsewhere).
>>
>> I've never had this problem with a torque install before...
>>
>> --Jim
>>
>> On Thu, Dec 10, 2009 at 3:52 PM, Tom Rudwick <tomr at intrinsity.com> wrote:
>>
>>>
>>> I recommend that everywhere you use your server name, you use
>>> the FQDN version. Also, check that in the /etc/hosts file on
>>> the server, its FQDN is listed first on the line. The top
>>> of your hosts file would look something like this:
>>>
>>> # required host names and addresses
>>>
>>> # Do not remove the following line, or various programs
>>> # that require network functionality will fail.
>>>
>>> 127.0.0.1               localhost.localdomain localhost
>>>
>>> # Same goes for the next line, which refers to this system
>>>
>>> nn.nn.nn.nn          isp-curran.isp.wsu.edu isp-curran
>>>
>>> In other words, don't use an alias anywhere in the setup.
>>> I've seen problems with torque when it is set up any other way.
>>>
>>> Tom
>>>
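
The ordering rule Tom describes can be checked mechanically. The sketch below takes the text of a hosts file and reports whether every line that names a given host lists a dotted name first. first_name_is_fqdn is a hypothetical helper, and treating "contains a dot" as "is fully qualified" is a simplifying assumption.

```python
def hosts_entries(hosts_text):
    """Yield (address, [names...]) for each non-comment line of a hosts file."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            yield fields[0], fields[1:]


def first_name_is_fqdn(hosts_text, short_name):
    """True if every line naming short_name lists a dotted name first."""
    found = False
    for _addr, names in hosts_entries(hosts_text):
        short_names = [n.split(".", 1)[0] for n in names]
        if short_name in short_names:
            found = True
            if "." not in names[0]:
                return False
    return found
```

A result of False means either that the host is not in the file at all or that an alias precedes the FQDN on its line.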
>>>
>>> Jim Kusznir wrote:
>>>
>>>>
>>>> After recompiling torque with some patches provided from the rpm
>>>> maintainer that fixed the issues that required the
>>>> --ignore-gcc-warnings flag, maui was seeing the jobs from torque, but
>>>> not able to execute.  Presently, showq actually shows all the jobs,
>>>> but they're deferred due to maui not being able to control torque.  It
>>>> also turns out that my regular user account (kusznir) is also unable
>>>> to control torque on this new install, even though it's in the host
>>>> list (root is allowed for some reason).
>>>>
>>>> I've checked the logs, and they show user at fqdn not authorized, but p s
>>>> shows that exact same user at fqdn in the managers list.  This really
>>>> has me confused:
>>>> 12/10/2009 14:01:35;0080;PBS_Server;Req;req_reject;Reject reply code=15007(Unauthorized Request ), aux=0, type=RunJob, from kusznir at isp-curran.isp.wsu.edu
>>>> 12/10/2009 12:25:58;0020;PBS_Server;Job;1.isp-curran.isp.wsu.edu;Unauthorized Request, request type: 11, Object: Job, Name: 1.isp-curran.isp.wsu.edu, request from: maui at isp-curran.isp.wsu.edu
>>>> 12/10/2009 12:25:58;0080;PBS_Server;Req;req_reject;Reject reply code=15007(Unauthorized Request  MSG=operation not permitted), aux=0, type=ModifyJob, from maui at isp-curran.isp.wsu.edu
>>>>
>>>> yet:
>>>>
>>>> kusznir at isp-curran:/opt/torque/server_logs> qmgr -c 'p s'
>>>> #
>>>> # Create queues and set their attributes.
>>>> #
>>>> #
>>>> # Create and define queue default
>>>> #
>>>> create queue default
>>>> set queue default queue_type = Execution
>>>> set queue default resources_default.nodes = 1
>>>> set queue default resources_default.walltime = 01:00:00
>>>> set queue default enabled = True
>>>> set queue default started = True
>>>> #
>>>> # Set server attributes.
>>>> #
>>>> set server scheduling = True
>>>> set server acl_hosts = isp-curran
>>>> set server managers = kusznir at isp-curran.isp.wsu.edu
>>>> set server managers += maui at isp-curran.isp.wsu.edu
>>>> set server managers += root at isp-curran.isp.wsu.edu
>>>> set server default_queue = default
>>>> set server log_events = 511
>>>> set server mail_from = torque at isp-curran.isp.wsu.edu
>>>> set server scheduler_iteration = 600
>>>> set server node_check_rate = 150
>>>> set server tcp_timeout = 6
>>>> set server next_job_number = 1
>>>>
>>>> I've checked:
>>>>
>>>> 1) in /etc/hosts, the IP address maps to both isp-curran and
>>>> isp-curran.isp.wsu.edu
>>>> 2) host isp-curran.isp.wsu.edu does resolve to the IP address
>>>> 3) host isp-curran also resolves
>>>> 4) host on the ip resolves to the fqdn.
>>>>
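
The forward and reverse lookups listed above can be bundled into one function. A sketch, with one caveat: Python's socket calls go through the system resolver, which normally consults /etc/hosts as well as DNS, whereas the host command queries DNS only, so the two can disagree. check_names is a hypothetical helper; its forward and reverse parameters exist only so the logic can be exercised without a network.

```python
import socket


def check_names(short, fqdn, forward=socket.gethostbyname,
                reverse=socket.gethostbyaddr):
    """Run forward lookups for both names and a reverse lookup on the result."""
    addr_fqdn = forward(fqdn)
    addr_short = forward(short)
    canonical = reverse(addr_fqdn)[0]  # primary name reported for the address
    return {
        "fqdn_address": addr_fqdn,
        "short_address": addr_short,
        "same_address": addr_fqdn == addr_short,
        "reverse_name": canonical,
        "reverse_is_fqdn": canonical == fqdn,
    }
```

On the server itself, check_names("isp-curran", "isp-curran.isp.wsu.edu") would use the real resolver; a False value for same_address or reverse_is_fqdn points at a naming inconsistency.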
>>>> I don't see any way this can be a DNS issue, as the hosts file is
>>>> correct, and in the log file, the entries have already been resolved
>>>> to hostnames (e.g., you can see it already knows it's
>>>> kusznir at isp-curran.isp.wsu.edu, or maui at isp-curran.isp.wsu.edu).  What
>>>> really confuses me is that the exact same user at host appears both in
>>>> the logs as not allowed and in the managers line in qmgr.  I also
>>>> don't understand why root can run commands, but maui and kusznir
>>>> cannot, when they're all in the list in the same manner.
>>>>
>>>> Oh, I also tried changing the server acl_hosts to
>>>> isp-curran.isp.wsu.edu; no change there.
>>>>
>>>> I tried changing the managers to @*, but that also had no impact.  I
>>>> also tried setting set server acl_host_enable = False, but that also
>>>> had no impact (this machine is behind a tight firewall, so there's not
>>>> much risk of other users on the network trying to do stuff...there's
>>>> only 1 machine on this "network").
>>>>
>>>> I'd appreciate any input.  This machine has been down for several days
>>>> now, and the users are getting out their pitchforks.....
>>>>
>>>> --Jim
>>>> _______________________________________________
>>>> torqueusers mailing list
>>>> torqueusers at supercluster.org
>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>
>>>>
>>>
>>>
>>
>>
>
>
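
The server log entries quoted in this thread are semicolon-separated, so the rejected requests can be pulled out with a few lines of Python. A sketch only: the field positions are inferred from the three sample entries rather than taken from TORQUE documentation, " at " is mapped back to "@" on the assumption that the archive obfuscated the addresses, and each entry must be on a single line. parse_rejects is a hypothetical helper name.

```python
import re

# Matches the tail of a req_reject message as it appears in the samples.
REJECT = re.compile(r"code=(\d+)\((.*?)\), aux=\d+, type=(\w+), from (\S+@\S+)")


def parse_rejects(log_text):
    """Return (timestamp, code, request_type, requester) for req_reject lines."""
    rejects = []
    for line in log_text.splitlines():
        parts = line.split(";", 5)
        if len(parts) != 6 or parts[4] != "req_reject":
            continue
        message = parts[5].replace(" at ", "@")  # undo archive obfuscation
        match = REJECT.search(message)
        if match:
            code, _reason, req_type, requester = match.groups()
            rejects.append((parts[0], int(code), req_type, requester))
    return rejects
```

Grouping the result by requester and request type shows at a glance which identities the server is refusing and for which operations.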
