[torqueusers] [Mauiusers] Maui-Torque integration problems

Jim Kusznir jkusznir at gmail.com
Fri Dec 11 14:53:04 MST 2009


So, as a proof of concept, I changed my maui ADMIN1 line to list root
first, so that maui runs as root.  Sure enough, as soon as it started up,
all the queued jobs immediately ran.  maui and torque work together that way.

This confirms that the problem is with torque's permissions
system.  For whatever reason, regardless of what is specified in
managers/operators, ONLY root can control torque.  Period.  That is
the problem that needs to be addressed, and I have no solution for it.
The same setup works on all my other systems.  Perhaps I
need to try an older version of torque...

Suggestions?

--Jim

On Fri, Dec 11, 2009 at 12:42 PM, Jim Kusznir <jkusznir at gmail.com> wrote:
> Thanks for the suggestion.  My server_priv was 750, I changed it to
> 755, but no change.
>
> All the other permissions were the same, except that I do not have the
> files nodes and node_status yet (still trying to make basic queue
> management work correctly).
>
> My test was to run qmgr as kusznir and try set server managers +=
> maui@isp-curran.isp.wsu.edu
>
> I got qmgr obj= svr=default: Unauthorized Request back
>
> --Jim
>
> On Fri, Dec 11, 2009 at 10:26 AM, Scott L. Hamilton
> <hamilton.mst at gmail.com> wrote:
>> Jim,
>>
>> I would suggest looking at the file permissions on the server_priv folder.  I
>> had file permissions messed up on a torque install once, and only root had
>> access to the files, which made the server fail in several strange ways.
>> Here is the permission tree on my installation for comparison.
>>
>> [root@nic-p1 server_priv]# ls -al
>> total 156
>> drwxr-xr-x 12 root root  4096 Nov 10 11:03 .
>> drwxr-xr-x 18 root root  4096 May 21  2009 ..
>> drwxr-xr-x  2 root root 36864 Dec 11 00:37 accounting
>> drwxr-x---  2 root root  4096 May 16  2008 acl_groups
>> drwxr-x---  2 root root  4096 May 16  2008 acl_hosts
>> drwxr-x---  2 root root  4096 Nov 10 11:03 acl_svr
>> drwxr-x---  2 root root  4096 May 16  2008 acl_users
>> drwxr-x---  2 root root  4096 May 16  2008 arrays
>> drwxr-x---  2 root root  4096 May 16  2008 disallowed_types
>> drwxr-x---  2 root root  4096 May 16  2008 hostlist
>> drwxr-x---  2 root root 61440 Dec 11 12:02 jobs
>> -rw-r--r--  1 root root  2925 Nov  9 16:05 nodes
>> -rw-r--r--  1 root root    15 Dec  7 13:45 node_status
>> drwxr-x---  2 root root  4096 May 16  2008 queues
>> -rw-------  1 root root  1902 Dec 11 10:56 serverdb
>> -rw-------  1 root root     6 Nov 10 11:03 server.lock
>> -rw-------  1 root root     0 Sep  5  2008 tracking
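For anyone who wants to compare their own tree against the listing above without reading it by eye, here is a minimal Python sketch. The helper names `mode_report` and `differences` are made up for illustration, and it only looks at the mode bits, not at ownership.

```python
import os
import stat


def mode_report(directory):
    """Return {entry name: ls-style mode string} for each entry in directory."""
    report = {}
    for name in sorted(os.listdir(directory)):
        # lstat so that a symlink is reported as a symlink, as `ls -l` does
        info = os.lstat(os.path.join(directory, name))
        report[name] = stat.filemode(info.st_mode)
    return report


def differences(report, expected):
    """List (name, actual, wanted) for entries whose mode differs from expected."""
    return [
        (name, report[name], wanted)
        for name, wanted in sorted(expected.items())
        if name in report and report[name] != wanted
    ]
```

Something like `differences(mode_report("/opt/torque/server_priv"), {"jobs": "drwxr-x---", "serverdb": "-rw-------"})` would then list only the entries that differ from the tree above. The path is an assumption based on the /opt/torque prefix seen elsewhere in this thread, and it has to be run as a user who can read the directory.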
>>
>> I don't know if it will fix your issue or not, but it couldn't hurt to try
>> it.
>>
>> Scott
>>
>>
>> Jim Kusznir wrote:
>>>
>>> Unfortunately, all that has been done already.
>>>
>>> The more I play with it, the more it seems that torque is hard-coded
>>> to only accept root.  Right now, root isn't even in the managers or
>>> operators list (only my user account), and after restarting (not
>>> running with -t create), still only root has permissions to do
>>> anything.  It doesn't seem to matter what is in the managers or
>>> operators list, only root can do anything (even if root isn't in the
>>> list, which is not the behavior I've seen elsewhere).
>>>
>>> I've never had this problem with a torque install before...
>>>
>>> --Jim
>>>
>>> On Thu, Dec 10, 2009 at 3:52 PM, Tom Rudwick <tomr at intrinsity.com> wrote:
>>>
>>>>
>>>> I recommend that everywhere you use your server name, you use
>>>> the FQDN version. Also, check that in the /etc/hosts file on
>>>> the server, its FQDN is listed first on the line. The top
>>>> of your hosts file would look something like this:
>>>>
>>>> # required host names and addresses
>>>>
>>>> # Do not remove the following line, or various programs
>>>> # that require network functionality will fail.
>>>>
>>>> 127.0.0.1               localhost.localdomain localhost
>>>>
>>>> # Same goes for the next line, which refers to this system
>>>>
>>>> nn.nn.nn.nn          isp-curran.isp.wsu.edu isp-curran
>>>>
>>>> In other words, don't use an alias anywhere in the setup.
>>>> I've seen problems with torque when it is set up any other way.
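The "FQDN listed first" rule can be checked mechanically. A minimal Python sketch follows; `fqdn_listed_first` is a made-up helper name, and it works on the text of a hosts file rather than on the resolver, so it says nothing about what DNS returns.

```python
def fqdn_listed_first(hosts_text, fqdn):
    """Return True if fqdn appears in hosts_text and is the first name on
    every line that lists it; False if it is missing or follows an alias."""
    wanted = fqdn.lower()
    found = False
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address + names
        fields = line.split("#", 1)[0].lower().split()
        names = fields[1:]
        if wanted in names:
            found = True
            if names[0] != wanted:
                return False
    return found
```

Usage would be along the lines of `fqdn_listed_first(open("/etc/hosts").read(), "isp-curran.isp.wsu.edu")`, which should return True for a file laid out like the example above.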
>>>>
>>>> Tom
>>>>
>>>>
>>>> Jim Kusznir wrote:
>>>>
>>>>>
>>>>> After recompiling torque with some patches provided by the rpm
>>>>> maintainer that fixed the issues that required the
>>>>> --ignore-gcc-warnings flag, maui was seeing the jobs from torque, but
>>>>> was not able to execute them.  Presently, showq actually shows all the
>>>>> jobs, but they're deferred because maui cannot control torque.  It
>>>>> also turns out that my regular user account (kusznir) is also unable
>>>>> to control torque on this new install, even though it's in the host
>>>>> list (root is allowed for some reason).
>>>>>
>>>>> I've checked the logs, and they show user@fqdn not authorized, but p s
>>>>> shows that exact same user@fqdn in the managers list.  This really
>>>>> has me confused:
>>>>> 12/10/2009 14:01:35;0080;PBS_Server;Req;req_reject;Reject reply code=15007(Unauthorized Request ), aux=0, type=RunJob, from kusznir@isp-curran.isp.wsu.edu
>>>>> 12/10/2009 12:25:58;0020;PBS_Server;Job;1.isp-curran.isp.wsu.edu;Unauthorized Request, request type: 11, Object: Job, Name: 1.isp-curran.isp.wsu.edu, request from: maui@isp-curran.isp.wsu.edu
>>>>> 12/10/2009 12:25:58;0080;PBS_Server;Req;req_reject;Reject reply code=15007(Unauthorized Request  MSG=operation not permitted), aux=0, type=ModifyJob, from maui@isp-curran.isp.wsu.edu
>>>>>
>>>>> yet:
>>>>>
>>>>> kusznir@isp-curran:/opt/torque/server_logs> qmgr -c 'p s'
>>>>> #
>>>>> # Create queues and set their attributes.
>>>>> #
>>>>> #
>>>>> # Create and define queue default
>>>>> #
>>>>> create queue default
>>>>> set queue default queue_type = Execution
>>>>> set queue default resources_default.nodes = 1
>>>>> set queue default resources_default.walltime = 01:00:00
>>>>> set queue default enabled = True
>>>>> set queue default started = True
>>>>> #
>>>>> # Set server attributes.
>>>>> #
>>>>> set server scheduling = True
>>>>> set server acl_hosts = isp-curran
>>>>> set server managers = kusznir@isp-curran.isp.wsu.edu
>>>>> set server managers += maui@isp-curran.isp.wsu.edu
>>>>> set server managers += root@isp-curran.isp.wsu.edu
>>>>> set server default_queue = default
>>>>> set server log_events = 511
>>>>> set server mail_from = torque@isp-curran.isp.wsu.edu
>>>>> set server scheduler_iteration = 600
>>>>> set server node_check_rate = 150
>>>>> set server tcp_timeout = 6
>>>>> set server next_job_number = 1
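The comparison described here (rejected user in the log versus managers in the `p s` output) can be done mechanically rather than by eye. Below is a minimal Python sketch under a few assumptions: each log entry is on a single line, addresses are in user@host form, and the field layout is inferred only from the three entries quoted above. All function names are made up for illustration. Matching is exact and case-insensitive, and wildcard manager entries are not interpreted.

```python
import re

# "type=<request>, from <user@host>" as it appears in the req_reject entries
REJECT = re.compile(r"type=(\w+), from (\S+@\S+)")
# "set server managers = x" or "set server managers += x" from qmgr -c 'p s'
MANAGER = re.compile(r"^set server managers \+?= (\S+)$")


def rejected_requests(log_text):
    """Return sorted unique (request type, user@host) pairs from req_reject lines."""
    pairs = set()
    for line in log_text.splitlines():
        if ";req_reject;" in line:
            match = REJECT.search(line)
            if match:
                pairs.add((match.group(1), match.group(2)))
    return sorted(pairs)


def managers(qmgr_text):
    """Return the manager entries, in order, from qmgr 'p s' output."""
    found = []
    for line in qmgr_text.splitlines():
        match = MANAGER.match(line.strip())
        if match:
            found.append(match.group(1))
    return found


def rejected_managers(log_text, qmgr_text):
    """Return rejected (type, user@host) pairs whose user is a listed manager."""
    listed = {entry.lower() for entry in managers(qmgr_text)}
    return [pair for pair in rejected_requests(log_text) if pair[1].lower() in listed]
```

A non-empty result from `rejected_managers` reproduces the contradiction described in this message. It does not explain it, but it rules out a simple spelling difference between the two strings.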
>>>>>
>>>>> I've checked:
>>>>>
>>>>> 1) in /etc/hosts, the IP address maps to both isp-curran and
>>>>> isp-curran.isp.wsu.edu
>>>>> 2) host isp-curran.isp.wsu.edu does resolve to the IP address
>>>>> 3) host isp-curran also resolves
>>>>> 4) host on the IP resolves to the FQDN.
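The four checks above can be rolled into one repeatable test. This is a minimal Python sketch using the standard `socket` calls `gethostbyname` and `gethostbyaddr`; the function name `resolution_report` and its `forward`/`reverse` parameters are made up for illustration, and the resolvers are injectable so the logic can be exercised without touching the network. Note that it asks the system resolver, which may consult /etc/hosts, DNS, or both depending on configuration, so it is not identical to running `host`.

```python
import socket


def resolution_report(short_name, fqdn,
                      forward=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Resolve both names forward and the FQDN's address backward.

    'consistent' is True only when both names map to the same address and
    the reverse lookup of that address returns the FQDN as primary name.
    """
    short_ip = forward(short_name)
    fqdn_ip = forward(fqdn)
    # gethostbyaddr returns (primary name, alias list, address list)
    reverse_name = reverse(fqdn_ip)[0]
    return {
        "short_ip": short_ip,
        "fqdn_ip": fqdn_ip,
        "reverse_name": reverse_name,
        "consistent": short_ip == fqdn_ip
        and reverse_name.lower() == fqdn.lower(),
    }
```

On the server itself, `resolution_report("isp-curran", "isp-curran.isp.wsu.edu")` would be the call. A reverse lookup that returns the short alias instead of the FQDN shows up as `consistent: False`.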
>>>>>
>>>>> I don't see any way this can be a DNS issue, as the hosts file is
>>>>> correct, and in the log file, the entries have already been resolved
>>>>> to hostnames (e.g., you can see it already knows it's
>>>>> kusznir@isp-curran.isp.wsu.edu or maui@isp-curran.isp.wsu.edu).  What
>>>>> really confuses me is that the exact same user@host appears both in
>>>>> the logs as not allowed and in the managers line in qmgr.  I also
>>>>> don't understand why root can run commands, but maui and kusznir
>>>>> cannot, when they're all in the list in the same manner.
>>>>>
>>>>> Oh, I also tried changing the server acl_hosts to
>>>>> isp-curran.isp.wsu.edu; no change there.
>>>>>
>>>>> I tried changing the managers to @*, but that also had no impact.  I
>>>>> also tried set server acl_host_enable = False, but that also
>>>>> had no impact (this machine is behind a tight firewall, so there's not
>>>>> much risk of other users on the network trying to do stuff...there's
>>>>> only 1 machine on this "network").
>>>>>
>>>>> I'd appreciate any input.  This machine has been down for several days
>>>>> now, and the users are getting out their pitchforks.....
>>>>>
>>>>> --Jim
>>>>> _______________________________________________
>>>>> torqueusers mailing list
>>>>> torqueusers at supercluster.org
>>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>

