[torqueusers] [Mauiusers] Maui-Torque integration problems

Jim Kusznir jkusznir at gmail.com
Fri Dec 11 11:12:28 MST 2009


Unfortunately, all that has been done already.

The more I play with it, the more it seems that torque is hard-coded
to only accept root.  Right now, root isn't even in the managers or
operators list (only my user account), and after restarting (not
running with -t create), still only root has permissions to do
anything.  It doesn't seem to matter what is in the managers or
operators list, only root can do anything (even if root isn't in the
list, which is not the behavior I've seen elsewhere).
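
One way to double-check the mismatch is to compare the requesters named in the
server log's reject lines against the managers and operators that qmgr prints.
The following is a minimal Python sketch, not part of Torque. The helper names
parse_acl and rejected_requesters are hypothetical, and the sample strings are
modeled on the excerpts quoted further down in this thread. It only does an
exact, case-insensitive string comparison, which may not match how pbs_server
itself evaluates the lists.

```python
import re

def parse_acl(qmgr_output):
    """Collect managers and operators from `qmgr -c 'p s'` style text."""
    acl = {"managers": set(), "operators": set()}
    pattern = re.compile(r"^set server (managers|operators) \+?= (\S+)\s*$")
    for line in qmgr_output.splitlines():
        match = pattern.match(line.strip())
        if match:
            acl[match.group(1)].add(match.group(2).lower())
    return acl

def rejected_requesters(log_text):
    """Return the set of user@host strings that appear in reject lines."""
    # Log entries may be wrapped across lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return {m.lower() for m in re.findall(r"type=\w+, from (\S+@\S+)", flat)}

# Sample input modeled on the thread (assumed format, not captured live).
qmgr_output = """\
set server acl_hosts = isp-curran
set server managers = kusznir@isp-curran.isp.wsu.edu
set server managers += maui@isp-curran.isp.wsu.edu
set server managers += root@isp-curran.isp.wsu.edu
"""
log_text = """\
12/10/2009 14:01:35;0080;PBS_Server;Req;req_reject;Reject reply
code=15007(Unauthorized Request ), aux=0, type=RunJob, from
kusznir@isp-curran.isp.wsu.edu
12/10/2009 12:25:58;0080;PBS_Server;Req;req_reject;Reject reply
code=15007(Unauthorized Request  MSG=operation not permitted), aux=0,
type=ModifyJob, from maui@isp-curran.isp.wsu.edu
"""

acl = parse_acl(qmgr_output)
listed = acl["managers"] | acl["operators"]
contradictions = sorted(rejected_requesters(log_text) & listed)
for who in contradictions:
    print(who, "was rejected but is listed as a manager or operator")
```

If the strings match exactly and the request is still rejected, the cause is
probably not a simple typo in the managers list.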

I've never had this problem with a torque install before...

--Jim

On Thu, Dec 10, 2009 at 3:52 PM, Tom Rudwick <tomr at intrinsity.com> wrote:
> I recommend that everywhere you use your server name, you use
> the FQDN version. Also, check that in the /etc/hosts file on
> the server, its FQDN is listed first on the line. The top
> of your hosts file would look something like this:
>
> # required host names and addresses
>
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
>
> 127.0.0.1               localhost.localdomain localhost
>
> # Same goes for the next line, which refers to this system
>
> nn.nn.nn.nn          isp-curran.isp.wsu.edu isp-curran
>
> In other words, don't use an alias anywhere in the setup.
> I've seen problems with torque when it is set up any other way.
>
> Tom
>
>
> Jim Kusznir wrote:
>>
>> After recompiling torque with some patches provided from the rpm
>> maintainer that fixed the issues that required the
>> --ignore-gcc-warnings flag, maui was seeing the jobs from torque, but
>> not able to execute.  Presently, showq actually shows all the jobs,
>> but they're deferred due to maui not being able to control torque.  It
>> also turns out that my regular user account (kusznir) is also unable
>> to control torque on this new install, even though it's in the host
>> list (root is allowed for some reason).
>>
>> I've checked the logs, and they show user@fqdn not authorized, but p s
>> shows that exact same user@fqdn in the managers list.  This really
>> has me confused:
>> 12/10/2009 14:01:35;0080;PBS_Server;Req;req_reject;Reject reply
>> code=15007(Unauthorized Request ), aux=0, type=RunJob, from
>> kusznir@isp-curran.isp.wsu.edu
>> 12/10/2009
>> 12:25:58;0020;PBS_Server;Job;1.isp-curran.isp.wsu.edu;Unauthorized
>> Request, request type: 11, Object: Job, Name:
>> 1.isp-curran.isp.wsu.edu, request from: maui@isp-curran.isp.wsu.edu
>> 12/10/2009 12:25:58;0080;PBS_Server;Req;req_reject;Reject reply
>> code=15007(Unauthorized Request  MSG=operation not permitted), aux=0,
>> type=ModifyJob, from maui@isp-curran.isp.wsu.edu
>>
>> yet:
>>
>> kusznir@isp-curran:/opt/torque/server_logs> qmgr -c 'p s'
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue default
>> #
>> create queue default
>> set queue default queue_type = Execution
>> set queue default resources_default.nodes = 1
>> set queue default resources_default.walltime = 01:00:00
>> set queue default enabled = True
>> set queue default started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server acl_hosts = isp-curran
>> set server managers = kusznir@isp-curran.isp.wsu.edu
>> set server managers += maui@isp-curran.isp.wsu.edu
>> set server managers += root@isp-curran.isp.wsu.edu
>> set server default_queue = default
>> set server log_events = 511
>> set server mail_from = torque@isp-curran.isp.wsu.edu
>> set server scheduler_iteration = 600
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server next_job_number = 1
>>
>> I've checked:
>>
>> 1) in /etc/hosts, the IP address maps to both isp-curran and
>> isp-curran.isp.wsu.edu
>> 2) host isp-curran.isp.wsu.edu does resolve to the IP address
>> 3) host isp-curran also resolves
>> 4) host on the ip resolves to the fqdn.
>>
>> I don't see any way this can be a DNS issue, as the hosts file is
>> correct, and in the log file, the entries have already been resolved
>> to hostnames (e.g., you can see it already knows it's
>> kusznir@isp-curran.isp.wsu.edu, or maui@isp-curran.isp.wsu.edu).  What
>> really confuses me is that the exact same user@host appears both in
>> the logs as not allowed and in the managers line in qmgr.  I also
>> don't understand why root can run commands, but maui and kusznir
>> cannot, when they're all in the list in the same manner.
>>
>> Oh, I also tried changing the server acl_hosts to
>> isp-curran.isp.wsu.edu; no change there.
>>
>> I tried changing the managers to @*, but that also had no impact.  I
>> also tried setting set server acl_host_enable = False, but that also
>> had no impact (this machine is behind a tight firewall, so there's not
>> much risk of other users on the network trying to do stuff...there's
>> only 1 machine on this "network").
>>
>> I'd appreciate any input.  This machine has been down for several days
>> now, and the users are getting out their pitchforks.....
>>
>> --Jim
>>
>
>
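
Tom's suggestion that the FQDN be listed first on its /etc/hosts line can be
checked mechanically. The sketch below is a hypothetical Python helper
(canonical_name and fqdn_listed_first are illustrative names, not part of
Torque or any library). It parses hosts-file text and looks at the first name
on the line that mentions a given host, treating any name containing a dot as
fully qualified, which is a simplifying assumption. The address 192.0.2.10 is
a documentation placeholder, since the real address appears only as
nn.nn.nn.nn in the thread.

```python
def canonical_name(hosts_text, name):
    """Return the first name on the hosts line that mentions `name`, or None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or an address with no names
        names = fields[1:]  # fields[0] is the IP address
        if name in names:
            return names[0]
    return None

def fqdn_listed_first(hosts_text, name):
    """True if the line mentioning `name` starts with a dotted (FQDN) name."""
    first = canonical_name(hosts_text, name)
    return first is not None and "." in first

# Placeholder data: 192.0.2.10 stands in for the elided real address.
good_hosts = """\
# required host names and addresses
127.0.0.1     localhost.localdomain localhost
192.0.2.10    isp-curran.isp.wsu.edu isp-curran
"""
bad_hosts = "192.0.2.10    isp-curran isp-curran.isp.wsu.edu\n"

print(fqdn_listed_first(good_hosts, "isp-curran"))
print(fqdn_listed_first(bad_hosts, "isp-curran"))
```

To check a live system, the same functions could be fed the contents of
/etc/hosts read from disk instead of the sample strings.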

