Jim Kusznir jkusznir at gmail.com
Thu Dec 10 15:46:46 MST 2009

After recompiling torque with some patches provided by the rpm
maintainer that fixed the issues that required the
--ignore-gcc-warnings flag, maui was seeing the jobs from torque, but
was not able to execute them.  Presently, showq shows all the jobs,
but they're deferred because maui is not able to control torque.  It
also turns out that my regular user account (kusznir) is unable
to control torque on this new install, even though it's in the host
list (root is allowed for some reason).

I've checked the logs, and they show user at fqdn not authorized, but
qmgr -c 'p s' shows that exact same user at fqdn in the managers list.
This really has me confused:
12/10/2009 14:01:35;0080;PBS_Server;Req;req_reject;Reject reply code=15007(Unauthorized Request ), aux=0, type=RunJob, from kusznir at isp-curran.isp.wsu.edu
12/10/2009 12:25:58;0020;PBS_Server;Job;1.isp-curran.isp.wsu.edu;Unauthorized Request, request type: 11, Object: Job, Name: 1.isp-curran.isp.wsu.edu, request from: maui at isp-curran.isp.wsu.edu
12/10/2009 12:25:58;0080;PBS_Server;Req;req_reject;Reject reply code=15007(Unauthorized Request  MSG=operation not permitted), aux=0, type=ModifyJob, from maui at isp-curran.isp.wsu.edu
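
A minimal sketch for listing which requesters the server is rejecting, and for which request types. It assumes each log record sits on a single line in the real log file (the excerpt above is wrapped by the mailer) and that the "from" field carries user@host with a literal "@" (the archive prints " at " instead). The function name rejected_requesters is hypothetical.

```shell
# Print the distinct "request-type requester" pairs for every req_reject
# record in a pbs_server log read from standard input.
# Assumptions: one record per line; requester written as user@host.
rejected_requesters() {
    grep 'req_reject' |
        sed -n 's/.*type=\([A-Za-z]*\), from \([^ ]*\).*/\1 \2/p' |
        sort -u
}
```

It would be fed the day's file from the server_logs directory, for example `rejected_requesters < "$logfile"`, where `$logfile` stands for whichever log is being inspected.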


kusznir at isp-curran:/opt/torque/server_logs> qmgr -c 'p s'
# Create queues and set their attributes.
# Create and define queue default
create queue default
set queue default queue_type = Execution
set queue default resources_default.nodes = 1
set queue default resources_default.walltime = 01:00:00
set queue default enabled = True
set queue default started = True
# Set server attributes.
set server scheduling = True
set server acl_hosts = isp-curran
set server managers = kusznir at isp-curran.isp.wsu.edu
set server managers += maui at isp-curran.isp.wsu.edu
set server managers += root at isp-curran.isp.wsu.edu
set server default_queue = default
set server log_events = 511
set server mail_from = torque at isp-curran.isp.wsu.edu
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 1
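
A minimal sketch for checking mechanically whether a given requester string appears in the managers list printed above. It does an exact string comparison only, so it would not account for wildcard entries, and it assumes the real output uses user@host with a literal "@". The function name is_manager is hypothetical.

```shell
# Exit 0 if the given user@host appears verbatim on a
# "set server managers" line of `qmgr -c 'p s'` output read from stdin,
# and exit 1 otherwise. Wildcard entries are NOT expanded.
is_manager() {
    awk -v who="$1" '
        $1 == "set" && $2 == "server" && $3 == "managers" && $NF == who {
            found = 1
        }
        END { exit !found }
    '
}
```

For example, `qmgr -c 'p s' | is_manager maui@isp-curran.isp.wsu.edu && echo listed` prints "listed" only when that exact string is present, which makes a mismatch in case or domain between the log entry and the managers entry easier to spot.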

I've checked:

1) in /etc/hosts, the IP address maps to both isp-curran and
2) host isp-curran.isp.wsu.edu does resolve to the IP address
3) host isp-curran also resolves
4) host on the ip resolves to the fqdn.

I don't see any way this can be a dns issue, as the hosts file is
correct, and in the log file, the entries have already been resolved
to hostnames (eg, you can see it already knows it's
kusznir at isp-curran.isp.wsu.edu, or maui at isp-curran.isp.wsu.edu).  What
really confuses me is that the exact same user at host appears both in
the logs as not allowed and in the managers line in qmgr.  I also
don't understand why root can run commands, but maui and kusznir
cannot, when they're all in the list in the same manner.

Oh, I also tried changing the server's acl_hosts to
isp-curran.isp.wsu.edu; no change there.

I tried changing the managers to @*, but that had no impact.  I
also tried set server acl_host_enable = False, but that also
had no impact (this machine is behind a tight firewall, so there's not
much risk of other users on the network trying to do stuff...there's
only 1 machine on this "network").

I'd appreciate any input.  This machine has been down for several days
now, and the users are getting out their pitchforks.....


