[torqueusers] problems w/ mixed case domain names
Michael Hanulec
hanulec at hanulec.com
Wed Nov 17 00:29:49 MST 2004
Unfortunately this snapshot doesn't seem to solve the problem; it makes it
worse. My configuration, which was working after I modified /etc/hosts on the
master node, is now broken. 'qstat', 'pbsnodes', 'qterm', and 'qmgr' all
fail (the qstat and pbsnodes failures are new):
[root at falcon00 server_logs]# qterm -t quick
pbs_iff: cannot connect to host
No Permission.
qterm: could not connect to server (15007)
[root at falcon00 server_logs]# qstat
pbs_iff: cannot connect to host
No Permission.
qstat: cannot connect to server falcon00 (errno=15007)
[root at falcon00 server_logs]# qmgr
pbs_iff: cannot connect to host
No Permission.
qmgr: cannot connect to server
[root at falcon00 server_logs]# ps -auwx|grep pbs
root 21394 0.0 0.0 9476 1376 ? S 01:25 0:00 /usr/local/pbs/sbin/pbs_server
root 21457 0.0 0.0 36960 700 pts/3 S 01:28 0:00 grep pbs
[root at falcon00 server_logs]#
I've verified that a compute node can start its pbs_mom daemon and say HELLO
to the server, but this compute node also cannot execute qterm or pbsnodes.
What level of debugging output would be helpful in getting this resolved?
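
In the meantime, to illustrate the behavior I was expecting from a
case-insensitive host check, here is a minimal Python sketch. It is not
TORQUE's actual code (TORQUE is written in C, and I have not looked at how
the snapshot implements this); normalize_host and same_host are hypothetical
names of my own, and the hostnames are the ones from the server logs quoted
below.

```python
def normalize_host(name: str) -> str:
    """Lowercase a hostname and drop surrounding whitespace and any trailing dot.

    DNS names compare case-insensitively, so 'falcon00.Force' and
    'falcon00.force' should be treated as the same host.
    """
    return name.strip().rstrip(".").lower()


def same_host(a: str, b: str) -> bool:
    """Return True when two hostnames match, ignoring case."""
    return normalize_host(a) == normalize_host(b)


if __name__ == "__main__":
    # Names taken from the server log excerpts in this thread.
    server_name = "falcon00.force"   # as logged at pbs_server startup
    request_host = "falcon00.Force"  # as logged in the rejected request

    # A plain string comparison treats these as different hosts...
    print("exact match:           ", server_name == request_host)
    # ...while the normalized comparison treats them as the same host.
    print("case-insensitive match:", same_host(server_name, request_host))
```

With a check along these lines I would expect requests from falcon00.Force
to be accepted by a server that calls itself falcon00.force, without having
to edit /etc/hosts.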
Thanks again!
--
hanulec at hanulec.com cell: 858.518.2647 && 516.410.4478
https://secure.hanulec.com EFnet irc && aol im: hanulec
On Tue, 16 Nov 2004, Dave Jackson wrote:
> Mike,
>
> We have modified authentication based host evaluation to be case
> insensitive in the latest TORQUE snapshot. Please give it a try and let
> us know if it solves your problems.
>
> Thanks,
> Dave
> Cluster Resources, Inc
>
> On Mon, 2004-11-15 at 20:15, Michael Hanulec wrote:
>> Hi Everybody...
>>
>> I'm currently attempting to run torque-1.1.0p5-snap.1099755743 on an RHEL
>> 3/AMD64 based system. I might have found a bug... or maybe this is a known
>> issue. My server name is 'falcon00.Force' but when the pbs_server starts
>> the logs say 'falcon00.force':
>>
>> <begin pbs server log file>
>> 11/15/2004 20:28:45;0002;PBS_Server;Svr;Log;Log opened
>> 11/15/2004 20:28:45;0006;PBS_Server;Svr;PBS_Server;Server falcon00.force started, initialization type = 4
>> 11/15/2004 20:28:45;0002;PBS_Server;Svr;Act;Account file /var/spool/pbs/server_priv/accounting/20041115 opened
>> 11/15/2004 20:28:45;0040;PBS_Server;Req;setup_nodes;setup_nodes()
>> 11/15/2004 20:28:45;0004;PBS_Server;Svr;falcon00.force;No Node description file found in setup_nodes
>> 11/15/2004 20:28:45;0002;PBS_Server;Svr;PBS_Server;Expected 0, recovered 0 queues
>> 11/15/2004 20:28:45;0002;PBS_Server;Svr;PBS_Server;Expected 0, recovered 0 jobs
>> 11/15/2004 20:28:45;0006;PBS_Server;Svr;PBS_Server;Using ports Server:15001 Scheduler:15004 MOM:15002
>> 11/15/2004 20:28:45;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid = 8075
>> </end pbs server log file>
>>
>>
>> My attempts to use 'qmgr' or 'qterm' fail though ... note the domain is
>> now listed as Force:
>>
>> <begin more pbs server log file>
>> 11/15/2004 20:28:57;0100;PBS_Server;Req;;Type authenticateuser request received from root at falcon00.Force, sock=10
>> 11/15/2004 20:29:02;0100;PBS_Server;Req;;Type manager request received from root at falcon00.Force, sock=9
>> 11/15/2004 20:29:02;0080;PBS_Server;Req;req_reject;Reject reply code=15007(Unauthorized Request ), aux=0, type=9, from root at falcon00.Force
>> </end more pbs server log file>
>>
>>
>> After changing my /etc/hosts entry to falcon00.force I am able to use both
>> qterm and qmgr:
>>
>> <final pbs server log file>
>> 11/15/2004 20:49:21;0100;PBS_Server;Req;;Type authenticateuser request received from root at falcon00.force, sock=16
>> 11/15/2004 20:49:21;0100;PBS_Server;Req;;Type shutdown request received from root at falcon00.force,sock=15
>> 11/15/2004 20:49:21;0086;PBS_Server;Svr;PBS_Server;Shutdown request from root at falcon00.force
>> 11/15/2004 20:49:21;0086;PBS_Server;Svr;PBS_Server;Starting to shutdown the server, type is Quick
>> 11/15/2004 20:49:21;0002;PBS_Server;Svr;PBS_Server;Server shutdown completed
>> 11/15/2004 20:49:21;0002;PBS_Server;Svr;Log;Log closed
>> </final pbs server log file>
>>
>>
>> Tonight I'm also going to try out 1.0.1p6 to see if this error exists
>> there too. Maybe I'll even dig up the code causing the problem.
>>
>> -Mike
>>
>> --
>> hanulec at hanulec.com cell: 858.518.2647 && 516.410.4478
>> https://secure.hanulec.com EFnet irc && aol im: hanulec
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://supercluster.org/mailman/listinfo/torqueusers