[torquedev] Torque 2.3.0 + GSSAPI problem

Enrico Morelli enrico.morelli at gmail.com
Thu Feb 28 01:25:41 MST 2008


Dear all,

Sorry for the long message; I am trying to explain my problem clearly.

To use OpenAFS and Kerberos, I'm trying the svn version of Torque with
GSSAPI support. It compiles without errors.

When I start the server I see the following log:

Feb 27 12:33:25 v6-enmr PBS_Server: No such file or directory (2) in job_recov, Unable to read /var/spool/pbs/server_priv/jobs/5.v6-enmr.cerm.unifi.it.JB
Feb 27 12:33:25 v6-enmr PBS_Server: pbsd_init, Recover of job 5.v6-enmr.cerm.unifi.it.JB failed
Feb 27 12:33:25 v6-enmr PBS_Server: Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004 cannot bind to port 1023 in client_to_svr - connection refused
Feb 27 12:33:25 v6-enmr pbsserver: pbs_server startup succeeded
Feb 27 12:33:25 v6-enmr su(pam_unix)[30128]: session opened for user maui by root(uid=0)
Feb 27 12:33:25 v6-enmr su(pam_unix)[30128]: session closed for user maui
Feb 27 12:33:25 v6-enmr su(pam_unix)[30131]: session opened for user maui by root(uid=0)
Feb 27 12:33:26 v6-enmr su: INFO:     starting Maui version 3.2.6p20##################
Feb 27 12:33:26 v6-enmr su: INFO:     new LOGLEVEL value (3)
Feb 27 12:33:26 v6-enmr su: INFO:     detected array index '0'
Feb 27 12:33:26 v6-enmr su: MCfgProcessLine(RMHOST,0,v6-enmr.cerm.unifi.it)
Feb 27 12:33:26 v6-enmr su: MCfgProcessLine(RMPOLLINTERVAL,,00:00:10)
Feb 27 12:33:26 v6-enmr su: MCfgSetVal(RMPOLLINTERVAL,IVal,DVal,SVal,SArray,P)
Feb 27 12:33:26 v6-enmr su: MUTimeFromString(00:00:10)
Feb 27 12:33:26 v6-enmr su: INFO:     detected array index '0'
Feb 27 12:33:26 v6-enmr su: MCfgProcessLine(RMHOST,0,v6-enmr.cerm.unifi.it)
Feb 27 12:33:26 v6-enmr su(pam_unix)[30131]: session closed for user maui
Feb 27 12:33:26 v6-enmr su: INFO:     detected array index '0'
Feb 27 12:33:26 v6-enmr su: MCfgProcessLine(RMTYPE,0,PBS)
Feb 27 12:33:26 v6-enmr su: MUGetIndex(PBS,ValList,0)
Feb 27 12:33:26 v6-enmr su: MCfgProcessLine(SERVERHOST,,v6-enmr.cerm.unifi.it)
Feb 27 12:33:26 v6-enmr su: MCfgSetVal(SERVERHOST,IVal,DVal,SVal,SArray,P)
Feb 27 12:33:26 v6-enmr su: INFO:     starting scheduler on 'v6-enmr.cerm.unifi.it'
Feb 27 12:33:26 v6-enmr su: MCfgProcessLine(SERVERMODE,,NORMAL)
Feb 27 12:33:26 v6-enmr su: MCfgSetVal(SERVERMODE,IVal,DVal,SVal,SArray,P)
Feb 27 12:33:26 v6-enmr su: MUGetIndex(NORMAL,ValList,1)
Feb 27 12:33:26 v6-enmr su: MCfgProcessLine(SERVERPORT,,40559)
Feb 27 12:33:26 v6-enmr su: MCfgSetVal(SERVERPORT,IVal,DVal,SVal,SArray,P)
Feb 27 12:33:26 v6-enmr su: MAMSetDefaults()
Feb 27 12:33:26 v6-enmr su: ServerProcessArgs(1,ArgV,0)
Feb 27 12:33:26 v6-enmr su: MUGetOpt(1,ArgV,a:Ab:B:c:C:dD:f:hH:i:j:l:L:m:n:N:p:P:r:s:v?-:,OptArg)
Feb 27 12:33:26 v6-enmr su: ServerDemonize()
Feb 27 12:33:26 v6-enmr su: INFO:     child process in background
Feb 27 12:33:26 v6-enmr su: ServerAuthenticate()
Feb 27 12:33:26 v6-enmr su: MFULock(/var/spool/maui/,/var/spool/maui/maui.pid)
Feb 27 12:33:26 v6-enmr su: INFO:     executing scheduler from '/var/spool/maui/' under UID 7721 GID 7721
Feb 27 12:33:26 v6-enmr su: SDRGetSystemConfig()
Feb 27 12:33:26 v6-enmr su: MSysStartServer()
Feb 27 12:33:26 v6-enmr su: starting 3.2.6p20 version Maui (PID: 30132) on Wed Feb 27 12:33:25
Feb 27 12:33:26 v6-enmr su: MSysMemCheck()
Feb 27 12:33:26 v6-enmr su: MNode[5120]               0.02
Feb 27 12:33:26 v6-enmr su: MJob[32768]                0.12
Feb 27 12:33:26 v6-enmr su: MJobTraceBuffer[32768]     0.00
Feb 27 12:33:26 v6-enmr su: MUser[1792]               0.01
Feb 27 12:33:26 v6-enmr su: MGroup[1792]              2.06
Feb 27 12:33:26 v6-enmr su: MAcct[1792]               2.06
Feb 27 12:33:27 v6-enmr su: MRes[8192]                0.03
Feb 27 12:33:27 v6-enmr su: SRes[ 128]                2.39
Feb 27 12:33:27 v6-enmr su: MStatInitialize(P)
Feb 27 12:33:27 v6-enmr su: MStatProfInitialize(P)
Feb 27 12:33:27 v6-enmr su: MStatOpenFile(1204112005)
Feb 27 12:33:27 v6-enmr su: WARNING:  cannot open statfile '/var/spool/maui/stats/Wed_Feb_27_2008', errno: 13 (Permission denied)
Feb 27 12:33:27 v6-enmr su: VERSION 230
Feb 27 12:33:27 v6-enmr su: MSUListen(S)
Feb 27 12:33:27 v6-enmr su: INFO:     opened service socket on port 40559
Feb 27 12:33:27 v6-enmr su: MSUListen(S)
Feb 27 12:33:27 v6-enmr su: INFO:     opened service socket on port 40560
Feb 27 12:33:27 v6-enmr su: MFSInitialize()
Feb 27 12:33:27 v6-enmr su: MCPLoad(/var/spool/maui/maui.ck,ResOnly)
Feb 27 12:33:27 v6-enmr su: MRMInitialize()
Feb 27 12:33:27 v6-enmr su: MPBSInitialize(0,SC)
Feb 27 12:33:27 v6-enmr su: INFO:     parent is exiting
Feb 27 12:33:27 v6-enmr pbsserver: su startup succeeded
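
Most of the log above is routine Maui start-up tracing, so it can help to pull out only the entries that carry an errno. The following is a minimal sketch, not part of Torque or Maui: the two regular expressions and the `find_errors` name are assumptions based only on the message shapes seen in this log, and it assumes each log entry has been joined back onto a single line.

```python
import re

# Shapes taken from the log in this message (assumed, not an official format):
#   PBS_Server: <text> (<errno>) in <function>, ...
#   ... errno: <errno> (<text>)
PBS_ERR = re.compile(r"PBS_Server: (?P<msg>.+?) \((?P<errno>\d+)\) in (?P<func>\w+),")
MAUI_ERR = re.compile(r"errno: (?P<errno>\d+) \((?P<msg>[^)]+)\)")


def find_errors(lines):
    """Yield (errno, message, source) for each log line that reports an errno."""
    for line in lines:
        m = PBS_ERR.search(line)
        if m:
            yield int(m["errno"]), m["msg"], m["func"]
            continue
        m = MAUI_ERR.search(line)
        if m:
            yield int(m["errno"]), m["msg"], "maui"


# A few entries copied from the log, each on one line.
sample = [
    "Feb 27 12:33:25 v6-enmr PBS_Server: No such file or directory (2) in job_recov, "
    "Unable to read /var/spool/pbs/server_priv/jobs/5.v6-enmr.cerm.unifi.it.JB",
    "Feb 27 12:33:25 v6-enmr PBS_Server: Connection refused (111) in contact_sched, "
    "Could not contact Scheduler - port 15004 cannot bind to port 1023 in "
    "client_to_svr - connection refused",
    "Feb 27 12:33:26 v6-enmr su: MCfgProcessLine(RMTYPE,0,PBS)",
    "Feb 27 12:33:27 v6-enmr su: WARNING:  cannot open statfile "
    "'/var/spool/maui/stats/Wed_Feb_27_2008', errno: 13 (Permission denied)",
]

errors = list(find_errors(sample))
for errno_value, message, source in errors:
    print(errno_value, message, source)
```

On the sample entries this leaves three items: the unreadable job file, the refused scheduler connection, and the Maui statfile permission problem.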


The log contains a "Connection refused" error that I don't understand.
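
In the log, the refused connection to port 15004 is stamped 12:33:25, while Maui only reports starting at 12:33:26, so one possibility is simply that nothing was listening yet when pbs_server tried. A small sketch to check whether anything accepts TCP connections on a given port is shown here; the function name is hypothetical, and the host and port numbers to try (15004, or 40559 where Maui reports opening its service socket) are taken from the log, not from any Torque documentation.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111), timeouts and DNS failures.
        return False
```

For example, `port_accepts_connections("v6-enmr.cerm.unifi.it", 15004)` run on the server after both daemons are up would show whether the refusal persists or was only a start-up ordering effect.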

The output of checknode looks correct:

checking node wn5-enmr.cerm.unifi.it

State:      Idle  (in current state for 00:01:06)
Configured Resources: PROCS: 1  MEM: 1024M  SWAP: 3004M  DISK: 1M
Utilized   Resources: [NONE]
Dedicated  Resources: [NONE]
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:       0.000
Network:    [DEFAULT]
Features:   [NONE]
Attributes: [Batch]
Classes:    [batch 1:1]

Total Time: 00:01:15  Up: 00:01:04 (85.33%)  Active: 00:00:00 (0.00%)

Reservations:
NOTE:  no reservations on node



The output of showq also looks correct:

ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME


     0 Active Jobs       0 of    1 Processors Active (0.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 0   Active Jobs: 0   Idle Jobs: 0   Blocked Jobs: 0


But when I try to submit a job (qsub pbsrun -q batch) I get:
qsub: Unknown queue MSG=cannot save creds

and pbs_server dies without logging any message.

The qmgr configuration is:
Qmgr: p s
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch acl_groups = users
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = wn5-enmr.cerm.unifi.it
set server acl_hosts += v6-enmr.cerm.unifi.it
set server managers = afsadm/admin@CERM.UNIFI.IT
set server operators = afsadm/admin@CERM.UNIFI.IT
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 7
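
Because qsub complains about an unknown queue, it may be worth confirming that the dump above is self-consistent: that `default_queue` names a queue that is created, enabled and started. The sketch below only reads pasted `p s` text and does not talk to pbs_server; `parse_qmgr` and `default_queue_problems` are hypothetical helper names, and the parsing rules are assumptions based on the lines shown in this message.

```python
def parse_qmgr(text):
    """Parse `p s`-style qmgr output into {"server": {...}, "queues": {name: {...}}}."""
    config = {"server": {}, "queues": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        if words[:2] == ["create", "queue"] and len(words) >= 3:
            config["queues"].setdefault(words[2], {})
            continue
        if words[0] != "set" or "=" not in line:
            continue  # e.g. the "Qmgr: p s" prompt line
        target, _, value = line[len("set"):].partition("=")
        target = target.rstrip()
        append = target.endswith("+")  # "+=" adds to a list attribute
        parts = target.rstrip("+").split()
        value = value.strip()
        if len(parts) == 2 and parts[0] == "server":
            attrs, key = config["server"], parts[1]
        elif len(parts) == 3 and parts[0] == "queue":
            attrs, key = config["queues"].setdefault(parts[1], {}), parts[2]
        else:
            continue
        if append and key in attrs:
            attrs[key] = attrs[key] + "," + value
        else:
            attrs[key] = value
    return config


def default_queue_problems(config):
    """Return a list of human-readable problems with the server's default queue."""
    name = config["server"].get("default_queue")
    queue = config["queues"].get(name)
    if queue is None:
        return [f"default_queue {name!r} is not a defined queue"]
    return [
        f"queue {name!r} has {flag} = {queue.get(flag)!r}"
        for flag in ("enabled", "started")
        if queue.get(flag) != "True"
    ]


# A subset of the dump from this message.
dump = """\
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
set server acl_hosts = wn5-enmr.cerm.unifi.it
set server acl_hosts += v6-enmr.cerm.unifi.it
set server default_queue = batch
"""

config = parse_qmgr(dump)
problems = default_queue_problems(config)
print(problems)
```

For the lines copied here the check finds nothing wrong with the `batch` queue, so the "cannot save creds" half of the message may be the more relevant clue.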


Does anyone have any idea?

Thanks a lot
Enrico