[Mauiusers] [torqueusers] Maui-Torque integration problems
Jim Kusznir
jkusznir at gmail.com
Wed Dec 9 18:49:19 MST 2009
I just completely replaced my torque install with a fresh build from
the RPM, which puts the server home dir in /opt/torque along with the
rest of the torque files. I reconfigured torque from scratch as part of
the process, but still no go. Here is a summary of all my configs:
Torque was built with the included spec file, with the mods from my
last e-mail applied. Final configure_args statement:
%define configure_args --disable-gcc-warnings --prefix=/opt/torque --with-server-home=/opt/torque --without-tcl
kusznir at isp-curran:/opt/torque> qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = isp-curran
set server managers = kusznir@isp-curran.isp.wsu.edu
set server managers += maui@isp-curran.isp.wsu.edu
set server managers += root@isp-curran.isp.wsu.edu
set server operators = kusznir@isp-curran.isp.wsu.edu
set server operators += maui@isp-curran.isp.wsu.edu
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 6
kusznir at isp-curran:/opt/torque> qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
2.isp-curran              STDIN            kusznir                0 Q batch
3.isp-curran              STDIN            kusznir                0 Q batch
4.isp-curran              STDIN            kusznir                0 Q batch
5.isp-curran              STDIN            kusznir                0 Q batch
kusznir at isp-curran:/opt/torque> showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
0 Active Jobs 0 of 256 Processors Active (0.00%)
0 of 1 Nodes Active (0.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
0 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 0 Active Jobs: 0 Idle Jobs: 0 Blocked Jobs: 0
kusznir at isp-curran:/opt/torque> diagnose -j 5
Name State Par Proc QOS WCLimit R Min User Group Account QueuedTime Network Opsys Arch Mem Disk Procs Class Features
kusznir at isp-curran:/opt/torque> checkjob 5
ERROR: 'checkjob' failed
ERROR: cannot locate job '5'
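To put the mismatch in concrete terms, here is a rough Python sketch that counts the jobs torque reports as queued and reads the job total maui reports. The function names are my own, not part of torque or maui, and the patterns assume only the output layouts pasted above (a qstat row starting with "<number>.<server>", and a "Total Jobs:" summary line in showq).

```python
import re
from typing import List, Optional

# A qstat row starts with "<number>.<server>", followed by name, user,
# time use and a one-letter state. Assumption: no spaces inside fields.
_QSTAT_ROW = re.compile(r"^(\d+)\.\S+\s+\S+\s+\S+\s+\S+\s+([A-Z])(?:\s|$)")

# showq ends with a summary such as "Total Jobs: 0 Active Jobs: 0 ...".
_TOTAL_JOBS = re.compile(r"Total Jobs:\s*(\d+)")


def queued_job_ids(qstat_text: str) -> List[int]:
    """Return the numeric ids of jobs that qstat lists in state Q."""
    ids = []
    for line in qstat_text.splitlines():
        match = _QSTAT_ROW.match(line.strip())
        if match and match.group(2) == "Q":
            ids.append(int(match.group(1)))
    return ids


def maui_total_jobs(showq_text: str) -> Optional[int]:
    """Return the 'Total Jobs' count from showq, or None if it is absent."""
    match = _TOTAL_JOBS.search(showq_text)
    return int(match.group(1)) if match else None
```

Fed the two listings above, queued_job_ids gives jobs 2 through 5 while maui_total_jobs gives 0, which is exactly the symptom: torque has the jobs, maui does not see them.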
kusznir at isp-curran:/opt/maui> cat maui.cfg
# maui.cfg 3.2.6p20
SERVERHOST isp-curran
# primary admin must be first in list
ADMIN1 maui root kusznir
# Resource Manager Definition
RMCFG[isp-curran] TYPE=PBS HOST=isp-curran.isp.wsu.edu
# Allocation Manager Definition
AMCFG[bank] TYPE=NONE
# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE NORMAL
# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
QUEUETIMEWEIGHT 1
# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
# NONE SPECIFIED
# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
NODEALLOCATIONPOLICY MINRESOURCE
The maui log from startup through the first scheduling loop is attached
as sample.log. Here is the torque log from startup through the present:
12/09/2009 17:02:10;0002;PBS_Server;Svr;Log;Log opened
12/09/2009 17:02:10;0006;PBS_Server;Svr;PBS_Server;Server
isp-curran.isp.wsu.edu started, initialization type = 1
12/09/2009 17:02:10;0002;PBS_Server;Svr;Act;Account file
/opt/torque/server_priv/accounting/20091209 opened
12/09/2009 17:02:10;0040;PBS_Server;Req;setup_nodes;setup_nodes()
12/09/2009 17:02:10;0086;PBS_Server;Svr;PBS_Server;Recovered queue batch
12/09/2009 17:02:10;0002;PBS_Server;Svr;PBS_Server;Expected 1,
recovered 1 queues
12/09/2009 17:02:10;0002;PBS_Server;Svr;PBS_Server;Expected 0, recovered 0 jobs
12/09/2009 17:02:10;0006;PBS_Server;Svr;PBS_Server;Using ports
Server:15001 Scheduler:15004 MOM:15002 (server:
'isp-curran.isp.wsu.edu')
12/09/2009 17:02:10;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid =
128752, loglevel=0
12/09/2009 17:02:10;0004;PBS_Server;Svr;WARNING;ALERT: unable to
contact node isp-curran
12/09/2009 17:02:15;0040;PBS_Server;Req;ping_nodes;ping attempting to
contact 1 nodes
12/09/2009 17:02:15;0040;PBS_Server;Req;ping_nodes;successful ping to
node isp-curran (stream 0)
12/09/2009 17:02:15;0002;PBS_Server;Svr;PBS_Server;Torque Server
Version = 2.4.2, loglevel = 0
12/09/2009 17:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::stream_eof,
connection to isp-curran is bad, remote service may be down, message
may be corrupt, or connection may have been dropped remotely
(Premature end of message). setting node state to down
12/09/2009 17:07:15;0002;PBS_Server;Svr;PBS_Server;Torque Server
Version = 2.4.2, loglevel = 0
12/09/2009 17:12:15;0002;PBS_Server;Svr;PBS_Server;Torque Server
Version = 2.4.2, loglevel = 0
12/09/2009 17:17:15;0002;PBS_Server;Svr;PBS_Server;Torque Server
Version = 2.4.2, loglevel = 0
12/09/2009 17:21:17;0100;PBS_Server;Job;2.isp-curran.isp.wsu.edu;enqueuing
into batch, state 1 hop 1
12/09/2009 17:21:17;0008;PBS_Server;Job;2.isp-curran.isp.wsu.edu;Job
Queued at request of kusznir at isp-curran.isp.wsu.edu, owner =
kusznir at isp-curran.isp.wsu.edu, job name = STDIN, queue = batch
12/09/2009 17:21:17;0040;PBS_Server;Svr;isp-curran.isp.wsu.edu;Scheduler
was sent the command scheduler_first
12/09/2009 17:21:53;0100;PBS_Server;Job;3.isp-curran.isp.wsu.edu;enqueuing
into batch, state 1 hop 1
12/09/2009 17:21:53;0008;PBS_Server;Job;3.isp-curran.isp.wsu.edu;Job
Queued at request of kusznir at isp-curran.isp.wsu.edu, owner =
kusznir at isp-curran.isp.wsu.edu, job name = STDIN, queue = batch
12/09/2009 17:21:53;0040;PBS_Server;Svr;isp-curran.isp.wsu.edu;Scheduler
was sent the command new
12/09/2009 17:21:55;0100;PBS_Server;Job;4.isp-curran.isp.wsu.edu;enqueuing
into batch, state 1 hop 1
12/09/2009 17:21:55;0008;PBS_Server;Job;4.isp-curran.isp.wsu.edu;Job
Queued at request of kusznir at isp-curran.isp.wsu.edu, owner =
kusznir at isp-curran.isp.wsu.edu, job name = STDIN, queue = batch
12/09/2009 17:21:55;0040;PBS_Server;Svr;isp-curran.isp.wsu.edu;Scheduler
was sent the command new
12/09/2009 17:21:56;0100;PBS_Server;Job;5.isp-curran.isp.wsu.edu;enqueuing
into batch, state 1 hop 1
12/09/2009 17:21:56;0008;PBS_Server;Job;5.isp-curran.isp.wsu.edu;Job
Queued at request of kusznir at isp-curran.isp.wsu.edu, owner =
kusznir at isp-curran.isp.wsu.edu, job name = STDIN, queue = batch
12/09/2009 17:21:56;0040;PBS_Server;Svr;isp-curran.isp.wsu.edu;Scheduler
was sent the command new
12/09/2009 17:22:15;0002;PBS_Server;Svr;PBS_Server;Torque Server
Version = 2.4.2, loglevel = 0
12/09/2009 17:27:15;0002;PBS_Server;Svr;PBS_Server;Torque Server
Version = 2.4.2, loglevel = 0
12/09/2009 17:31:56;0040;PBS_Server;Svr;isp-curran.isp.wsu.edu;Scheduler
was sent the command time
12/09/2009 17:32:15;0002;PBS_Server;Svr;PBS_Server;Torque Server
Version = 2.4.2, loglevel = 0
12/09/2009 17:37:15;0002;PBS_Server;Svr;PBS_Server;Torque Server
Version = 2.4.2, loglevel = 0
12/09/2009 17:41:56;0040;PBS_Server;Svr;isp-curran.isp.wsu.edu;Scheduler
was sent the command time
12/09/2009 17:42:15;0002;PBS_Server;Svr;PBS_Server;Torque Server
Version = 2.4.2, loglevel = 0
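Since the log above is long and wrapped by the mailer, here is a minimal Python sketch for rejoining the wrapped lines and pulling out the records that mention trouble reaching a node. It assumes only the layout visible above (a "MM/DD/YYYY HH:MM:SS" timestamp followed by semicolon-separated fields); the field names and the keyword list are my own choices, not anything defined by torque.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: str   # e.g. "12/09/2009 17:02:55"
    event: str       # code in the second field, e.g. "0001"
    daemon: str      # e.g. "PBS_Server"
    obj_class: str   # e.g. "Svr", "Job", "Req"
    obj_name: str    # e.g. "PBS_Server" or a job id
    message: str


# A new record starts with "MM/DD/YYYY HH:MM:SS;".
_RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2};")


def _to_record(text: str) -> LogRecord:
    parts = text.split(";", 5)
    parts += [""] * (6 - len(parts))  # tolerate records with fewer fields
    return LogRecord(*parts)


def parse_records(lines: Iterable[str]) -> Iterator[LogRecord]:
    """Yield one LogRecord per entry, rejoining mail-wrapped lines."""
    pending: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if _RECORD_START.match(line):
            if pending is not None:
                yield _to_record(pending)
            pending = line
        elif pending is not None:
            pending += " " + line  # continuation of the previous record
    if pending is not None:
        yield _to_record(pending)


def node_trouble(records: Iterable[LogRecord]) -> List[LogRecord]:
    """Keep records whose message mentions a node contact problem."""
    keywords = ("unable to contact node", "stream_eof",
                "setting node state to down")
    return [r for r in records if any(k in r.message for k in keywords)]
```

Run over the log above, node_trouble keeps the 17:02:10 "unable to contact node" alert and the 17:02:55 stream_eof record, the two entries that concern the state of node isp-curran.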
---------------
I'm really dumbfounded by this problem; I've never encountered it
before. I don't know how I can debug this any further without digging
into the source code, which I don't think I should have to do to run
a "standard" torque+maui configuration. I'd really appreciate any
help with this.
Thanks!
--Jim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.log
Type: text/x-log
Size: 116644 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/mauiusers/attachments/20091209/36973920/attachment-0001.bin