[torqueusers] head node and execution node installed on same machine and do not recognize each other

Guy Rapaport guy4261 at gmail.com
Mon Aug 13 06:19:43 MDT 2012


service pbs_mom restart
qterm -t quick
pbs_server
pbsnodes -a
pbs_server port is 15001
mom_service_port is 15002
mom_manager_port is 15003
scheduler port is 15004 (per the "Using ports" line in the pbs_server log)
trqauthd port is 15005

netstat -tuplen
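
As a cross-check on the netstat output, here is a minimal Python sketch that tries a TCP connect to each of the ports listed above. The port numbers are the ones from this post; the role labels, the function names, and the choice of "localhost" as the target are assumptions made for the illustration, not anything Torque provides.

```python
import socket

# Port numbers as listed in this post; the keys are only labels for this sketch.
TORQUE_PORTS = {
    "pbs_server": 15001,
    "mom_service_port": 15002,
    "mom_manager_port": 15003,
    "scheduler": 15004,  # taken from the "Using ports" line in the server log
    "trqauthd": 15005,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_ports(host, ports=TORQUE_PORTS):
    """Map each label to True/False depending on whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_open in check_ports("localhost").items():
        print(f"{name}: {'open' if is_open else 'closed'}")
```

Running it once against "localhost" and once against the address pbs_mom is trying to reach may show whether the daemons listen only on some interfaces.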

when restarting mom:
08/13/2012 18:08:19;0002;   pbs_mom;n/a;rm_request;shutdown
08/13/2012 18:08:19;0002;   pbs_mom;n/a;dep_cleanup;dependent cleanup
08/13/2012 18:08:19;0002;   pbs_mom;Svr;Log;Log closed
08/13/2012 18:08:21;0002;   pbs_mom;Svr;Log;Log opened
08/13/2012 18:08:21;0002;   pbs_mom;Svr;pbs_mom;Torque Mom Version = 4.1.0, loglevel = 0
08/13/2012 18:08:21;0002;   pbs_mom;Svr;setpbsserver;biostation06
08/13/2012 18:08:21;0002;   pbs_mom;Svr;mom_server_add;server biostation06 added
08/13/2012 18:08:21;0002;   pbs_mom;n/a;initialize;independent
08/13/2012 18:08:21;0080;   pbs_mom;Svr;pbs_mom;before init_abort_jobs
08/13/2012 18:08:21;0002;   pbs_mom;Svr;pbs_mom;Is up
08/13/2012 18:08:21;0002;   pbs_mom;Svr;setup_program_environment;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1344185749
08/13/2012 18:08:21;0002;   pbs_mom;Svr;pbs_mom;Torque Mom Version = 4.1.0, loglevel = 0
08/13/2012 18:11:03;0001;   pbs_mom;Svr;pbs_mom;LOG_ERROR::Network is unreachable (101) in tcp_connect_sockaddr, Failed when trying to open tcp connection - connect() failed [rc = 15096] [addr = 132.72.216.159:15001]
08/13/2012 18:11:03;0001;   pbs_mom;Svr;pbs_mom;LOG_ERROR::Inappropriate ioctl for device (25) in tcp_connect_sockaddr, cannot connect to port 8 in socket_connect_addr - errno:101 Network is unreachable
08/13/2012 18:11:03;0001;   pbs_mom;Svr;pbs_mom;LOG_ERROR::mom_server_update_stat, Cannot get a valid stream to send update to server 'biostation06'
08/13/2012 18:12:52;0002;   pbs_mom;n/a;rm_request;shutdown
08/13/2012 18:12:52;0002;   pbs_mom;n/a;dep_cleanup;dependent cleanup
08/13/2012 18:12:52;0002;   pbs_mom;Svr;Log;Log closed
08/13/2012 18:12:54;0002;   pbs_mom;Svr;Log;Log opened
08/13/2012 18:12:54;0002;   pbs_mom;Svr;pbs_mom;Torque Mom Version = 4.1.0, loglevel = 0
08/13/2012 18:12:54;0002;   pbs_mom;Svr;setpbsserver;biostation06
08/13/2012 18:12:54;0002;   pbs_mom;Svr;mom_server_add;server biostation06 added
08/13/2012 18:12:54;0002;   pbs_mom;n/a;initialize;independent
08/13/2012 18:12:54;0080;   pbs_mom;Svr;pbs_mom;before init_abort_jobs
08/13/2012 18:12:54;0002;   pbs_mom;Svr;pbs_mom;Is up
08/13/2012 18:12:54;0002;   pbs_mom;Svr;setup_program_environment;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1344185749
08/13/2012 18:12:54;0002;   pbs_mom;Svr;pbs_mom;Torque Mom Version = 4.1.0, loglevel = 0
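
To pull just the errors out of a log like the one above, a small Python sketch can split each entry on its semicolons. It assumes each entry occupies a single line, as in the log file itself, and has six semicolon-separated fields as seen here. The field names (timestamp, code, daemon, obj_class, obj_name, message) are my own labels for this sketch, not names taken from the Torque documentation.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    # Field names are labels chosen for this sketch (an assumption).
    timestamp: str
    code: str
    daemon: str
    obj_class: str
    obj_name: str
    message: str


def parse_entry(line: str) -> Optional[LogEntry]:
    """Split one log line into six fields; return None if it does not fit."""
    # maxsplit=5 keeps any later semicolons inside the message field.
    parts = line.strip().split(";", 5)
    if len(parts) != 6:
        return None
    return LogEntry(*(part.strip() for part in parts))


def error_entries(lines: Iterable[str]) -> List[LogEntry]:
    """Return the parsed entries whose message starts with LOG_ERROR."""
    parsed = (parse_entry(line) for line in lines)
    return [e for e in parsed if e is not None and e.message.startswith("LOG_ERROR")]
```

Feeding it the lines of the mom log file would return only the three LOG_ERROR entries from 18:11:03 in the excerpt above.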
when restarting the head node (pbs_server):
08/13/2012 17:33:16;0086;PBS_Server;Svr;PBS_Server;Shutdown request from root@biostation06.bgu.ac.il
08/13/2012 17:33:16;0086;PBS_Server;Svr;PBS_Server;Starting to shutdown the server, type is Quick
08/13/2012 17:33:17;0002;PBS_Server;Svr;PBS_Server;Server shutdown completed
08/13/2012 17:33:17;0002;PBS_Server;Svr;Log;Log closed
08/13/2012 17:33:19;0002;PBS_Server;Svr;Log;Log opened
08/13/2012 17:33:19;0006;PBS_Server;Svr;PBS_Server;Server biostation06.bgu.ac.il started, initialization type = 1
08/13/2012 17:33:19;0002;PBS_Server;Svr;get_default_threads;Defaulting min_threads to 49 threads
08/13/2012 17:33:19;0002;PBS_Server;Svr;Act;Account file /var/spool/torque/server_priv/accounting/20120813 opened
08/13/2012 17:33:19;0040;PBS_Server;Req;setup_nodes;setup_nodes()
08/13/2012 17:33:19;0086;PBS_Server;Svr;PBS_Server;Recovered queue batch
08/13/2012 17:33:19;0002;PBS_Server;Svr;PBS_Server;Expected 1, recovered 1 queues
08/13/2012 17:33:19;0080;PBS_Server;Svr;PBS_Server;2 total files read from disk
08/13/2012 17:33:19;0002;PBS_Server;Svr;PBS_Server;handle_job_recovery:3
08/13/2012 17:33:19;0006;PBS_Server;Svr;PBS_Server;Using ports Server:15001  Scheduler:15004  MOM:15002 (server: 'biostation06.bgu.ac.il')
08/13/2012 17:33:19;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid = 13368, loglevel=0
08/13/2012 17:33:34;0002;PBS_Server;Svr;PBS_Server;Torque Server Version = 4.1.0, loglevel = 0
08/13/2012 17:33:34;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node biostation06 is reporting on node biostation06.bgu.ac.il, which pbs_server doesn't know about
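
The last error mentions both the short name (biostation06) and the fully qualified name (biostation06.bgu.ac.il). One plausible reading, not a confirmed diagnosis, is that the name the mom reports under is spelled differently from the name the server has in its node list. The Python sketch below only illustrates that comparison. It assumes a nodes-file-style text with one host name as the first token of each line and "#" for comments, and the function names are hypothetical.

```python
def short_name(hostname):
    """Return the part of a host name before the first dot, lowercased."""
    return hostname.split(".", 1)[0].lower()


def read_node_names(text):
    """Collect the first token of every non-empty, non-comment line.

    Assumption: the input looks like a Torque nodes file, e.g. "host np=4".
    """
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line.split()[0])
    return names


def near_matches(reported, known_nodes):
    """Return known names that differ from `reported` only in domain or case.

    An empty list means `reported` is either known exactly or has no
    counterpart sharing its short name.
    """
    if reported in known_nodes:
        return []
    return [n for n in known_nodes if short_name(n) == short_name(reported)]
```

For example, `near_matches("biostation06.bgu.ac.il", ["biostation06"])` returns `["biostation06"]`, which flags the kind of short-name versus full-name difference the log line hints at.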