[torqueusers] attempting connect to host 3232238082 port 15002

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Tue Jun 3 00:18:21 MDT 2008


Hi,
 I am trying to set up a computer cluster based on the Intel quad-core architecture.
The machines are running Gentoo Linux with a 64-bit kernel. It seems to me there
is a 64-bit-related bug in torque 2.3.0 causing the following message:

attempting connect to host 3232238082 port 15002

Can you help me understand where this number comes from? Everywhere else in the
/var/spool/torque/server_logs/20080602 file I see valid IP addresses, so I suspect
a programming error.
Thank you for your assistance,
Martin

06/02/2008 12:42:55;0080;PBS_Server;Req;dis_request_read;decoding command RunJob from root
06/02/2008 12:42:55;0100;PBS_Server;Req;;Type RunJob request received from root at nfssrv, sock=10
06/02/2008 12:42:55;0008;PBS_Server;Job;dispatch_request;dispatching request RunJob on sd=10
06/02/2008 12:42:55;0040;PBS_Server;Req;set_nodes;allocating nodes for job 32.nfssrv with node expression '2:ppn=4'
06/02/2008 12:42:55;0040;PBS_Server;Req;node_spec;entered spec=2:ppn=4
06/02/2008 12:42:55;0040;PBS_Server;Req;node_spec;job allocation debug: 2 requested, 128 svr_clnodes, 32 svr_totnodes
06/02/2008 12:42:55;0040;PBS_Server;Req;node_spec;job allocation debug(2): 2 requested, 31 svr_numnodes
06/02/2008 12:42:55;0040;PBS_Server;Req;node_spec;job allocation debug(3): returning 2 requested
06/02/2008 12:42:55;0040;PBS_Server;Req;set_nodes;allocated node node001/0 to job 32.nfssrv (nsnfree=4)
06/02/2008 12:42:55;0040;PBS_Server;Req;set_nodes;allocated node node001/1 to job 32.nfssrv (nsnfree=3)
06/02/2008 12:42:55;0040;PBS_Server;Req;set_nodes;allocated node node001/2 to job 32.nfssrv (nsnfree=2)
06/02/2008 12:42:55;0040;PBS_Server;Req;set_nodes;allocated node node001/3 to job 32.nfssrv (nsnfree=1)
06/02/2008 12:42:55;0040;PBS_Server;Req;set_nodes;allocated node node002/0 to job 32.nfssrv (nsnfree=4)
06/02/2008 12:42:55;0040;PBS_Server;Req;set_nodes;allocated node node002/1 to job 32.nfssrv (nsnfree=3)
06/02/2008 12:42:55;0040;PBS_Server;Req;set_nodes;allocated node node002/2 to job 32.nfssrv (nsnfree=2)
06/02/2008 12:42:55;0040;PBS_Server;Req;set_nodes;allocated node node002/3 to job 32.nfssrv (nsnfree=1)
06/02/2008 12:42:55;0040;PBS_Server;Req;set_nodes;job 32.nfssrv allocated 8 nodes (nodelist=node002/3+node002/2+node002/1+node002/0+node001/3+node001/2+node001/1+node001/0)
06/02/2008 12:42:55;0008;PBS_Server;Job;32.nfssrv;Job Run at request of root at nfssrv
06/02/2008 12:42:55;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 32.nfssrv state from QUEUED-QUEUED to RUNNING-PRERUN (4-40)
06/02/2008 12:42:55;0001;PBS_Server;Svr;PBS_Server;[continued]
06/02/2008 12:42:55;0008;PBS_Server;Job;32.nfssrv;forking in send_job
06/02/2008 12:42:55;0004;PBS_Server;Svr;svr_connect;attempting connect to host 3232238082 port 15002
06/02/2008 12:42:55;0008;PBS_Server;Job;32.nfssrv;entering post_sendmom
06/02/2008 12:42:55;0002;PBS_Server;Job;32.nfssrv;child reported success for job after 0 seconds (dest=node002), rc=0
06/02/2008 12:42:55;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 32.nfssrv state from RUNNING-PRERUN to RUNNING-RUNNING (4-42)
06/02/2008 12:42:55;0001;PBS_Server;Svr;PBS_Server;[continued]
06/02/2008 12:42:55;0004;PBS_Server;Svr;svr_connect;attempting connect to host 3232238082 port 15002
06/02/2008 12:42:55;0080;PBS_Server;Req;dis_request_read;decoding command Disconnect from root
06/02/2008 12:42:55;0080;PBS_Server;Req;dis_request_read;decoding command StatusJob from pbs_mom
06/02/2008 12:42:55;0100;PBS_Server;Req;;Type StatusJob request received from pbs_mom at node002, sock=12
06/02/2008 12:42:55;0008;PBS_Server;Job;dispatch_request;dispatching request StatusJob on sd=12
06/02/2008 12:42:55;0080;PBS_Server;Req;dis_request_read;decoding command JobObituary from pbs_mom
06/02/2008 12:42:55;0100;PBS_Server;Req;;Type JobObituary request received from pbs_mom at node002, sock=10
06/02/2008 12:42:55;0008;PBS_Server;Job;dispatch_request;dispatching request JobObituary on sd=10
06/02/2008 12:42:55;0009;PBS_Server;Job;32.nfssrv;obit received - updating final job usage info
06/02/2008 12:42:55;0008;PBS_Server;Job;32.nfssrv;attr resources_used modified
06/02/2008 12:42:55;000d;PBS_Server;Job;32.nfssrv;sending 'a' mail for job 32.nfssrv to mmokrejs at nfssrv (Job cannot be executed
06/02/2008 12:42:55;000d;PBS_Server;Job;32.nfssrv;[continued]See Administrator for help)
06/02/2008 12:42:55;000d;PBS_Server;Job;32.nfssrv;[continued]
06/02/2008 12:42:55;0009;PBS_Server;Job;32.nfssrv;job exit status -1 handled
06/02/2008 12:42:55;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 32.nfssrv state from RUNNING-RUNNING to EXITING-EXITING (5-50)
06/02/2008 12:42:55;0001;PBS_Server;Svr;PBS_Server;[continued]
06/02/2008 12:42:55;0010;PBS_Server;Job;32.nfssrv;Exit_status=-1 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
06/02/2008 12:42:55;0009;PBS_Server;Job;32.nfssrv;on_job_exit task assigned to job
06/02/2008 12:42:55;0009;PBS_Server;Job;32.nfssrv;req_jobobit completed
06/02/2008 12:42:55;0004;PBS_Server;Svr;svr_connect;attempting connect to host 3232238082 port 15002
06/02/2008 12:42:55;0008;PBS_Server;Job;32.nfssrv;JOB_SUBSTATE_EXITING
06/02/2008 12:42:55;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 32.nfssrv state from EXITING-EXITING to EXITING-STAGEOUT (5-51)
06/02/2008 12:42:55;0001;PBS_Server;Svr;PBS_Server;[continued]
06/02/2008 12:42:55;0008;PBS_Server;Job;32.nfssrv;JOB_SUBSTATE_STAGEOUT
06/02/2008 12:42:55;0008;PBS_Server;Job;32.nfssrv;about to copy stdout/stderr/stageout files
06/02/2008 12:42:55;0008;PBS_Server;Job;32.nfssrv;copy request failed
06/02/2008 12:42:55;0008;PBS_Server;Job;32.nfssrv;JOB_SUBSTATE_STAGEOUT
06/02/2008 12:42:55;000d;PBS_Server;Job;32.nfssrv;Post job file processing error; job 32.nfssrv on host node002/3+node002/2+node002/1+node002/0+node001/3+node001/2+node001/1+node001/0
06/02/2008 12:42:55;000d;PBS_Server;Job;32.nfssrv;request to copy stageout files failed on node 'node002/3+node002/2+node002/1+node002/0+node001/3+node001/2+node001/1+node001/0' for job 32.nfssrv
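
One possible reading of the number in the svr_connect lines above is that it is an IPv4 address printed as a single unsigned 32-bit integer instead of in dotted-quad form. The sketch below checks that reading with Python's standard ipaddress module. The helper name decimal_to_dotted_quad is made up for illustration and is not part of torque. Whether node001 or node002 actually has the resulting address cannot be told from the log, so treat this as a hypothesis to verify against the cluster's host configuration.

```python
import ipaddress


def decimal_to_dotted_quad(value: int) -> str:
    """Interpret an unsigned 32-bit integer as an IPv4 address.

    Hypothetical helper for illustration; not a torque function.
    """
    return str(ipaddress.IPv4Address(value))


def dotted_quad_by_hand(value: int) -> str:
    """Same conversion done with shifts, to show where each octet comes from."""
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


if __name__ == "__main__":
    host = 3232238082  # value taken from the svr_connect log lines
    print(decimal_to_dotted_quad(host))  # 192.168.10.2
    print(dotted_quad_by_hand(host))     # 192.168.10.2
```

If 192.168.10.2 matches the address of one of the compute nodes, the message would be a formatting quirk rather than a 64-bit overflow, and the "Job cannot be executed" and stageout failures further down would need a separate explanation.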



