[torqueusers] Submitting jobs from a 32-bit OS to a 64-bit Torque server

Wayne Mallett wayne.mallett at jcu.edu.au
Thu Jan 29 17:06:57 MST 2009


G'day all,

Ever since upgrading Torque to 2.3.6, I find that servers running a 32-bit
OS can no longer submit jobs successfully.  qsub fails with the message:

qsub: read error: connection reset by peer

The mom logs show:
01/30/2009 10:01:07;0008;   pbs_mom;Job;95037.head3.cluster;ready to commit job
01/30/2009 10:01:07;0008;   pbs_mom;Job;95037.head3.cluster;ready to commit job completed
01/30/2009 10:01:07;0008;   pbs_mom;Job;95037.head3.cluster;committing job
01/30/2009 10:01:07;0008;   pbs_mom;Job;95037.head3.cluster;starting job execution
01/30/2009 10:01:07;0001;   pbs_mom;Job;job_nodes;job: 95037.head3.cluster numnodes=1 numvnod=1
01/30/2009 10:01:07;0008;   pbs_mom;Job;95037.head3.cluster;evaluating limits for job
01/30/2009 10:01:07;0001;   pbs_mom;Job;95037.head3.cluster;about to fork child which will become job
01/30/2009 10:01:07;0001;   pbs_mom;Job;TMomFinalizeJob2;job: 95037.head3.cluster numnodes=1 numvnod=1
01/30/2009 10:01:07;0001;   pbs_mom;Job;95037.head3.cluster;phase 2 of job launch successfully completed
01/30/2009 10:01:12;0001;   pbs_mom;Job;95037.head3.cluster;job not ready after 5 second timeout, MOM will recheck
01/30/2009 10:01:12;0008;   pbs_mom;Job;95037.head3.cluster;job execution started
01/30/2009 10:01:12;0002;   pbs_mom;n/a;mom_server_update_stat;mom_server_update_stat: sending to server "jobs=95037.head3.cluster"
01/30/2009 10:01:12;0008;   pbs_mom;Job;95037.head3.cluster;checking job start in TMOMScanForStarting - examining pipe from child
01/30/2009 10:01:12;0001;   pbs_mom;Job;95037.head3.cluster;task/session info loaded
01/30/2009 10:01:12;0008;   pbs_mom;Req;send_sisters;sending command ABORT_JOB for job 95037.head3.cluster (10)
01/30/2009 10:01:12;0008;   pbs_mom;Job;kill_job;scan_for_exiting: sending signal 9, "KILL" to job 95037.head3.cluster, reason: local task termination detected
01/30/2009 10:01:12;0008;   pbs_mom;Job;95037.head3.cluster;kill_job done (killed 0 processes)
01/30/2009 10:01:12;0080;   pbs_mom;Job;95037.head3.cluster;sending preobit jobstat
01/30/2009 10:01:12;0080;   pbs_mom;n/a;cput_sum;proc_array loop start - jobid = 95037.head3.cluster
01/30/2009 10:01:12;0080;   pbs_mom;n/a;mem_sum;proc_array loop start - jobid = 95037.head3.cluster
01/30/2009 10:01:12;0080;   pbs_mom;n/a;resi_sum;proc_array loop start - jobid = 95037.head3.cluster
01/30/2009 10:01:12;0080;   pbs_mom;Job;95037.head3.cluster;checking job w/subtask pid=0 (child pid=9992)
01/30/2009 10:01:12;0080;   pbs_mom;Job;95037.head3.cluster;performing job clean-up
01/30/2009 10:01:12;0080;   pbs_mom;Job;95037.head3.cluster;epilog subtask created with pid 9993 - substate set to JOB_SUBSTATE_OBIT - registered post_epilogue
01/30/2009 10:01:12;0080;   pbs_mom;n/a;cput_sum;proc_array loop start - jobid = 95037.head3.cluster
01/30/2009 10:01:12;0080;   pbs_mom;n/a;mem_sum;proc_array loop start - jobid = 95037.head3.cluster
01/30/2009 10:01:12;0080;   pbs_mom;n/a;resi_sum;proc_array loop start - jobid = 95037.head3.cluster
01/30/2009 10:01:12;0080;   pbs_mom;Job;95037.head3.cluster;checking job w/subtask pid=9993 (child pid=9993)
01/30/2009 10:01:12;0008;   pbs_mom;Job;95037.head3.cluster;checking job post-processing routine
01/30/2009 10:01:12;0080;   pbs_mom;Req;post_epilogue;preparing obit message for job 95037.head3.cluster
01/30/2009 10:01:12;0080;   pbs_mom;Job;95037.head3.cluster;encoding "send flagged" attr: Error_Path
01/30/2009 10:01:12;0080;   pbs_mom;Job;95037.head3.cluster;encoding "send flagged" attr: Output_Path
01/30/2009 10:01:12;0080;   pbs_mom;Job;95037.head3.cluster;obit sent to server
01/30/2009 10:01:12;0001;   pbs_mom;Job;95037.head3.cluster;setting job substate to EXITED
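
For anyone who wants to pick through an excerpt like this, here is a minimal
Python sketch that rejoins mail-wrapped lines and splits each entry on its
semicolons. It assumes every entry starts with a "MM/DD/YYYY HH:MM:SS;"
timestamp and has six fields, as in the log above. The field labels and the
helper names (parse_mom_log, mentioning) are made up for illustration and are
not part of Torque.

```python
import re

# A record starts with "MM/DD/YYYY HH:MM:SS;" (inferred from the excerpt above).
RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2};")

# Field labels are illustrative, not official Torque names.
FIELDS = ("timestamp", "code", "daemon", "kind", "name", "message")


def parse_mom_log(text):
    """Rejoin wrapped lines and split each record into six labelled fields."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if RECORD_START.match(line):
            records.append(line)
        elif records:
            # Continuation of a record that was wrapped by the mailer.
            records[-1] += " " + line
    parsed = []
    for record in records:
        # Split at most five times so semicolons in the message survive.
        parts = [part.strip() for part in record.split(";", 5)]
        if len(parts) == len(FIELDS):
            parsed.append(dict(zip(FIELDS, parts)))
    return parsed


def mentioning(entries, needle):
    """Return the entries whose message contains the given substring."""
    return [entry for entry in entries if needle in entry["message"]]
```

With the excerpt saved to a file (the name mom.log is hypothetical),
mentioning(parse_mom_log(open("mom.log").read()), "ABORT_JOB") returns just
the entry where the MOM aborts the job.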


Thanks in advance,
Wayne
-- 
Dr. Wayne Mallett
High Performance & Research Computing Support

Phone:	0747815084
Email:	Wayne.Mallett at jcu.edu.au
Smail:	James Cook University
	Townsville  Qld  4811
	Australia

