[torqueusers] torque intermittent failure (read error)
hgc-01134@hkedcity.net wong
emeplease at gmail.com
Thu Aug 19 03:52:36 MDT 2010
Hi,
I am using torque-2.3.10-1 on CentOS 5.5 i386. My platform consists of
an execution node, a scheduler and a server, running inside a virtual
machine with 2 cores.
To enter interactive mode and run a start-up script immediately, I use
"qsub -I -l nodes=1:ppn=2 -v startup=myscript", with "exec $startup"
inside .bashrc.
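For reference, the .bashrc part of this setup could be written with a guard, roughly as sketched below. This is only an illustration, not my exact file: the function name should_run_startup is made up, and it assumes Torque sets PBS_ENVIRONMENT to PBS_INTERACTIVE for "qsub -I" jobs. Note that it quotes "$startup", so unlike a plain "exec $startup" it does not allow arguments in the variable.

```shell
# Hypothetical helper for ~/.bashrc (the name is illustrative only).
# Succeeds only when the shell belongs to an interactive Torque job
# and the "startup" variable passed via "qsub -v" names a runnable command.
should_run_startup() {
    # Assumption: Torque exports PBS_ENVIRONMENT=PBS_INTERACTIVE for qsub -I.
    [ "${PBS_ENVIRONMENT:-}" = "PBS_INTERACTIVE" ] || return 1
    # "startup" comes from: qsub -I ... -v startup=myscript
    [ -n "${startup:-}" ] || return 1
    # Refuse to exec something the shell cannot find or execute.
    command -v "$startup" >/dev/null 2>&1 || return 1
    return 0
}

if should_run_startup; then
    # Replace the login shell with the start-up script.
    exec "$startup"
fi
```

With the guard in place, ordinary logins and batch jobs skip the exec, so a missing or misspelled script cannot replace the shell.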
This usually works fine, but sometimes the job is terminated
immediately, or fails with an error like "connection reset by peer,
read error".
tracejob result:
08/19/2010 17:10:49 S enqueuing into batch, state 1 hop 1
08/19/2010 17:10:49 S Job Queued at request of cyw at cdevm2centos55x86,
owner = cyw at cdevm2centos55x86, job name = STDIN,
queue = batch
08/19/2010 17:10:49 S Job Modified at request of Scheduler at cdevm2centos55x86
08/19/2010 17:10:49 S Job Run at request of Scheduler at cdevm2centos55x86
08/19/2010 17:10:49 A queue=batch
08/19/2010 17:10:54 S Exit_status=-1 resources_used.cput=00:00:00
resources_used.mem=0kb resources_used.vmem=0kb
resources_used.walltime=00:00:05 Error_Path=/dev/pts/4
Output_Path=/dev/pts/4
08/19/2010 17:10:54 L Job Run
08/19/2010 17:10:54 M checking job post-processing routine
08/19/2010 17:10:54 S dequeuing from batch, state COMPLETE
08/19/2010 17:10:54 M obit sent to server
08/19/2010 17:10:54 A user=cywong group=cywong jobname=STDIN queue=batch
ctime=1282209049 qtime=1282209049 etime=1282209049
start=1282209054 owner=cyw at cdevm2centos55x86
exec_host=cdevm2centos55x86/1+cdevm2centos55x86/0
Resource_List.neednodes=1:ppn=2 Resource_List.nodect=1
Resource_List.nodes=1:ppn=2
Resource_List.walltime=01:00:00
08/19/2010 17:10:54 A user=cywong group=cywong jobname=STDIN queue=batch
ctime=1282209049 qtime=1282209049 etime=1282209049
start=1282209054 owner=cyw at cdevm2centos55x86
exec_host=cdevm2centos55x86/1+cdevm2centos55x86/0
Resource_List.neednodes=1:ppn=2 Resource_List.nodect=1
Resource_List.nodes=1:ppn=2
Resource_List.walltime=01:00:00 session=0
end=1282209054 Exit_status=-1
resources_used.cput=00:00:00 resources_used.mem=0kb
resources_used.vmem=0kb
resources_used.walltime=00:00:05 Error_Path=/dev/pts/4
Output_Path=/dev/pts/4
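
The timestamps in the last accounting record can be compared with a small script like the one below. It is a rough sketch: the helper name field is made up, and the record string is just the relevant key=value pairs copied from the tracejob output above.

```shell
# Hypothetical helper: print the value of KEY from a space-separated
# list of KEY=VALUE pairs (first match only).
field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Key=value pairs copied from the final "A" record of the tracejob output.
record='ctime=1282209049 qtime=1282209049 etime=1282209049 start=1282209054 session=0 end=1282209054 Exit_status=-1'

# Seconds between queueing and start, and between start and end.
queued_for=$(( $(field start "$record") - $(field qtime "$record") ))
ran_for=$(( $(field end "$record") - $(field start "$record") ))

echo "queued ${queued_for}s, ran ${ran_for}s, session=$(field session "$record")"
```

For this job it prints "queued 5s, ran 0s, session=0": the start and end timestamps are identical and no session ID was recorded.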
Thank you in advance.