[torqueusers] Post job file processing error with torque 2.3.1 in gentoo running a namd script

gonzalo at cbuc.cl gonzalo at cbuc.cl
Fri Aug 1 23:01:47 MDT 2008


Hello,
I have just joined the Torque users' list. I installed Torque 2.3.1 on a
Gentoo cluster and it works pretty well: I can submit normal jobs without
problems using qsub some_jobscript. However, I'm having problems with Torque
when trying to run the following NAMD script:

#!/bin/sh
#PBS -l nodes=4
#PBS -l walltime=1:00:00
#PBS -j oe
#PBS
#
##################################################################
#CONFIG: change this
jobcfg=DOT_box_100.conf
jobout=DOT_box_100.out
#use submit directory as working directory
#jobdir=${PBS_O_WORKDIR}
# alt. set explicite dir
jobdir=/home/test/nfs/NAMD_test
#END CONFIG
##################################################################
#
# don't touch unless you know what you are doing
cd ${jobdir}
numprocs=0
tmpfile=/tmp/${PBS_JOBCOOKIE}.dat

# build the nodelist. note the different styles of quotes
rm -f ${tmpfile}
echo 'group main' > ${tmpfile}
for s in `sort < ${PBS_NODEFILE} | uniq `
do echo "host $s" >> ${tmpfile} ; numprocs=`expr ${numprocs} + 1`; done

# LOG
cat ${PBS_NODEFILE} > logfile

# use ssh instead of rsh
CONV_RSH=ssh
export CONV_RSH

# now we're ready to go:
/usr/bin/charmrun /usr/bin/namd2 +p${numprocs} ++nodelist ${tmpfile} ${jobcfg} > ${jobout} 2>&1

# preserve exit status and clean up
status=$?
rm -f ${tmpfile}
exit ${status}
# done
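
To make the node-list step of the script concrete, here is a minimal sketch that runs the same sort | uniq loop against a stand-in node file. The helper name build_nodelist and the temporary file names are hypothetical, and the sample contents assume that Torque writes one host name per allocated processor slot, using the hosts dna7 and dna6 that appear in the server log of this message.

```shell
#!/bin/sh
# Sketch only: build_nodelist is a hypothetical helper, not part of Torque or NAMD.
# $1 = node file (assumed: one host name per allocated processor slot)
# $2 = charmrun nodelist file to write
# Prints the number of "host" lines written.
build_nodelist() {
    count=0
    echo 'group main' > "$2"
    # Same loop as in the job script: one "host" line per unique host name.
    for s in $(sort < "$1" | uniq); do
        echo "host $s" >> "$2"
        count=$(expr ${count} + 1)
    done
    echo "${count}"
}

# Stand-in for ${PBS_NODEFILE}; the real file is written by Torque.
sample_nodefile=$(mktemp)
sample_nodelist=$(mktemp)
printf 'dna7\ndna7\ndna6\ndna6\n' > "${sample_nodefile}"

numprocs=$(build_nodelist "${sample_nodefile}" "${sample_nodelist}")
echo "numprocs=${numprocs}"
cat "${sample_nodelist}"
```

Under that assumption the loop counts unique hosts rather than processor slots, so +p${numprocs} would come out lower than the four slots requested with nodes=4.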



I'm getting this error in the server log:

08/01/2008 22:47:07;000d;PBS_Server;Job;6.dna1.cbuc.cl;Post job file processing error; job 6.dna1.cbuc.cl on host dna7/1+dna7/0+dna6/1+dna6/0



On the other hand, charmrun with NAMD works well on its own, so I figured
that the problem may be in my Torque configuration.
I should mention that the directory where my script runs is shared with
the other machines of the cluster via NFS.

Can you tell me what I'm doing wrong, please?






This is the server log:
08/01/2008 22:46:57;0100;PBS_Server;Job;6.dna1.cbuc.cl;enqueuing into batch, state 1 hop 1
08/01/2008 22:46:57;0008;PBS_Server;Job;6.dna1.cbuc.cl;Job Queued at request of test at dna1.cbuc.cl, owner = test at dna1.cbuc.cl, job name = namd.sh, queue = batch
08/01/2008 22:46:57;0040;PBS_Server;Svr;dna1.cbuc.cl;Scheduler sent command new
08/01/2008 22:46:58;0008;PBS_Server;Job;6.dna1.cbuc.cl;Job Modified at request of root at dna1.cbuc.cl
08/01/2008 22:46:58;0008;PBS_Server;Job;6.dna1.cbuc.cl;Job Run at request of root at dna1.cbuc.cl
08/01/2008 22:46:58;0008;PBS_Server;Job;6.dna1.cbuc.cl;Job Modified at request of root at dna1.cbuc.cl
08/01/2008 22:47:03;0010;PBS_Server;Job;6.dna1.cbuc.cl;Exit_status=1 resources_used.cput=00:00:00 resources_used.mem=392kb resources_used.vmem=1892kb resources_used.walltime=00:00:00 session_id=16830
08/01/2008 22:47:07;000d;PBS_Server;Job;6.dna1.cbuc.cl;Post job file processing error; job 6.dna1.cbuc.cl on host dna7/1+dna7/0+dna6/1+dna6/0
08/01/2008 22:47:07;0100;PBS_Server;Job;6.dna1.cbuc.cl;dequeuing from batch, state COMPLETE
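
The log records above are semicolon-separated (timestamp, event code, daemon, class, object id, message), and the message itself may contain a semicolon. As a rough sketch of how the records for one job could be pulled out and the exit status read off, assuming that field layout: job_events is a hypothetical helper, not a Torque command, and the sample file stands in for the real server log, whose location depends on the installation.

```shell
#!/bin/sh
# Sketch only: job_events is a hypothetical helper, not a Torque command.
# $1 = job id, $2 = server log file
# Prints "timestamp | message" for each record whose fifth field is the job id.
job_events() {
    awk -F';' -v id="$1" '
        $5 == id {
            # The message may itself contain ";", so rejoin fields 6..NF.
            msg = $6
            for (i = 7; i <= NF; i++) msg = msg ";" $i
            print $1 " | " msg
        }' "$2"
}

# Stand-in for the real server log, using three records from this message.
sample_log=$(mktemp)
cat > "${sample_log}" <<'EOF'
08/01/2008 22:46:57;0040;PBS_Server;Svr;dna1.cbuc.cl;Scheduler sent command new
08/01/2008 22:46:58;0008;PBS_Server;Job;6.dna1.cbuc.cl;Job Run at request of root at dna1.cbuc.cl
08/01/2008 22:47:03;0010;PBS_Server;Job;6.dna1.cbuc.cl;Exit_status=1 resources_used.cput=00:00:00 resources_used.mem=392kb resources_used.vmem=1892kb resources_used.walltime=00:00:00 session_id=16830
08/01/2008 22:47:07;000d;PBS_Server;Job;6.dna1.cbuc.cl;Post job file processing error; job 6.dna1.cbuc.cl on host dna7/1+dna7/0+dna6/1+dna6/0
EOF

events=$(job_events 6.dna1.cbuc.cl "${sample_log}")
exit_status=$(printf '%s\n' "${events}" | sed -n 's/.*Exit_status=\([0-9-]*\).*/\1/p')

printf '%s\n' "${events}"
echo "exit_status=${exit_status}"
```

Read this way, the job script itself returned a non-zero status (Exit_status=1) four seconds before the post job file processing error was logged.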


