[torqueusers] Problem with pbs_iff

Ken Nielson knielson at adaptivecomputing.com
Tue Nov 2 17:01:11 MDT 2010



----- Original Message -----
From: "Abraham Zamudio" <abraham.zamudio at gmail.com>
To: "Torque Users Mailing List" <torqueusers at supercluster.org>
Sent: Tuesday, November 2, 2010 4:52:46 PM
Subject: Re: [torqueusers] Problem with pbs_iff


I'm not normally running pbs_iff from the command line; I only ran pbs_iff from the command line as a test.


My problem occurs when I run a job with the qsub command.



grep 16.master /var/spool/torque/server_logs/20101102 
11/02/2010 17:49:54;0100;PBS_Server;Job;16.master;enqueuing into batch, state 1 hop 1 
11/02/2010 17:49:54;0008;PBS_Server;Job;16.master;Job Queued at request of mpiX at master, owner = mpiX at master, job name = ROJ1, queue = batch 
11/02/2010 17:49:55;0008;PBS_Server;Job;16.master;Job Run at request of root at master 
11/02/2010 17:49:56;000d;PBS_Server;Job;16.master;Not sending email: User does not want mail of this type. 
11/02/2010 17:49:56;0010;PBS_Server;Job;16.master;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=3916kb resources_used.vmem=234924kb resources_used.walltime=00:00:01 
11/02/2010 17:49:56;000d;PBS_Server;Job;16.master;Post job file processing error; job 16.master on host quad2/2+quad2/1+quad2/0+quad4/2+quad4/1+quad4/0 
11/02/2010 17:49:56;0100;PBS_Server;Job;16.master;dequeuing from batch, state COMPLETE 
11/02/2010 17:50:56;000d;PBS_Server;Job;16.master;Email 'o' to mpiX at master failed: Child process '/usr/lib/sendmail -f adm mpiX at master' returned 78 (errno 10:No child processes) 


My qsub file is:



#PBS -S /bin/bash 
#PBS -N ROJ1 
#PBS -q batch 
#PBS -l nodes=2:ppn=3 
#PBS -j oe 
#PBS -o ROJ1_$PBS_O_JOBID.out 
cd $PBS_O_WORKDIR 
/usr/local/mpiexec83/bin/mpiexec /jro_cluster/mpiX/CapacitacionMPI_ROJ/ROJ1-mpi 



grep 17.master /var/spool/torque/server_logs/20101102 
11/02/2010 18:00:02;0100;PBS_Server;Job;17.master;enqueuing into batch, state 1 hop 1 
11/02/2010 18:00:02;0008;PBS_Server;Job;17.master;Job Queued at request of mpiX at master, owner = mpiX at master, job name = ROJ1, queue = batch 
11/02/2010 18:00:03;0008;PBS_Server;Job;17.master;Job Run at request of root at master 
11/02/2010 18:00:03;000d;PBS_Server;Job;17.master;Not sending email: User does not want mail of this type. 
11/02/2010 18:00:03;0010;PBS_Server;Job;17.master;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=3932kb resources_used.vmem=234924kb resources_used.walltime=00:00:01 
11/02/2010 18:00:03;000d;PBS_Server;Job;17.master;Post job file processing error; job 17.master on host quad2/2+quad2/1+quad2/0+quad4/2+quad4/1+quad4/0 
11/02/2010 18:00:03;0100;PBS_Server;Job;17.master;dequeuing from batch, state COMPLETE 







On Tue, Nov 2, 2010 at 5:44 PM, Ken Nielson < knielson at adaptivecomputing.com > wrote: 




On 11/02/2010 04:37 PM, Abraham Zamudio wrote: 


I launch a job:


[mpiX at master CapacitacionMPI_ROJ]$ qsub ROJ1-mpi.qsub 
15.master 


The output of the tracejob command:



[mpiX at master ~]$ tracejob 15 
/var/spool/torque/server_priv/accounting/20101102: Permission denied 
/var/spool/torque/mom_logs/20101102: No such file or directory 
/var/spool/torque/sched_logs/20101102: No such file or directory 


Job: 15.master 


11/02/2010 17:42:28 S enqueuing into batch, state 1 hop 1 
11/02/2010 17:42:28 S Job Queued at request of mpiX at master, owner = mpiX at master, job name = ROJ1, queue = batch 
11/02/2010 17:42:29 S Job Run at request of root at master 
11/02/2010 17:42:29 S Not sending email: User does not want mail of this type. 
11/02/2010 17:42:29 S Not sending email: User does not want mail of this type. 
11/02/2010 17:42:29 S Exit_status=1 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb 
resources_used.walltime=00:00:00 
11/02/2010 17:42:29 S Post job file processing error 
11/02/2010 17:42:29 S dequeuing from batch, state COMPLETE 

Abraham,

I do not think pbs_iff is the problem. According to the log files, the job was submitted and ran successfully. This is a post-job file processing error. It looks like the output directory, or something of that nature, may not be available to your application.

pbs_iff is only used at the beginning of a client operation such as qsub. 

Ken
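
One way to follow up on the suggestion above is to check whether the directory that should receive the job's output exists and is writable by the job owner. The sketch below is only an illustration: check_output_dir is a hypothetical helper, not a TORQUE command, and it assumes the output is meant to land in the directory the job was submitted from.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE): report whether a directory
# exists and is writable by the user running this script.
check_output_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        # The directory is absent (or not visible) on this host.
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        # The directory exists but this user cannot write to it.
        echo "not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
    return 0
}

# Example: check the current directory, as the job owner would see it.
check_output_dir "$PWD"
```

Whether the directory has to be visible on the execution hosts, the submit host, or both depends on how output delivery is configured, which the thread does not say, so the check is worth running as the job owner on each host involved.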

