[torqueusers] Problem with pbs_iff
Ken Nielson
knielson at adaptivecomputing.com
Tue Nov 2 17:01:11 MDT 2010
----- Original Message -----
From: "Abraham Zamudio" <abraham.zamudio at gmail.com>
To: "Torque Users Mailing List" <torqueusers at supercluster.org>
Sent: Tuesday, November 2, 2010 4:52:46 PM
Subject: Re: [torqueusers] Problem with pbs_iff
I'm not running pbs_iff from the command line as part of my workflow; I only ran pbs_iff from the command line as a test.
My problem occurs when I run a job with the qsub command:
grep 16.master /var/spool/torque/server_logs/20101102
11/02/2010 17:49:54;0100;PBS_Server;Job;16.master;enqueuing into batch, state 1 hop 1
11/02/2010 17:49:54;0008;PBS_Server;Job;16.master;Job Queued at request of mpiX at master, owner = mpiX at master, job name = ROJ1, queue = batch
11/02/2010 17:49:55;0008;PBS_Server;Job;16.master;Job Run at request of root at master
11/02/2010 17:49:56;000d;PBS_Server;Job;16.master;Not sending email: User does not want mail of this type.
11/02/2010 17:49:56;0010;PBS_Server;Job;16.master;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=3916kb resources_used.vmem=234924kb resources_used.walltime=00:00:01
11/02/2010 17:49:56;000d;PBS_Server;Job;16.master;Post job file processing error; job 16.master on host quad2/2+quad2/1+quad2/0+quad4/2+quad4/1+quad4/0
11/02/2010 17:49:56;0100;PBS_Server;Job;16.master;dequeuing from batch, state COMPLETE
11/02/2010 17:50:56;000d;PBS_Server;Job;16.master;Email 'o' to mpiX at master failed: Child process '/usr/lib/sendmail -f adm mpiX at master' returned 78 (errno 10:No child processes)
My qsub file is:
#PBS -S /bin/bash
#PBS -N ROJ1
#PBS -q batch
#PBS -l nodes=2:ppn=3
#PBS -j oe
#PBS -o ROJ1_$PBS_O_JOBID.out
cd $PBS_O_WORKDIR
/usr/local/mpiexec83/bin/mpiexec /jro_cluster/mpiX/CapacitacionMPI_ROJ/ROJ1-mpi
grep 17.master /var/spool/torque/server_logs/20101102
11/02/2010 18:00:02;0100;PBS_Server;Job;17.master;enqueuing into batch, state 1 hop 1
11/02/2010 18:00:02;0008;PBS_Server;Job;17.master;Job Queued at request of mpiX at master, owner = mpiX at master, job name = ROJ1, queue = batch
11/02/2010 18:00:03;0008;PBS_Server;Job;17.master;Job Run at request of root at master
11/02/2010 18:00:03;000d;PBS_Server;Job;17.master;Not sending email: User does not want mail of this type.
11/02/2010 18:00:03;0010;PBS_Server;Job;17.master;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=3932kb resources_used.vmem=234924kb resources_used.walltime=00:00:01
11/02/2010 18:00:03;000d;PBS_Server;Job;17.master;Post job file processing error; job 17.master on host quad2/2+quad2/1+quad2/0+quad4/2+quad4/1+quad4/0
11/02/2010 18:00:03;0100;PBS_Server;Job;17.master;dequeuing from batch, state COMPLETE
On Tue, Nov 2, 2010 at 5:44 PM, Ken Nielson < knielson at adaptivecomputing.com > wrote:
On 11/02/2010 04:37 PM, Abraham Zamudio wrote:
I launch a job:
[mpiX at master CapacitacionMPI_ROJ]$ qsub ROJ1-mpi.qsub
15.master
The output of the tracejob command:
[mpiX at master ~]$ tracejob 15
/var/spool/torque/server_priv/accounting/20101102: Permission denied
/var/spool/torque/mom_logs/20101102: No such file or directory
/var/spool/torque/sched_logs/20101102: No such file or directory
Job: 15.master
11/02/2010 17:42:28 S enqueuing into batch, state 1 hop 1
11/02/2010 17:42:28 S Job Queued at request of mpiX at master, owner = mpiX at master, job name = ROJ1, queue = batch
11/02/2010 17:42:29 S Job Run at request of root at master
11/02/2010 17:42:29 S Not sending email: User does not want mail of this type.
11/02/2010 17:42:29 S Not sending email: User does not want mail of this type.
11/02/2010 17:42:29 S Exit_status=1 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb
resources_used.walltime=00:00:00
11/02/2010 17:42:29 S Post job file processing error
11/02/2010 17:42:29 S dequeuing from batch, state COMPLETE
Abraham,
I do not think pbs_iff is the problem. According to the log files, the job was submitted and ran successfully. This is a post-job processing error. It looks like the output directory, or something of that nature, may not be available to your application.
pbs_iff is only used at the beginning of a client operation such as qsub.
Ken
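One way to follow up on the suggestion above is to check, as the job owner, that the directory where the output file should land exists and is writable. The following is only a minimal sketch: the helper name check_outdir is hypothetical (it is not part of TORQUE), and it assumes the directory to test is the one the job was submitted from.

```bash
#!/bin/bash
# check_outdir DIR
# Hypothetical helper, not a TORQUE command: reports whether DIR exists
# and is writable by the user running the check.
check_outdir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
    return 0
}

# Example: check the current directory (assumed to be the submission directory).
check_outdir "$PWD"
```

Running the same check on the compute nodes listed in the server log (quad2 and quad4 here) may also help, since a path that exists on the master is not necessarily mounted on the nodes.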