[torqueusers] unable to submit script with resource requirement

Akshat Srivastava akshat11nov at gmail.com
Wed Jan 13 07:11:17 MST 2010


I have a very simple setup: a Beowulf cluster with 3 nodes named server,
client1 and client2.
The home directory of the mpich user is mounted on client1 and client2 over
NFS, and MPICH2 is installed in mpich's home directory.
Torque 2.4.3 is installed on this cluster with the following configuration.
For the server:
./configure --prefix=/opt/pbs --enable-mom --enable-server --enable-client \
    --with-default-server=server
and for the clients:
./configure --prefix=/opt/pbs --enable-mom --enable-client \
    --with-default-server=server
After building, I installed the packages as follows:
server, mom and client packages --> server
mom and client packages --> client1 and client2
Since my server is also a compute node, I installed the mom package on the
server as well.
My default queue is set up as follows:
qmgr
Qmgr: create queue batch
Qmgr: set server operators = root@server
Qmgr: set queue batch queue_type = Execution
Qmgr: set queue batch started = True
Qmgr: set queue batch enabled = True
Qmgr: set server default_queue = batch
Qmgr: set server scheduling = True

Now the problem is that jobs with resource requirements can't run. The
following script

#!/bin/sh
#PBS -N testJob
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:02:00
sleep 100
/home/mpich/mpich2-install/bin/mpirun -n 10 mpich2-1.0.8/examples/cpi
hostname

will not run, but if I omit the line #PBS -l nodes=2:ppn=2 it runs.
Why is it that I can't submit resource requirements?
The following runs perfectly:

#!/bin/sh
#PBS -N testJob
#PBS -l walltime=00:02:00
sleep 100
/home/mpich/mpich2-install/bin/mpirun -n 10 mpich2-1.0.8/examples/cpi
hostname
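
For reference, the only difference between the two scripts is the
-l nodes=2:ppn=2 line, which asks for 2 nodes with 2 processors each,
that is, 4 slots in total. A minimal sketch of that arithmetic, assuming
the spec has exactly the nodes=N:ppn=M form used above (the helper name
slots_for is hypothetical and is not a Torque command):

```shell
#!/bin/sh
# Hypothetical helper: total slots for a spec of the exact form nodes=N:ppn=M.
slots_for() {
    spec=$1
    nodes=${spec#nodes=}     # strip the leading "nodes="  -> "2:ppn=2"
    nodes=${nodes%%:*}       # keep text before the first ":" -> "2"
    ppn=${spec##*ppn=}       # keep text after "ppn="      -> "2"
    echo $((nodes * ppn))
}

slots_for "nodes=2:ppn=2"    # prints 4
```

Note that the mpirun -n 10 line in both scripts asks for 10 processes
regardless of that number.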

I submitted the script with the resource requirement:

[mpich@server ~]$ qsub jobScript.sh
14.server

but there was no output in the home directory. The following logs were
generated:
pbs_mom
01/13/2010 15:07:49;0008;   pbs_mom;Job;14.server;JOIN JOB as node 1
01/13/2010 15:07:59;0001;   pbs_mom;Svr;pbs_mom;LOG_DEBUG::delete_blcr_checkpoint_files, No checkpoint directory specified for 14.server

pbs_server
01/13/2010 15:07:49;0100;PBS_Server;Job;14.server;enqueuing into batch, state 1 hop 1
01/13/2010 15:07:49;0008;PBS_Server;Job;14.server;Job Queued at request of mpich@server, owner = mpich@server, job name = testJob, queue = batch
01/13/2010 15:07:49;0040;PBS_Server;Svr;server;Scheduler was sent the command new
01/13/2010 15:07:49;0008;PBS_Server;Job;14.server;Job Modified at request of Scheduler@server
01/13/2010 15:07:49;0008;PBS_Server;Job;14.server;Job Run at request of Scheduler@server
01/13/2010 15:07:49;0040;PBS_Server;Svr;server;Scheduler was sent the command recyc
01/13/2010 15:08:00;0010;PBS_Server;Job;14.server;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=380kb resources_used.vmem=2428kb resources_used.walltime=00:00:12
01/13/2010 15:08:09;000d;PBS_Server;Job;14.server;Post job file processing error; job 14.server on host client1/1+client1/0+server/1+server/0
01/13/2010 15:08:09;0100;PBS_Server;Job;14.server;dequeuing from batch, state COMPLETE
01/13/2010 15:08:09;0040;PBS_Server;Svr;server;Scheduler was sent the command term

pbs_sched
01/13/2010 15:07:49;0040; pbs_sched;Job;14.server;Job Run
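
The host list in the "Post job file processing error" line shows where the
job's slots were placed. A small sketch for tallying slots per host from
such a string, assuming the host/index+host/index form seen in the
pbs_server log (the function name slots_per_host is made up for
illustration and is not part of Torque):

```shell
#!/bin/sh
# Hypothetical helper: count slots per host in a "host/idx+host/idx+..." string.
slots_per_host() {
    printf '%s\n' "$1" |
        tr '+' '\n' |        # one "host/idx" entry per line
        cut -d/ -f1 |        # drop the slot index, keep the host name
        sort | uniq -c |     # count entries per host
        awk '{ print $2 "=" $1 }'
}

slots_per_host "client1/1+client1/0+server/1+server/0"
```

For the string from the log this reports two slots each on client1 and
server, four in total, and none on client2.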