[torqueusers] PBS_NODEFILE incomplete

David Beer dbeer at adaptivecomputing.com
Mon Dec 20 09:45:25 MST 2010


Scott,

Can you send in the output of qstat -f for the job, as well as the contents of $PBS_NODEFILE?
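
A minimal sketch of how both pieces of output could be gathered from inside the job script. The helper name nodefile_summary is hypothetical (it is not a TORQUE command), and the sketch assumes a POSIX shell with sort, uniq and awk on the compute node.

```shell
#!/bin/sh
# Summarize a TORQUE node file: one output line per host, "<host> <slots>".
# nodefile_summary is a hypothetical helper name, not part of TORQUE.
nodefile_summary() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a running job, TORQUE sets PBS_NODEFILE and PBS_JOBID.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Node file: $PBS_NODEFILE"
    nodefile_summary "$PBS_NODEFILE"
    # Full job record as the server sees it (includes exec_host).
    command -v qstat >/dev/null 2>&1 && qstat -f "$PBS_JOBID"
fi
```

For a request such as "#PBS -l nodes=2:ppn=4", the node file would normally list each of the two hosts four times, so a summary showing a single host points at the allocation rather than at MPI.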

David

----- Original Message -----
> PBS_NODEFILE incomplete
> Dear all,
> 
> I would like to re-open this thread.
> 
> http://www.supercluster.org/pipermail/torqueusers/2010-October/011518.html
> 
> We have exactly the same problem, and I’ve also fiddled for many days
> trying all sorts of configurations to sort the problem out. It’s not
> surprising we have the same problem, since we are running the same
> software (part of our national grid infrastructure, running gLite 3.2
> on SL5.4). The torque and maui packages are installed automatically by
> the grid installation. I am sure that installing later versions would
> fix the problem, but I’m afraid that would break some of the grid
> software, which is highly fragile.
> 
> The basic symptom is that PBS_NODEFILE is wrong. If in my job file I
> ask for a certain number of processors so that I can run an MPI job
> across our cluster, only one node is placed in PBS_NODEFILE.
> 
> If I do a checkjob on the job being run, it looks like the right
> number of nodes is being allocated and it shows the names of the nodes
> which are available. However, the job only runs on one of the nodes
> and all my MPI jobs run on that node (far in excess of the actual
> number of
> 
> We are running torque 2.3.6-2cri.el5 and maui
> 3.2.6p21-snap.1234905291.5.el5.
> 
> In my maui.cfg I have
> 
> ENABLEMULTIREQJOBS TRUE
> ENABLEMULTINODEJOBS TRUE
> 
> 
> I have experimented with a wide range of queue configurations, none of
> which worked.
> 
> 
> What should I have in my maui.cfg?
> 
> What are the appropriate torque queue parameters?
> 
> I want to be able to specify that an MPI job runs on p nodes with no
> more than q processes per node.
> 
> If anyone could send me configurations, I’d be very grateful,
> 
> Many thanks
> 
> Scott
> 
> This communication is intended for the addressee only. It is
> confidential. If you have received this communication in error, please
> notify us immediately and destroy the original message. You may not
> copy or disseminate this communication without the permission of the
> University. Only authorized signatories are competent to enter into
> agreements on behalf of the University and recipients are thus advised
> that the content of this message may not be legally binding on the
> University and may contain the personal views and opinions of the
> author, which are not necessarily the views and opinions of The
> University of the Witwatersrand, Johannesburg. All agreements between
> the University and outsiders are subject to South African Law unless
> the University agrees in writing to the contrary.

-- 
David Beer 
Direct Line: 801-717-3386 | Fax: 801-717-3738
     Adaptive Computing
     1656 S. East Bay Blvd. Suite #300
     Provo, UT 84606


