[torqueusers] PBS_NODEFILE incomplete
David Beer
dbeer at adaptivecomputing.com
Mon Dec 20 09:45:25 MST 2010
Scott,
Can you send in the output of qstat -f for the job, as well as the contents of $PBS_NODEFILE?
David
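As a sketch of what to look for in that file: TORQUE normally writes one line per allocated processor slot to $PBS_NODEFILE, so a host appears once for each slot it was given. The Python below tallies the hosts, so a multi-node request can be compared against what the job actually received. The helper names are hypothetical, and the code assumes that one-line-per-slot layout.

```python
import os
from collections import Counter


def summarize_nodefile(path=None):
    """Count how many slots each host received in a PBS node file.

    Hypothetical helper: assumes one hostname per line, repeated once per
    allocated processor slot (TORQUE's usual node-file layout).
    """
    path = path or os.environ.get("PBS_NODEFILE")
    if not path:
        raise RuntimeError("PBS_NODEFILE is not set; run this inside a job")
    with open(path) as fh:
        # Skip blank lines; strip trailing newlines and whitespace.
        hosts = [line.strip() for line in fh if line.strip()]
    return Counter(hosts)


def format_report(counts):
    """Render the per-host slot counts as a short text summary."""
    lines = [f"{len(counts)} distinct node(s), {sum(counts.values())} slot(s)"]
    lines += [f"  {host}: {slots}" for host, slots in sorted(counts.items())]
    return "\n".join(lines)
```

Called from inside the job, e.g. `print(format_report(summarize_nodefile()))`, it should report several nodes for a multi-node request. With the problem described below, it would presumably show only one.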
----- Original Message -----
> PBS_NODEFILE incomplete
> Dear all,
>
> I would like to re-open this thread.
>
> http://www.supercluster.org/pipermail/torqueusers/2010-October/011518.html
>
> We have exactly the same problem, and I’ve also fiddled for many days
> trying all sorts of configurations to sort the problem out. It’s not
> surprising we have the same problem, since we are running the same
> software (part of our national grid infrastructure, running Glite 3.2
> on SL5.4). The torque and maui packages are installed automatically by
> the grid installation. I am sure that installing later versions would
> fix the problem, but I’m afraid that would break some of the grid
> software which is highly fragile.
>
> The basic symptom is that PBS_NODEFILE is wrong. If in my job file I
> ask for a certain number of processors so that I can run an MPI job
> across our cluster, only one node is placed in PBS_NODEFILE.
>
> If I do a checkjob on the job being run, it looks like the right
> number of nodes is being allocated and it shows the names of the nodes
> which are available. However, the job only runs on one of the nodes
> and all my MPI jobs run on that node (far in excess of the actual
> number of
>
> We are running torque 2.3.6-2cri.el5 and maui
> 3.2.6p21-snap.1234905291.5.el5.
>
> In my maui.cfg I have
>
> ENABLEMULTIREQJOBS TRUE
> ENABLEMULTINODEJOBS TRUE
>
>
> I have experimented with a wide range of queue configurations, none of
> which worked.
>
>
> What should I have in my maui.cfg?
>
> What are the appropriate torque queue parameters?
>
> I want to be able to specify that an MPI job runs on p nodes with no
> more than q processes per node.
>
> If anyone could send me configurations, I’d be very grateful,
>
> Many thanks
>
> Scott
>
> This communication is intended for the addressee only. It is
> confidential. If you have received this communication in error, please
> notify us immediately and destroy the original message. You may not
> copy or disseminate this communication without the permission of the
> University. Only authorized signatories are competent to enter into
> agreements on behalf of the University and recipients are thus advised
> that the content of this message may not be legally binding on the
> University and may contain the personal views and opinions of the
> author, which are not necessarily the views and opinions of The
> University of the Witwatersrand, Johannesburg. All agreements between
> the University and outsiders are subject to South African Law unless
> the University agrees in writing to the contrary.
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
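On the question of running on p nodes with at most q processes per node: in TORQUE this is normally requested as `-l nodes=p:ppn=q` (for example `#PBS -l nodes=4:ppn=2` in the job script). The sketch below builds that request string and checks a node-file host list against it, expecting exactly p distinct hosts and no host listed more than q times. The helper names are hypothetical, and the code assumes the one-line-per-slot node-file layout. It says nothing about whether the installed Maui/TORQUE versions actually honour the request.

```python
from collections import Counter


def resource_request(p, q):
    """Build a TORQUE-style node request: p nodes, q processors per node."""
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive")
    return f"nodes={p}:ppn={q}"


def check_layout(hosts, p, q):
    """Return a list of problems with a node-file host list, or [] if it fits.

    Hypothetical helper: assumes one hostname per allocated slot, as in a
    TORQUE node file. `hosts` may be any iterable of lines (e.g. an open file).
    """
    counts = Counter(h.strip() for h in hosts if h.strip())
    problems = []
    if len(counts) != p:
        problems.append(f"expected {p} distinct nodes, got {len(counts)}")
    for host, slots in sorted(counts.items()):
        if slots > q:
            problems.append(f"{host} has {slots} slots (limit {q})")
    return problems
```

Passing the open $PBS_NODEFILE as `hosts`, an empty list means the allocation matches the request. A single host listed p*q times, as in the symptom above, would produce both kinds of message.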
--
David Beer
Direct Line: 801-717-3386 | Fax: 801-717-3738
Adaptive Computing
1656 S. East Bay Blvd. Suite #300
Provo, UT 84606