[torqueusers] strange behaviour of ppn

Govind govind.rhul at googlemail.com
Fri Nov 26 09:21:56 MST 2010


Hi All,

I have figured out that Maui is allocating the correct number of
processors, but Torque is not. With nodes=4:ppn=4 I would expect 16
entries in $PBS_NODEFILE (four hosts with four slots each), yet Torque
hands the job only four slots on a single node.
I am using Torque 2.3.6 and Maui 3.2.6p21.

This is my basic test script:
====================
#!/bin/bash
#PBS -q long
#PBS -l nodes=4:ppn=4
# Count the allocated slots and list the node names
echo "Number of nodes:"
wc -l < $PBS_NODEFILE
echo "Node list:"
cat $PBS_NODEFILE
===================
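A quick way to see how many slots land on each host (just a sketch
using standard shell tools inside the same job script):
====================
# Collapse $PBS_NODEFILE to one line per host with its slot count;
# nodes=4:ppn=4 should show four hosts, each with a count of 4
sort $PBS_NODEFILE | uniq -c
====================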

The job output and tracejob show only a single node:
========================
Number of nodes:
4
Node list:
node53.beowulf.cluster
node53.beowulf.cluster
node53.beowulf.cluster
node53.beowulf.cluster
=================
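To cross-check what pbs_server itself recorded, qstat can print the
assigned slots (the job id below is taken from the checkjob output that
follows; exec_host is the attribute Torque fills in at start time):
====================
# Ask pbs_server which host/slot pairs it actually assigned
qstat -f 192520 | grep exec_host
====================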

Maui (showq, checkjob) shows a total of 16 processors allocated:
==============
 checkjob -v 192520

checking job 192520 (RM job '192520.pbs1.pp.rhul.ac.uk')

State: Running
Creds:  user:gsongara  group:hep2  class:long  qos:DEFAULT
WallTime: 00:00:22 of 2:12:00:00
SubmitTime: Fri Nov 26 16:14:34
  (Time Queued  Total: 00:00:01  Eligible: 00:00:01)

StartTime: Fri Nov 26 16:14:35
Total Tasks: 16

Req[0]  TaskCount: 16  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
Utilized Resources Per Task:  [NONE]
Avg Util Resources Per Task:  PROCS: 0.01
Max Util Resources Per Task:  [NONE]
Average Utilized Procs: 16.00
NodeAccess: SHARED
TasksPerNode: 4  NodeCount: 4
Allocated Nodes:
[node56.beowulf.clust:4][node55.beowulf.clust:4][node54.beowulf.clust:4][node53.beowulf.clust:4]

Task Distribution:
node56.beowulf.clust,node56.beowulf.clust,node56.beowulf.clust,node56.beowulf.clust,node55.beowulf.clust,node55.beowulf.clust,node55.beowulf.clust,node55.beowulf.clust,node54.beowulf.clust,node54.beowulf.clust,node54.beowulf.clust,...

========================================

I have a basic queue config:
set queue long resources_max.nodes = 4
set queue long resources_default.nodes = 1
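In case it helps, the full queue/server settings and the per-node slot
counts can be dumped with the standard Torque tools (nothing below is
site-specific):
====================
# Print the complete queue and server configuration
qmgr -c 'print queue long'
qmgr -c 'print server'
# Verify that each node really advertises four slots (np = 4)
pbsnodes -a | grep 'np ='
====================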

Could you please advise? I think Torque is the bottleneck here.



Regards
Govind

On Wed, Nov 24, 2010 at 7:18 AM, Christopher Samuel
<samuel at unimelb.edu.au> wrote:

>
> On 24/11/10 18:15, Govind wrote:
>
> > We are using both Open MPI (compiled with Torque support) and MPICH.
> > In both cases it runs jobs on a single slot only. Please advise if I
> > need any specific configuration on Torque.
>
> For Open-MPI you need to configure it with --with-tm=$PATH_TO_TORQUE
> so that it recognises and uses the Torque build.
>
> Other than that it works fine, we've been using it for
> ages without any issues.
>
> cheers,
> Chris
> --
>  Christopher Samuel - Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computational Initiative
>  Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>         http://www.vlsci.unimelb.edu.au/
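A minimal sketch of the build-and-run steps Chris describes, assuming
Torque is installed under /usr/local/torque (that path and the program
name my_mpi_program are placeholders for your own setup):
====================
# Build Open-MPI against Torque's task-manager (TM) interface
./configure --with-tm=/usr/local/torque
make && make install

# Inside a job script: with TM support compiled in, mpirun reads the
# allocation straight from Torque, so no -np or hostfile is needed
mpirun ./my_mpi_program
====================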