[torqueusers] Problem with '-l' option
Ibad Kureshi U0850037
U0850037 at hud.ac.uk
Fri Sep 3 03:27:45 MDT 2010
Hello all, hope this finds you well.
I am having a problem with the -l option in our job scripts.
We are running two clusters deployed with the OSCAR 5.1b2 middleware on CentOS 5.4.
One of our clusters has 64 2.33 GHz cores in nodes 1-16 and 64 2.50 GHz cores in nodes 17-32. In the server_priv/nodes file we have added the properties C23 and C25 so that users can distinguish between the core speeds.
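For reference, here is a minimal sketch of the server_priv/nodes entries this setup implies, generated with Python. It assumes hypothetical hostnames node01 to node32 and np=4 per node (64 cores spread over 16 nodes in each half). The real hostnames will differ.

```python
def nodes_file_lines(total_nodes=32, split=16, np=4):
    """Return server_priv/nodes entries ("<host> np=<N> <property>").

    Hostnames node01..node32 are hypothetical placeholders.
    """
    lines = []
    for i in range(1, total_nodes + 1):
        # Nodes 1-16 are the 2.33 GHz machines, 17-32 the 2.50 GHz ones.
        prop = "C23" if i <= split else "C25"
        lines.append(f"node{i:02d} np={np} {prop}")
    return lines


if __name__ == "__main__":
    print("\n".join(nodes_file_lines()))
```

Each line gives the host, its processor count and the property that #PBS -l nodes=...:C23 or ...:C25 selects on.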
So #PBS -l nodes=2:ppn=4:C23 // will always allocate 2.33 GHz nodes
and #PBS -l nodes=2:ppn=4:C25 // will always allocate 2.50 GHz nodes
The problem is that when a user tries to combine them, TORQUE does not behave as expected:
#PBS -l nodes=1:ppn=4:C25+2:ppn=4:C23
This does not allocate 4 cores on one 2.50 GHz machine plus 4 cores on each of two 2.33 GHz machines. In fact, the job just goes into the queued state and sits there forever. This holds true for every combination of node and ppn values.
The only way it does work is:
#PBS -l nodes=C25+##:C23 // where the hashes indicate the number of nodes (note: it won't let me put a quantity before the first property)
and this only allocates 1 core on a single C25 node and single cores on the C23 nodes.
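To make the expected allocation explicit, here is a rough Python model of how the two requests above break down. It is a simplified parser written for illustration, not TORQUE's or the scheduler's own logic: it handles only a leading node count, ppn= and node properties, treats each requested node as needing its own free host, and checks against a hypothetical idle inventory mirroring the nodes file (node01-16 tagged C23, node17-32 tagged C25).

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    count: int = 1        # nodes requested by this "+"-separated part
    ppn: int = 1          # processors per node; 1 when ppn= is omitted
    properties: list = field(default_factory=list)  # e.g. ["C25"]


def parse_nodes_spec(spec):
    """Split a simplified nodes= value into chunks (no hostnames, no gpus=)."""
    chunks = []
    for part in spec.split("+"):
        chunk = Chunk()
        for i, token in enumerate(part.split(":")):
            if i == 0 and token.isdigit():
                chunk.count = int(token)
            elif token.startswith("ppn="):
                chunk.ppn = int(token[len("ppn="):])
            else:
                chunk.properties.append(token)
        chunks.append(chunk)
    return chunks


def total_cores(chunks):
    return sum(c.count * c.ppn for c in chunks)


def can_satisfy(chunks, inventory):
    """Greedy check: every requested node needs its own unused host that has
    all of the chunk's properties and at least ppn cores."""
    used = set()
    for chunk in chunks:
        wanted = set(chunk.properties)
        for _ in range(chunk.count):
            host = next((h for h, (np, props) in inventory.items()
                         if h not in used and np >= chunk.ppn and wanted <= props),
                        None)
            if host is None:
                return False
            used.add(host)
    return True


# Hypothetical idle inventory: host -> (np, properties).
INVENTORY = {f"node{i:02d}": (4, {"C23" if i <= 16 else "C25"})
             for i in range(1, 33)}

if __name__ == "__main__":
    for spec in ("1:ppn=4:C25+2:ppn=4:C23", "C25+2:C23"):
        chunks = parse_nodes_spec(spec)
        print(spec, total_cores(chunks), can_satisfy(chunks, INVENTORY))
```

Under these assumptions the queued request asks for 12 cores (4 on one C25 node, 4 on each of two C23 nodes), which an idle cluster can satisfy, while the form that runs (with ## set to 2) asks for only 3 cores because ppn defaults to 1.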
So where is the config (or my use of the command) going wrong?
Any help will be appreciated.