[torqueusers] There's any way to allocate nodes by CPUs ?
James J Coyle
jjc at iastate.edu
Tue Feb 13 15:54:21 MST 2007
Leandro,
Take a look at
http://andrew.ait.iastate.edu/HPC/lightning/lightning_script_writer.html
I have this for users who don't want to have to learn a new batch scheduler
syntax for each machine.
I'm attaching the CGI script, which is just a Perl script that you can get
with View->Page Source, in case you need to break it out of the web and
use it standalone. I apologize in advance: it has been modified over time for
use on 4 different machines, from an SGI Origin 2000 to a new Opteron cluster
with 4 processors/node.
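For standalone use, the part of the script that matters for this question is the conversion from a CPU count to a nodes=...:ppn=... request. Below is a minimal sketch of that step, not the script itself. It assumes a homogeneous cluster with 4 CPUs per node, and the sub name cpus_to_node_req is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a CPU count into a Torque node request,
# assuming every node has the same number of CPUs (4 by default).
sub cpus_to_node_req {
    my ($cpus, $cpus_per_node) = @_;
    $cpus_per_node = 4 unless defined $cpus_per_node;
    die "cpus must be a positive integer\n" unless $cpus =~ /^[1-9]\d*$/;

    my $full_nodes = int($cpus / $cpus_per_node);   # completely filled nodes
    my $remainder  = $cpus % $cpus_per_node;        # CPUs left for one partial node

    my @parts;
    push @parts, "$full_nodes:ppn=$cpus_per_node" if $full_nodes > 0;
    push @parts, "1:ppn=$remainder"               if $remainder  > 0;
    return 'nodes=' . join('+', @parts);
}

# Print the request for 1, 8 and 9 CPUs.
print cpus_to_node_req($_), "\n" for (1, 8, 9);
```

With 4 CPUs per node, 9 CPUs become nodes=2:ppn=4+1:ppn=1, which can be passed to qsub -l.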
- Jim Coyle
--
James Coyle, PhD
SGI Origin, Alpha, Xeon and Opteron Cluster Manager
High Performance Computing Group
235 Durham Center
Iowa State Univ. phone: (515)-294-2099
Ames, Iowa 50011 web: http://jjc.public.iastate.edu
> Hi,
>
> Sorry for the cross-post, but I'm looking for a solution to a problem I have
> with Torque, with and without Maui.
>
> What I need is to choose nodes by CPUs in a simpler way than the
> "-l nodes=X:ppn=Y" syntax. What I'm looking for is a "-l cpus=X" that lets the
> resource manager/scheduler choose the nodes, packing the tasks onto the nodes
> with more CPUs.
>
> I'm testing a heterogeneous cluster, where I have nodes with 2 single-core
> CPUs and nodes with 2 dual-core CPUs (4 CPUs). This test cluster has 3
> nodes with 4 CPUs and 2 nodes with 2 CPUs.
>
> All the jobs we run here are launched automatically by an application, which
> calls some scripts to create the job on the cluster. If I need 16 CPUs, I don't
> see any easy way to make a generic script, using some standard parameters,
> that creates a "-l nodes=3:ppn=4+2:ppn=2" line :-/
>
> The easy way would be to have something like "-l cpus=16" for the previous
> example, but is that possible?
>
> Today we are using dual-core CPUs, but what is going to happen when we use
> quad-core CPUs? I see an ugly picture for my job allocation task...
>
> Thanks in advance for any help,
>
> Regards,
>
> --
> Leandro Tavares Carneiro
> Analista de Suporte Linux/Unix
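
One way to produce such a line from a plain CPU count is a small wrapper that knows the cluster's node inventory. The following is only a sketch under that assumption: build_node_req and the [node_count, cpus_per_node] inventory format are made up for illustration and are not part of Torque or Maui.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a Torque "nodes=..." request for $cpus CPUs.
# @classes lists the cluster inventory as [node_count, cpus_per_node] pairs.
sub build_node_req {
    my ($cpus, @classes) = @_;

    # One entry per node, largest nodes first, so tasks are packed
    # onto the nodes with the most CPUs.
    my @capacity = sort { $b <=> $a }
                   map  { ($_->[1]) x $_->[0] } @classes;

    my @chunks;    # each chunk is [node_count, ppn]
    for my $cap (@capacity) {
        last if $cpus <= 0;
        my $take = $cpus < $cap ? $cpus : $cap;
        if (@chunks && $chunks[-1][1] == $take) {
            $chunks[-1][0]++;            # same ppn as the previous node: merge
        } else {
            push @chunks, [1, $take];
        }
        $cpus -= $take;
    }
    die "not enough CPUs in the cluster\n" if $cpus > 0;

    return 'nodes=' . join('+', map { "$_->[0]:ppn=$_->[1]" } @chunks);
}

# The test cluster from the question: 3 nodes with 4 CPUs, 2 nodes with 2 CPUs.
print build_node_req(16, [3, 4], [2, 2]), "\n";
```

For the test cluster described above this prints nodes=3:ppn=4+2:ppn=2. The scheduler still decides which physical nodes satisfy each part of the request.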
>
-------------- next part --------------
#!/usr/local/bin/perl
#
# Written by Jim Coyle (jjc at iastate.edu), Iowa State Univ., Ames, Iowa.
# (C) All rights reserved.
# Disclaimer: The author and Iowa State Univ. assume no
# responsibility regarding the use and/or misuse of
# this software.
#
# This perl file is a CGI script written to go with the WWW form
# nqs_writer.html. It writes the NQS script based upon user input.
# The sample programs come from the directory ../htdocs/SAMPLES/.
# All these files need to be owned by the username "nobody" to
# work properly.
#
# print mime header
print "Content-type: text/html\n\n";
# name for NQS output file:
$out="BATCH_OUTPUT";
$err="BATCH_ERRORS";
# Note for special arrangements
$arranged="\n# Time : as scheduled with machine room operator at 4-2256 option 2";
# Standard divider used in instructions
$divider="#-------------------------------------------------------------#\n";
#Queues may be open or restricted. Script will contain a URL for application
# form if queue is restricted.
%QACCESS=("FARM-S", "open",
"FARM-T", "open",
"FARM-M", "open",
"FARM-L", "restricted",
"FARM-H", "restricted",
"FARM-P", "restricted",
"FARM-PS","restricted",
"FARM-LS","restricted",
);
%QCOMMENT=("FARM-S", "# small queue: 1 hr CPU time 64MB memory ",
"FARM-T", "# short queue: 10 min CPU time 200MB memory ",
"FARM-M", "# medium queue: 2 hrs CPU time 200MB memory ",
"FARM-L", "# large queue: 24 hrs CPU time 200MB memory ",
"FARM-LS","# long-small queue: 24 hrs CPU time 100MB memory ",
"FARM-H", "# huge queue: 450MB memory $arranged",
"FARM-P", "# 4 CPU parallel queue: 24 hrs CPU time 1000MB memory ",
"FARM-PS","# high priority 4 CPU parallel queue; 15 min CPU time "
);
%SAMPLES= ("kf90" , "Parallel/OpenMP_F",
"kcc" , "Parallel/FARM-P_4_KAP_C",
"MPI_C", "Parallel/FARM-P_4_MPI_C",
"MPI_F", "Parallel/FARM-P_4_MPI_F",
"PVM_C", "Parallel/FARM-P_4_PVM_C",
"PVM_F", "Parallel/FARM-P_4_PVM_F",
"maple", "Serial/FARM_MAPLE",
"matlab","Serial/FARM_MATLAB"
);
%NOTIFY=("start", "# notify when job starts \n#\$ -mb \n",
"end", "# notify when job completes\n#\$ -me \n");
$Instructions =
'# Instructions for new farm users:' . "\n"
. '# To use this script:' . "\n"
. '# 1) Save this script as a file named myscript in an AFS directory ' . "\n"
. '# (eg. your home directory or your private AFS locker)' . "\n"
. '# 2) Issue ' . "\n"
. '# add farm ( if you haven\'t already )' . "\n"
. '# relogin -l 2d ( if you will be running for hours ) ' . "\n"
. '# qsub myscript ' . "\n"
. '# Use qstat to see job status; ' . "\n"
. '# or to delete one of your jobs' . "\n\n"
. "# Start job in current directory.\n" . '#$ -cwd ' . "\n\n"
. "# Output goes to file $out in current directory.\n#\$ -eo $out \n\n";
$Instructions =
'# Instructions for new Opteron/HTX Cluster users:' . "\n"
. '# To use this script:' . "\n"
. '# 1) Save this script as a file named myscript on hpc4' . "\n"
. '# 2) On hpc4, Issue ' . "\n"
. '# qsub myscript to submit the job ' . "\n"
. '# Use qstat -a to see job status, ' . "\n"
. '# Use qdel jobname to delete one of your jobs' . "\n"
. '# jobnames are of the form 1234.hpc4 ' . "\n\n"
# . "# This script has cd $PBS_O_WORKDIR as the first command. \n"
# . "# qsub command was executed. This is what most users want. Change\n"
# . "# that command if you want something else.\n"
# . "###########################################\n"
# . "# MPI Users: \n"
# . '# For now, use mpirun -machinefile $PBS_NODEFILE' ."\n"
# . "# instead of just mpirun \n"
# . "# This will be fixed in the near future. \n"
. "###########################################\n"
. "# Output goes to file $out.\n"
. "# Error output goes to file $err.\n"
. "# If you want the output to go to another file, change $out \n"
. "# or $err in the following lines to the full path of that file. \n\n"
. "#PBS -o $out \n"
. "#PBS -e $err \n\n" ;
##NQE retired . "#QSUB -o $out \n"
##NQE retired . "#QSUB -e $err\n\n" ;
$Comments_for_restricted_queues=
$divider
. "# You must have a computing grant to use the $queue queue.\n"
. "# To obtain a computing grant, see URL \n"
. '# <a HREF="http://www.public.iastate.edu/~farm/application.html"> http://www.public.iastate.edu/~farm/application.html </a>' . "\n"
. '# http://www.public.iastate.edu/~farm/application.html ' . "\n"
. "$divider \n";
$nodes=0;$cpus=0;
@NOTIFY_LIST=();
@args = &get_args();
foreach $a ( @args ) {
($t,$v) = split(/=/,$a);
if ( $t eq "user_name" ) { $user=$v;}
if ( $t eq "notify" ) { push(@NOTIFY_LIST, $NOTIFY{$v});}
if ( $t eq "time" ) { $time=$v;$time =~ s/ *hr\.?/:00:00/; }
if ( $t eq "memory" ) { $memory=$v; $memory =~ s/ *(.)bytes?/$1b/;}
if ( $t eq "nodes" ) { $nodes=$v;}
if ( $t eq "cpus" ) { $cpus=$v;}
if ( $t eq "shell" ) { $shell=$v;}
if ( $t eq "queue" ) { $queue=$v;}
if ( $t eq "samples" ) { $sample=$v;}
if ( $t eq "commands" ) { $commands=$v;}
}
$np=$cpus;
if ( $cpus ne 0 ) {
$cpus_per_node=4;
$cpu_remainder=$cpus % $cpus_per_node;
$nodes += ($cpus-$cpu_remainder)/$cpus_per_node;
$cpus = $cpu_remainder;
if ( $nodes != 0 && $cpu_remainder !=0) {
# Syntax is like nodes=2:ppn=4+1:ppn=1 for 9 cpus (4 cpus per node).
$node_req='nodes=' . "$nodes" . ':ppn=' . $cpus_per_node .
'+1:ppn=' . $cpu_remainder;
}elsif ( $cpu_remainder == 0 ) {
# Syntax is like nodes=2:ppn=4 for 8 cpus (4 cpus per node).
$node_req='nodes=' . "$nodes" . ':ppn=' . $cpus_per_node;
}else {
# Syntax is like nodes=1:ppn=1 for 1 cpu.
$node_req='nodes=1:ppn=' . $cpu_remainder;
}
}
$SCRIPT = '#!/bin/' . $shell . "\n\n" . $Instructions;
#if ( $#NOTIFY_LIST >= 0 )
# {$SCRIPT .= join("\n", @NOTIFY_LIST , "\n")};
#if ( $QACCESS{$queue} eq "restricted" )
# {$SCRIPT .= $Comments_for_restricted_queues; }
#$SCRIPT .= "$QCOMMENT{$queue} \n";
#$SCRIPT .= '#$ -G ' . $queue . " 1 \n\n";
##NQE retired $SCRIPT .= '#QSUB -lt ' . $time . "\n";
##NQE retired $SCRIPT .= '#QSUB -lM ' . $memory . "\n";
###if ( $sample =~ /^MPI|^PVM|^OPEN_MP/i && $processors == 1 ) {
##if ( $processors == 1 ) {
#if ( $sample =~ /^MPI|^PVM|^OPEN_MP/i && ($nodes != 2) ) {
#if ( $sample =~ /^MPI|^PVM|^OPEN_MP/i ) {
# $SCRIPT .= "### The parallel sample requires 1 nodes to run.\n" ;
# $SCRIPT .= "### Changing to 4 nodes: four processors.\n";
# $nodes = 2;
# $node_req='nodes=1:ppn=4';
# }
##NQE retired $SCRIPT .= '#QSUB -l mpp_p=' . $processors . "\n\n";
$cput=&cnvt_to_cput(4,$time);
#$SCRIPT .= '#PBS -l' . "mem=$memory,nodes=$nodes:ppn=2,cput=$cput,walltime=$time\n\n";
# Use $node_req created above, which allows things like nodes=2:ppn=4+1:ppn=1
$SCRIPT .= '#PBS -l' . "mem=$memory,$node_req,cput=$cput,walltime=$time\n\n";
$SCRIPT .= "# Change to directory from which qsub command was issued \n" ;
$SCRIPT .= ' cd $PBS_O_WORKDIR' . "\n\n";
if ( $sample eq "My_commands" ) { $SCRIPT .= "$commands \n";}
else {
$sample_file="../htdocs/HPC4/SAMPLES/" . $SAMPLES{$sample};
open(FHtty,"<$sample_file") || warn ("Can't open $sample_file for read\n");
# append sample script to above NQS commands.
@A=<FHtty>; $SCRIPT .= join("",@A) . "\n" ;
}
$SCRIPT =~ s/\-np 4/-np $np/g;
#$SCRIPT =~ s/NPS/$np/g;
print "<PRE>\n";
print $SCRIPT;
print "</PRE>\n";
#-----------------------------------------------------------------------
# Routine to convert processor* wall-time list into a total cputime in HH:MM:SS format
sub cnvt_to_cput{
local($processors,$time) = @_;
local(@T,$k,$carry,$cput);
@T=split(/:/,$time);
$carry=0.;
foreach $k (reverse(0..$#T) ) {
$T[$k] = $T[$k]*$processors*1.+$carry;
# Reset the carry so it is not added again to the next field.
$carry=0.;
if ($k > 0 && $T[$k] > 59 ) {
$carry = int($T[$k]/60.);
$T[$k] -= $carry*60.;
}
}
# Zero-pad minutes and seconds: avoid returns like 10:0:0, make them 10:00:00.
foreach $k (1..$#T) { $T[$k] = sprintf("%02d",$T[$k]); }
$cput=join(':',@T);
return($cput);
}
sub get_args {
local (@args) = ();
local ($method,$length,$qstring);
local ($string,$name,$value);
$method = $ENV{'REQUEST_METHOD'};
$length = $ENV{'CONTENT_LENGTH'};
$qstring = $ENV{'QUERY_STRING'};
if ( $method eq "GET" ) {
$string = $qstring
}
elsif ( $method eq "POST" ) {
sysread(STDIN,$string,$length)
|| &reject("$0: Cannot read $length bytes from STDIN\n");
}
else {
# guess request method - good for testing on command line
if ( length($qstring) > 0 && $length > 0 ) {
&reject("$0: Cannot guess request method. " .
"Both QUERY_STRING and CONTENT_LENGTH are set.\n");
}
elsif ( length($qstring) > 0 ) {
# probably a GET script
$string = $qstring;
}
elsif ( $length > 0 ) {
# probably a POST script
sysread(STDIN,$string,$length)
|| &reject("$0: Cannot read $length bytes from STDIN\n");
}
else {
&reject("<PRE>$0: unknown REQUEST_METHOD - $method</PRE>\n");
}
}
foreach $temp ( split('&',$string) ) {
($name,$value) = split('=',$temp,2);
$value =~ tr/+/ /;
$value = &unescape_hex($value);
push (@args,"$name=$value");
$IN{$name} = $value;
}
return @args;
}
#-----------------------------------------------------------------------
sub unescape_hex {
local($s) = @_;
$s =~ s/%([0-9A-Fa-f]{2})/pack('C',hex($1))/ge;
return $s;
}
#-----------------------------------------------------------------------
sub txt2html {
local($txt)=@_;
study $txt;
$txt =~ s/&/&amp;/g;
$txt =~ s/</&lt;/g;
$txt =~ s/>/&gt;/g;
return $txt;
}
#-----------------------------------------------------------------------
sub reject {
print join(' ',@_);
exit;
}
#-----------------------------------------------------------------------
sub get_add_alias {
$setup_add_alias = "# Set up add alias for use in NQS script\n"
. "alias add 'set aenv = `attach -c \\!*` && eval \$aenv ; unset aenv'"
. "\n\n";
return( $setup_add_alias );
}
#-----------------------------------------------------------------------
# Local Variables:
# mode: perl
# End:
__END__