[torqueusers] Fluent Job submission
Coyle, James J [ITACD]
jjc at iastate.edu
Thu Mar 11 09:34:29 MST 2010
We run a similar cluster here, and use Fluent frequently.
I suspect the problem is that your nodes cannot communicate with
the windows machine that serves the licenses.
To test that, use qsub -I -l ...
without the script name to get an interactive session and
try the fluent commands there. I suspect that it won't work.
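Inside that interactive session, you can check more directly whether the compute node can open a TCP connection to the license port. The sketch below assumes bash (for its /dev/tcp pseudo-device) and the coreutils `timeout` command. `license_port`, `license_host` and `check_license_server` are hypothetical helper names. The example spec uses the port and host from the license path in the error output, which the list archive renders as "7241 at mech1".

```shell
#!/bin/bash
# Sketch: test whether a FLEXlm server given as "port@host" accepts TCP
# connections from the current node. Assumes bash (for /dev/tcp) and the
# coreutils `timeout` command; the 5-second limit is an arbitrary choice.

# Split a "port@host" spec into its two parts.
license_port() { echo "${1%%@*}"; }
license_host() { echo "${1#*@}"; }

check_license_server() {
    local spec=$1 port host
    port=$(license_port "$spec")
    host=$(license_host "$spec")
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection;
    # the inner shell exits non-zero if the connection fails.
    if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "reachable: ${host}:${port}"
    else
        echo "NOT reachable: ${host}:${port}" >&2
        return 1
    fi
}

# Example, using the server from the FLEXlm error output:
# check_license_server 7241@mech1
```

A successful check only covers the main license daemon port. FLEXlm's vendor daemon normally listens on a second port, which is picked at startup unless the license file fixes it. A firewall can therefore still block checkouts even when this test passes.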
You could then try pinging the license server to see if it is
even network accessible. My guess is that (like us) you are
on a private subnet, or you have a firewall running. In either
case, you will need to find a way to allow network access to
the license server machine. If it is a firewall, you may be able to
add a line like:
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
Then issue /sbin/service iptables restart
to allow responses to requests that your machine initiates
to come back.
If it is a private subnet, you will need to use a gateway machine
(probably the front-end machine).
You do this by issuing the command
/sbin/route add default gw ttt.xxx.yyy.zzz eth0
on each of your compute nodes (you may also need this in /etc/rc3.d/S99local),
where ttt.xxx.yyy.zzz is the numerical IP address of your gateway machine (the login node),
and on the login node, you may need to issue:
To get this set on each boot, change the line
in /etc/sysctl.conf on the login node.
If someone else has a better way, let me know.
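To make the compute-node route survive a reboot, as suggested above for /etc/rc3.d/S99local, a small idempotent helper can append the command only when it is not already present. This is a sketch. `ensure_line` is a hypothetical name. On Red Hat-style systems such as CentOS, S99local is typically a link to /etc/rc.d/rc.local, but check this on your nodes with `ls -l`.

```shell
#!/bin/bash
# ensure_line FILE LINE: append LINE to FILE unless an identical line already exists.
# Hypothetical helper; run as root when FILE is a system startup script.
ensure_line() {
    local file=$1 line=$2
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet.
    # If FILE does not exist yet, grep fails and the line is appended (creating FILE).
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example with the placeholder gateway address from the text (not executed here):
# ensure_line /etc/rc.d/rc.local "/sbin/route add default gw ttt.xxx.yyy.zzz eth0"
```

Because the helper checks before appending, running it repeatedly (for example from a loop over all compute nodes) will not duplicate the route command.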
If you run on multiple nodes, you will need -cnf=nodefile
on the fluent command.
If you have IB or Myrinet, you may want
to add -pib or -pmyri
to get communication running over the low-latency switch instead of
the default of Ethernet.
We have one cluster of each, and we find huge improvements using these
options when running "small" fluent problems.
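The two suggestions above could be combined into a job script along the following lines. This is a sketch with several assumptions: fluent is on the PATH of the compute nodes, Torque writes one line per allocated processor slot to $PBS_NODEFILE, and the journal path from the original post is reused. `fluent_args` is a hypothetical helper name. A Torque job starts in the user's home directory, so the script first changes to $PBS_O_WORKDIR, the directory qsub was run from.

```shell
#!/bin/bash
#PBS -S /bin/bash
#PBS -N fluent
#PBS -l nodes=3
#PBS -e stderr
#PBS -o stdout

# fluent_args NODEFILE JOURNAL [INTERCONNECT]
# Prints the Fluent arguments on one line. The process count (-t) is taken
# from the number of lines in NODEFILE; INTERCONNECT is an optional flag such
# as -pib or -pmyri (omitted when empty, so Fluent uses its default).
fluent_args() {
    local nodefile=$1 journal=$2 interconnect=${3:-}
    local nprocs
    # $(( )) strips any padding that some wc implementations add.
    nprocs=$(( $(wc -l < "$nodefile") ))
    echo 2d -g -ssh "-t${nprocs}" "-cnf=${nodefile}" ${interconnect:+"$interconnect"} -i "$journal"
}

# Launch only inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    # Split the one-line argument string into an array (works with bash 3).
    read -r -a args <<< "$(fluent_args "$PBS_NODEFILE" /home/sengik/Desktop/test/input.in)"
    fluent "${args[@]}"
fi
```

The script derives -t from the node file, so -t always matches what Torque actually allocated. With a fixed -t3, changing `-l nodes=` in one place would silently leave the two values out of step.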
James Coyle, PhD
High Performance Computing Group
115 Durham Center
Iowa State Univ. phone: (515)-294-2099
Ames, Iowa 50011 web: http://www.public.iastate.edu/~jjc
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of I.Kureshi U0850037
Sent: Thursday, March 11, 2010 4:30 AM
To: torqueusers at supercluster.org
Subject: [torqueusers] Fluent Job submission
I am a system admin at a UK university, and we got a request to provide an HPC resource for Fluent.
After installing Fluent on one of our clusters, which runs CentOS 5.4 with OSCAR 5.1b2 and has 16 compute nodes plus a head node, we were able to start Fluent from the terminal and submit parallel jobs through the shell with a journal file and the -g switch. The university has 45 licenses for Fluent 6.3.26 and 30 licenses for an older version, 6.0/2 (not sure which). These licenses reside on a Windows server running FLEXlm, which also serves floating licenses for many other software packages.
When I submit a job through the Torque/MAUI scheduler, the simulations do not run because of a license problem, even though Fluent seems to be looking in the right place.
I have also posted this on a FLUENT forum, but since it seems to be a case of the environment variables created by TORQUE, I thought users here might be of more help.
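One way to test the environment-variable theory is to print the license-related variables from inside a job and compare them with the same output from a login shell. LM_LICENSE_FILE is the variable the FLEXlm error message refers to. The sketch below also adds `#PBS -V`, which asks qsub to pass the submitting shell's environment to the job. `show_license_env` is a hypothetical helper. Fluent may build part of its license path itself, so an empty result here is not conclusive.

```shell
#!/bin/bash
#PBS -S /bin/bash
#PBS -V
#PBS -N envcheck
#PBS -l nodes=1

# Print LM_LICENSE_FILE and PATH (if set), sorted, so the job's output can be
# diffed against the output of the same function in an interactive login shell.
show_license_env() {
    env | grep -E '^(LM_LICENSE_FILE|PATH)=' | sort
}

show_license_env
```

Submit it with `qsub` and diff its output file against `show_license_env` run by hand on the head node.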
I would appreciate any help regarding this. Below are the submission script, the journal file, the output file and the error file respectively.
The job was submitted using qsub fluent.job
#PBS -S /bin/bash
#PBS -m e
#PBS -M sengik at hud.ac.uk
#PBS -N fluent
#PBS -l nodes=3
#PBS -e stderr
#PBS -o stdout
fluent 2d -g -ssh -t3 -i /home/sengik/Desktop/test/input.in
#as you can see just a simple case of load initialise save and exit
/usr/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2d -g -ssh -t3 -i /home/sengik/Desktop/test/input.in
/usr/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2d -pethernet -host -alnx86 -t3 -mpi=hp -path/usr/Fluent.Inc -ssh -cx node16.testbed-CLS:56711:56126
Server node is down or not responding
See the system adminstrator about starting the server, or
make sure the you're referring to the right host (see LM_LICENSE_FILE)
License path: 7241 at mech1:/usr/Fluent.Inc/license/lnx86/../license.dat
FLEXlm error: -96,7. System Error: 11 "Resource temporarily unavailable"
For further information, refer to the FLEXlm End User Manual,
available at "www.macrovision.com".
/usr/Fluent.Inc/fluent6.3.26/bin/fluent: line 2397: glxinfo: command not found
/usr/Fluent.Inc/fluent6.3.26/cortex/lnx86/cortex.3.7.3 -f fluent -g -i /home/sengik/Desktop/test/input.in (fluent "2d -pethernet -host -alnx86 -r6.3.26 -t3 -mpi=hp -path/usr/Fluent.Inc -ssh")
Starting /usr/Fluent.Inc/fluent6.3.26/lnx86/2d_host/fluent.6.3.26 host -cx node16.testbed-CLS:56711:56126 "(list (rpsetvar (QUOTE parallel/function) "fluent 2d -node -alnx86 -r6.3.26 -t3 -pethernet -mpi=hp -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "3") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/usr/Fluent.Inc") (rpsetvar (QUOTE parallel/hostsfile) "") )"
Welcome to Fluent 6.3.26
Copyright 2006 Fluent Inc.
All Rights Reserved
Unexpected license problem; exiting.
The simple line:
fluent 2d -g -ssh -t3 -cnf=<hostfile> -i /home/sengik/Desktop/test/input.in
works perfectly fine.
EDIT: TORQUE allocates the nodes correctly and is working fine. The simulation just ends when the license error occurs.
Thanks in advance for the help