[torqueusers] Fluent Job submission

I.Kureshi U0850037 U0850037 at hud.ac.uk
Thu Mar 11 13:57:11 MST 2010


Thanks Jerry and James,

Literally 5 minutes before I saw your emails I realized just that. Since PBS spawns the host process on an internal node, that node can't see the license server, because the internal network cannot connect out.
I enabled IPv4 forwarding and set the rules, and now it works like a charm. Thanks for the help.


________________________________________
From: Coyle, James J [ITACD] [jjc at iastate.edu]
Sent: Thursday, March 11, 2010 4:34 PM
To: I.Kureshi  U0850037; torqueusers at supercluster.org
Subject: RE: Fluent Job submission

I.Kureshi ,

  We run a similar cluster here, and use Fluent frequently.

License issue:
------------------

  I suspect the problem is that your nodes cannot communicate with
the Windows machine that serves the licenses.

  To test that, use qsub -I -l ...
without the script name to get an interactive session and
try the fluent commands there. I suspect that it won't work.
You could then try pinging the license server to see if it is
even network accessible.  My guess is that (like us) you are
on a private subnet, or you have a firewall running.  In either
case, you will need to find a way to allow network access to
the license server machine.  If it is a firewall, you may be able to
add a line like:

-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

  Then issue /sbin/service iptables restart

This rule allows responses to requests that your machine initiates
to come back in.
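The connectivity test suggested above can be sketched as a few bash helpers to run from an interactive job. This is only a sketch under assumptions: the function names are made up for illustration, the spec 7241@mech1 is taken from the "License path" line in the original post (the archive renders "@" as " at "), and the `timeout` command may be missing on older distributions. FLEXlm vendor daemons can also listen on a second port, so a successful connection here does not prove that the whole license checkout will work.

```shell
#!/bin/bash
# Hypothetical helpers: split a FLEXlm-style "port@host" spec into its parts.
license_port() { echo "${1%%@*}"; }
license_host() { echo "${1#*@}"; }

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra network tools are needed.
can_reach() {
    local host="$1" port="$2"
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example, run on a compute node inside an interactive job (qsub -I ...):
#   spec="7241@mech1"
#   if can_reach "$(license_host "$spec")" "$(license_port "$spec")"; then
#       echo "license server reachable from $(hostname)"
#   else
#       echo "license server NOT reachable from $(hostname)"
#   fi
```

Running the same check on the head node and on a compute node should show quickly whether the problem is specific to the internal network.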

  If it is a private subnet, you will need to use a gateway machine
(probably the front-end machine).
You do this by issuing the command

/sbin/route add default gw ttt.xxx.yyy.zzz  eth0

on each of your compute nodes (you may also need this in /etc/rc3.d/S99local),
where
ttt.xxx.yyy.zzz is the numerical IP address of your gateway machine (the login node),
and on the login node, you may need to issue:

/sbin/sysctl net.ipv4.ip_forward=1

To get this set on each boot, change the line
net.ipv4.ip_forward=0
to
net.ipv4.ip_forward=1
in /etc/sysctl.conf  on the login node.
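The persistent change described above could be scripted roughly as follows. This is a minimal sketch, not a tested admin tool: the function name enable_ip_forward is hypothetical, and it assumes GNU sed and a sysctl.conf that holds at most one ip_forward line.

```shell
#!/bin/bash
# Hypothetical helper: make sure a sysctl-style config file sets
# net.ipv4.ip_forward to 1, whatever value (if any) it had before.
enable_ip_forward() {
    local conf="$1"
    if grep -q '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=' "$conf"; then
        # Rewrite the existing setting in place, keeping a .bak copy.
        sed -i.bak 's/^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=.*/net.ipv4.ip_forward = 1/' "$conf"
    else
        # No existing setting: append one.
        echo 'net.ipv4.ip_forward = 1' >> "$conf"
    fi
}

# Example, as root on the login node:
#   enable_ip_forward /etc/sysctl.conf
#   /sbin/sysctl -p        # reload the file without rebooting
```

Forwarding alone may not be enough if the license server has no route back to the private subnet. The follow-up at the top of this thread mentions also "setting the rules", which presumably means NAT or forwarding rules in iptables on the login node, but the thread does not show them.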


   If someone else has a better way, let me know.

Fluent issue:
--------------

  If you run on multiple nodes, you will need -cnf=nodefile,
so use
-cnf=${PBS_NODEFILE}
on the fluent command.

If you have IB or Myrinet, you may want
to add -pib or -pmyri, respectively,
to get communication running over the low-latency switch as opposed to
the default of Ethernet.
We have one cluster of each, and we find huge improvements using these
options when running "small" fluent problems.
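Putting the Fluent options above together, a job script might build its command line as sketched below. This is an illustration only: the function name fluent_command is hypothetical, the base options (2d -g -ssh -t -i) are copied from the submission script in the original post, and the command is printed rather than executed so it can be inspected first. It assumes the Torque node file has one line per allocated processor slot.

```shell
#!/bin/bash
#PBS -N fluent
#PBS -l nodes=3
#PBS -V

# Hypothetical helper: print a Fluent command line for the current Torque job.
#   $1 = journal file
#   $2 = optional interconnect flag, e.g. -pib or -pmyri
fluent_command() {
    local journal="$1"
    local interconnect="${2:-}"
    local nodefile="${PBS_NODEFILE:?PBS_NODEFILE is not set; run inside a Torque job}"
    # Count the non-empty lines in the node file to get the process count.
    local nprocs
    nprocs=$(grep -c . "$nodefile")
    echo "fluent 2d -g -ssh${interconnect:+ $interconnect} -t${nprocs} -cnf=${nodefile} -i ${journal}"
}

# Inside the job, after checking the printed command, run it with e.g.:
#   $(fluent_command /home/sengik/Desktop/test/input.in)
#   $(fluent_command /home/sengik/Desktop/test/input.in -pib)
```

Deriving -t from the node file keeps the process count in step with whatever is requested on the "#PBS -l nodes=" line.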


Regards,
 James Coyle, PhD
 High Performance Computing Group
 115 Durham Center
 Iowa State Univ.           phone: (515)-294-2099
 Ames, Iowa 50011           web: http://www.public.iastate.edu/~jjc



-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of I.Kureshi U0850037
Sent: Thursday, March 11, 2010 4:30 AM
To: torqueusers at supercluster.org
Subject: [torqueusers] Fluent Job submission

Hi all,

I am a system admin at a UK university, and we got a request to provide an HPC resource for Fluent.

After installing Fluent on one of our clusters, which runs CentOS 5.4 with OSCAR 5.1b2 and has 16 nodes plus a head node, we were able to start Fluent from the terminal and submit parallel jobs through the shell with a journal file and the -g switch. The University has 45 licenses for Fluent 6.3.26 and 30 licenses for an older version 6.0/2?? (not sure which). These licenses reside on a Windows server with FLEXlm running on it. We have floating licenses for many software packages on that machine.

When I try to submit a job through the job scheduler (Torque/Maui), the simulations do not run because of a license problem, even though it seems to be looking in the right place.

I have posted this on a Fluent-based forum as well, but since it seems to be a case of the environment variables created by TORQUE, I thought users here might be of better help.

I would appreciate any help regarding this. Below are the submission script, the journal file, the output file and the error file respectively.

The job was submitted using qsub fluent.job
______________________________
Submission Script
______________________________
#!/bin/bash
#PBS -S /bin/bash
#PBS -m e
#PBS -M sengik at hud.ac.uk
#PBS -N fluent
#PBS -l nodes=3
#
#PBS -e stderr
#PBS -o stdout
#
#PBS -V
#
fluent 2d -g -ssh -t3 -i /home/sengik/Desktop/test/input.in
______________________________
Journal File
______________________________
file/read-case /home/sengik/Desktop/test/2dcar_10.cas
solve/initialize/initialize-flow
file/write-data /home/sengik/Desktop/test/2dcar_10.dat
exit
yes
#as you can see just a simple case of load initialise save and exit

______________________________
Output File
______________________________

/usr/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2d -g -ssh -t3 -i /home/sengik/Desktop/test/input.in
Loading "/usr/Fluent.Inc/fluent6.3.26/lib/fluent.dmp.114-32"
Done.
/usr/Fluent.Inc/fluent6.3.26/bin/fluent -r6.3.26 2d -pethernet -host -alnx86 -t3 -mpi=hp -path/usr/Fluent.Inc -ssh -cx node16.testbed-CLS:56711:56126

Server node is down or not responding
See the system adminstrator about starting the server, or
make sure the you're referring to the right host (see LM_LICENSE_FILE)
Feature: fluent
Hostname: mech1
License path: 7241@mech1:/usr/Fluent.Inc/license/lnx86/../license.dat
FLEXlm error: -96,7. System Error: 11 "Resource temporarily unavailable"
For further information, refer to the FLEXlm End User Manual,
available at "www.macrovision.com".

______________________________
Error File
______________________________
/usr/Fluent.Inc/fluent6.3.26/bin/fluent: line 2397: glxinfo: command not found
/usr/Fluent.Inc/fluent6.3.26/cortex/lnx86/cortex.3.7.3 -f fluent -g -i /home/sengik/Desktop/test/input.in (fluent "2d -pethernet -host -alnx86 -r6.3.26 -t3 -mpi=hp -path/usr/Fluent.Inc -ssh")
Starting /usr/Fluent.Inc/fluent6.3.26/lnx86/2d_host/fluent.6.3.26 host -cx node16.testbed-CLS:56711:56126 "(list (rpsetvar (QUOTE parallel/function) "fluent 2d -node -alnx86 -r6.3.26 -t3 -pethernet -mpi=hp -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "3") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/usr/Fluent.Inc") (rpsetvar (QUOTE parallel/hostsfile) "") )"

Welcome to Fluent 6.3.26

Copyright 2006 Fluent Inc.
All Rights Reserved

Loading "/usr/Fluent.Inc/fluent6.3.26/lib/flprim.dmp.1119-32"
Done.

Unexpected license problem; exiting.

______________________________________

The simple line:
fluent 2d -g -ssh -t3 -cnf=<hostfile> -i /home/sengik/Desktop/test/input.in
works perfectly fine.

EDIT: TORQUE allocates the nodes correctly and is working fine. The simulation just ends when the license error occurs.


Thanks in advance for the help


---
This transmission is confidential and may be legally privileged. If you receive it in error, please notify us immediately by e-mail and remove it from your system. If the content of this e-mail does not relate to the business of the University of Huddersfield, then we do not endorse it and will accept no liability.
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers

