[torqueusers] Slave Node Issues - 10 Node Brand new Cluster - Jobs not completing

Joseph De Nicolo j.denicolo at neu.edu
Mon Feb 3 15:35:03 MST 2014


After adjusting the NFSD count to 64, we are still having some
communication issues, but they now appear to be more specific, so hopefully
somebody can give some insight toward a more exact solution. Here are a
couple of job-submission scenarios and the issues currently at hand:

During both of these tests, no other jobs or processes were running on any
node; all nodes were otherwise free.

Even in this simple scenario, the NFS overhead seems far too high (a sketch
of the kind of test job is below):
1. A single job process was spawned on the Torque server (where the exported
file system is local storage), simply writing to a file; it completed in
under 10 seconds.
2. The same single job process was spawned on a Torque MOM node (file
system mounted via NFSv4) and took 10 minutes.
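
For reference, the job in both cases was essentially just a small timed
write to the shared file system. A rough sketch of that kind of test script
(the size, names and paths here are illustrative, not our exact job):

  # nfs-write-test.sh
  #PBS -N nfs-write-test
  #PBS -l nodes=1,walltime=00:10:00
  cd $PBS_O_WORKDIR
  # write ~17 MB into the job's working directory and time it
  time dd if=/dev/zero of=write-test.$PBS_JOBID bs=1M count=17

submitted with:  qsub nfs-write-test.sh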

Another scenario suggests possible collisions or stalling between jobs even
though no priorities are set:
1. 10 jobs were spawned on a Torque MOM (child node 1); they all ran
concurrently and completed in a reasonable time.
2. While the original 10 jobs were running again, we spawned 10 more on a
different Torque MOM (child node 2).
3. All of the jobs were in the same queue and run by the same user, so
priority was not a factor.
4. When the 10 new jobs on child node 2 started, the run times of the jobs
on child node 1 were affected and nothing completed. The processes were
bouncing back and forth between "R" and "D" status in ps aux, with a lot of
I/O wait (see the monitoring sketch below). That is with only 20 jobs
running on a 10-node cluster with 64 nfsd threads.
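
For completeness, this is roughly how we were watching the child nodes
while it happened (iostat and ps are what we actually looked at; nfsstat is
listed just as another obvious check, not something we have detailed
numbers from):

  # per-device utilization and %iowait, refreshed every 5 seconds
  iostat -x 5
  # NFS client RPC counters; growing retransmission counts would point at the network
  nfsstat -c
  # processes stuck in uninterruptible sleep (the "D" state mentioned above)
  ps aux | awk '$8 ~ /^D/'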

Could this be a general network issue? Does NFSv4 have to be configured in
more depth, possibly to allow bigger block sizes (a rough example of the
kind of mount options we mean is below)? Any help or ideas on the matter
would be greatly appreciated. Thanks all!
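
To make the block-size question concrete, this is the kind of client-side
change we are wondering about. The values and the export path below are
only an example, not something we have tested (the actual path depends on
how the NFSv4 root is exported):

  # /etc/fstab on a MOM node - rsize/wsize are the NFS read/write transfer sizes
  headnode:/home  /home  nfs4  rsize=1048576,wsize=1048576,hard  0  0

  # after remounting, check what the client actually negotiated
  grep nfs /proc/mounts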


Joseph De Nicolo
Systems & Data Administrator
Center for Complex Network Research <http://www.barabasilab.com>
Northeastern University


On Thu, Jan 30, 2014 at 12:58 PM, <rf at q-leap.de> wrote:

> >>>>> "Joseph" == Joseph De Nicolo <j.denicolo at neu.edu> writes:
>
> Hi Joseph,
>
>     Joseph> Thank you everybody for all the tips.  After some analysis I
>     Joseph> think the root of the problem is with NFS. Using iostat I
>     Joseph> can see an average I/O wait of about 10%.  We ran a test job
>     Joseph> on the head node, where the storage is directly attached, and
>     Joseph> the job had "running" status and completed in an appropriate
>     Joseph> amount of time.
>     Joseph> Running the same job on a child node resulted in the job
>     Joseph> being flagged as "D" - uninterruptible sleep. Note there were
>     Joseph> other jobs running on the cluster using up I/O. The job only
>     Joseph> wrote 17 MB; on the head node that took 30 seconds, while the
>     Joseph> child node was still showing "D" status in "ps" after 25
>     Joseph> minutes.
>
>     Joseph> This is the first cluster I have ever built. After reading up
>     Joseph> on NFS, I realized a default NFS server only spawns 8 nfsd
>     Joseph> processes to handle I/O requests and that you should raise
>     Joseph> this number. Do you think this is the root of the problem?
>     Joseph> Can anybody advise me on how to raise the number of nfsd
>     Joseph> processes for an NFSv4 server on Ubuntu 12.04? Also, what is
>     Joseph> a good number for a cluster of 10 nodes and 132 cores?
>
> Edit /etc/default/nfs-kernel-server and adjust
>
> RPCNFSDCOUNT
>
> Afterwards:
>
> $ /etc/init.d/nfs-kernel-server restart
>
> I'd say try 32 threads to start with, and increase in steps of 32 until
> things look better. Of course, it might also turn out that NFS is not up
> to the job at all, which would mean looking at Lustre etc. It really
> depends on what your applications do.
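>
> A quick sanity check after the restart (a rough sketch; the exact field
> layout of /proc/net/rpc/nfsd varies a bit between kernel versions):
>
> $ grep ^th /proc/net/rpc/nfsd
>
> The first number after "th" is the count of running nfsd threads; on
> older kernels the numbers that follow show how often all threads were
> busy, and if those keep climbing you need more threads. You can also
> change the thread count on the fly with "rpc.nfsd 64"; editing
> RPCNFSDCOUNT is what makes it persistent across restarts.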
>
> As an Ubuntu fan, you might find Qlustar of interest.
>
> Best,
>
> Roland
>
> ----
> Roland Fehrenbacher, PhD
> Founder/CEO
> Q-Leap Networks GmbH
> Tel. : +49(0)7034/277620
> EMail: rf at q-leap.com
> http://www.q-leap.com / http://qlustar.com
>
>     Joseph> Joseph De Nicolo
>     Joseph> Systems & Data Administrator
>     Joseph> Center for Complex Network Research <http://www.barabasilab.com>
>     Joseph> Northeastern University
>
>
>     Joseph> On Wed, Jan 29, 2014 at 2:43 PM, Michael Jennings
>     Joseph> <mej at lbl.gov> wrote:
>
>     >> On Tue, Jan 28, 2014 at 9:30 AM, Joseph De Nicolo
>     >> <j.denicolo at neu.edu> wrote:
>     >>
>     >> > Thanks for all the tips on how to get to the bottom of the
>     >> > issues. Here is just a trivial test and one of the errors we are
>     >> > receiving with the cluster:
>     >> >
>     >> > echo "abc" > abc.txt | qsub -l
>     >> > nodes=mobs-child04,walltime=24:00:00
>     >> >
>     >> > The file abc.txt was correctly written;
>     >>
>     >> Yes, because your command created it.  echo "abc" > abc.txt
>     >> creates abc.txt immediately.  The fact that the file exists has
>     >> nothing to do with TORQUE or whether or not your job ran.
>     >>
>     >> What you probably intended was:
>     >>
>     >> echo 'echo "abc" > abc.txt' | qsub -l
>     >> nodes=mobs-child04,walltime=24:00:00
>     >>
>     >> > This was reported by one of my users. I just ran the exact same
>     >> > test on a different node:
>     >> > echo "xyz" > test.txt | qsub -l
>     >> > nodes=mobs-child01,walltime=24:00:00
>     >> >
>     >> > the file test.txt was correctly written with contents "xyz", but
>     >> > my job is still listed in "qstat" with a "Q" state as if it never
>     >> > ran. I did not receive any STDIN output file either.
>     >>
>     >> Again, this is because the file creation is happening when you
>     >> run the command.  Nothing to do with the queuing system.
>     >>
>     >> Unfortunately, that also means nothing here is relevant to
>     >> addressing whatever problems you're having with TORQUE.  But from
>     >> the sounds of it, your jobs are overloading the nodes.  Try a
>     >> sleep command instead (for, say, 86400 seconds).
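>     >>
>     >> For example (just a sketch):
>     >>
>     >> echo 'sleep 86400' | qsub -l nodes=mobs-child04,walltime=24:00:00
>     >>
>     >> That gives you a job that sits on a node without doing any I/O, so
>     >> you can see whether the queuing system itself is behaving.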
>     >>
>     >> Michael
>     >>
>     >> --
>     >> Michael Jennings <mej at lbl.gov>
>     >> Senior HPC Systems Engineer
>     >> High-Performance Computing Services
>     >> Lawrence Berkeley National Laboratory
>     >> Bldg 50B-3209E  W: 510-495-2687
>     >> MS 050B-3209    F: 510-486-8615
>     >>
>
> --
> ----
> Dr. Roland Fehrenbacher
> Geschäftsführer
>
> Q-Leap Networks GmbH
> Königstrasse 17/3
> D-71139 Ehningen
> Tel. : +49(0)7034/277620
> Fax  : +49(0)7034/652836
> EMail: rf at q-leap.de
> http://www.q-leap.de
>
> Handelsregister Amtsgericht Stuttgart HRB 245373
> St.-Nr. 56/464/05060
> USt-IdNr. DE220607026
> Geschäftsführer:  Dr. Roland Fehrenbacher
>