[torqueusers] Slave Node Issues - 10 Node Brand new Cluster - Jobs not completing

Joseph De Nicolo j.denicolo at neu.edu
Thu Feb 6 09:59:03 MST 2014


All of the NICs on each node are set to 10/100/1000 and full-duplex.
However, could "Auto-negotiation = on" be causing the issue? Does anybody
have experience with changing the default settings on a switch for a
cluster? Does auto-negotiation add significant network overhead? I'm
guessing I should just manually set the speed and duplex on each NIC and
on the switch ports.
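
For reference, checking and forcing this per node would look something
like the sketch below (eth0 is a placeholder for whatever interface the
nodes actually use):

$ sudo ethtool eth0
    # reports negotiated speed, duplex, and whether auto-negotiation is on
$ sudo ethtool -s eth0 speed 1000 duplex full autoneg off
    # forces 1000/full with auto-negotiation disabled

One caveat: forcing speed/duplex on the NIC while leaving the switch port
on auto-negotiate (or vice versa) typically makes the auto side fall back
to half duplex, which is exactly the mismatch Roger describes below. So
either force both ends or leave both on auto.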


*Joseph De Nicolo*
*Systems & Data Administrator*
*Center for Complex Network Research <http://www.barabasilab.com>*


*Northeastern University*


On Wed, Feb 5, 2014 at 5:45 PM, Moye,Roger V <RVMoye at mdanderson.org> wrote:

>  While the cluster is idle try pinging every compute node from the server
> node and check for packet loss.   Assuming you find none, re-run your test
> scenario again and then ping again.   See if any nodes show packet loss.
> If so, you have a network problem.   One common thing, though I confess
> I’ve not seen this lately, would be that a NIC on one or more of the nodes
> (or perhaps the server) has negotiated to the wrong speed, such as the NIC
> being at half-duplex and the switch being at full-duplex.  This will wreck
> things.   You might not see the problem when the network is idle but you
> will definitely notice when the network is busy.
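>
> A quick way to script that check from the server (a sketch; it assumes
> the compute nodes are named mobs-child01 through mobs-child10, matching
> the node names used in the tests below):
>
> $ for n in $(seq -w 1 10); do
>       echo "== mobs-child$n =="
>       ping -c 20 -q mobs-child$n | grep 'packet loss'
>   done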
>
>
>
> -Roger
>
>
>
>
>
> -----------------------------------------------------------
>
> Roger V. Moye
>
> Systems Analyst III
>
> XSEDE Campus Champion
>
> University of Texas - MD Anderson Cancer Center
>
> Division of Quantitative Sciences
>
> Pickens Academic Tower - FCT4.6109
>
> Houston, Texas
>
> (713) 792-2134
>
> -----------------------------------------------------------
>
>
>
> *From:* torqueusers-bounces at supercluster.org [mailto:
> torqueusers-bounces at supercluster.org] *On Behalf Of *Joseph De Nicolo
> *Sent:* Monday, February 03, 2014 4:35 PM
> *To:* rf at q-leap.de; Torque Users Mailing List
> *Subject:* Re: [torqueusers] Slave Node Issues - 10 Node Brand new
> Cluster - Jobs not completing
>
>
>
> After adjusting the nfsd count to 64, we are still having some
> communication issues, but they now seem more specific, so hopefully
> somebody can offer insight toward a more exact solution. Here are a
> couple of job-submission scenarios and the current issues at hand:
>
> During both of these tests, no other jobs or processes were running on any
> node as they were all free.
>
>
>
> It seems the NFS overhead is way too high even in this simple scenario:
>
> 1. A single job process was spawned on the torque server (network file
> system mounted locally), simply writing to a file; it completed in under
> 10 seconds.
>
> 2. The same single job process was spawned on a torque MOM node (network
> file system mounted via NFSv4), and it took 10 minutes.
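>
> To separate raw NFS write throughput from anything Torque adds, a quick
> timing comparison along these lines might help (a sketch; /path/to/nfs
> is a placeholder for the NFSv4-mounted directory):
>
> $ dd if=/dev/zero of=/path/to/nfs/ddtest bs=1M count=512 conv=fsync
>
> Run it once on the server against the locally attached filesystem and
> once on a MOM node against the NFS mount, then compare the reported
> throughput; if the NFS number is drastically lower even with the cluster
> idle, the problem is below Torque.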
>
>
>
> Another scenario shows jobs possibly colliding with, or halting, other
> jobs even though no priorities are set:
>
> 1. 10 jobs spawned on torque mom (child node 1) - they were all running
> concurrently and completed in a reasonable time.
>
> 2. While the original 10 jobs were running again, this time we spawned 10
> more on a different torque mom (child node 2).
>
> 3. The jobs were all in the same queue, run by the same person, so no
> priority factor.
>
> 4. When the 10 new jobs on child 2 were spawned, the run times of the
> jobs on child 01 were affected and nothing completed. The processes kept
> bouncing back and forth between "R" and "D" status in ps aux, and there
> was a lot of I/O wait, even with only 20 jobs running on a 10-node
> cluster with 64 nfsd daemons.
>
> Could this be a general network issue? Does NFSv4 have to be configured
> in more depth, possibly to allow bigger block sizes? Any help or ideas on
> the matter would be greatly appreciated. Thanks all!
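>
> On the block-size question, the relevant client-side mount options are
> rsize and wsize; what follows is only a sketch (headnode:/export/home
> and /home are placeholders for the actual export and mount point):
>
> $ nfsstat -m
>     # shows rsize/wsize and the other options in effect on each NFS mount
> $ mount -t nfs4 -o rsize=1048576,wsize=1048576 headnode:/export/home /home
>
> With NFSv4 the client and server negotiate these values down to what both
> sides support, so check the effective sizes with nfsstat -m rather than
> trusting the fstab entry.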
>
>
>
>
> *Joseph De Nicolo*
> *Systems & Data Administrator*
>
> *Center for Complex Network Research <http://www.barabasilab.com>*
>
>
> *Northeastern University*
>
>
>
> On Thu, Jan 30, 2014 at 12:58 PM, <rf at q-leap.de> wrote:
>
> >>>>> "Joseph" == Joseph De Nicolo <j.denicolo at neu.edu> writes:
>
> Hi Joseph,
>
>     Joseph> Thank you everybody for all the tips. After some analysis I
>     Joseph> think the root of the problem is NFS. Using iostat I can see
>     Joseph> an average I/O wait of around 10%. We ran a test job on the
>     Joseph> head node, where the storage is directly attached, and the
>     Joseph> job had "running" status and completed in an appropriate
>     Joseph> amount of time. Running the same job on a child node resulted
>     Joseph> in the job being flagged as "D" - uninterruptible sleep. Note
>     Joseph> there were other jobs running on the cluster using up I/O at
>     Joseph> the time. The job only wrote 17 MB; on the head node it took
>     Joseph> 30 seconds, while the child node was still showing "D" status
>     Joseph> in "ps" after 25 minutes.
>
>     Joseph> This is the first cluster I have ever built. After reading up
>     Joseph> on NFS, I realized a default NFS server only spawns 8 nfsd
>     Joseph> processes to handle I/O requests and that you should raise
>     Joseph> this number. Do you think this is the root of the problem?
>     Joseph> Can anybody advise me on how to raise the number of nfsd
>     Joseph> threads for an NFSv4 server on Ubuntu 12.04? Also, what is a
>     Joseph> good number for a cluster with 10 nodes and 132 cores?
>
> Edit /etc/default/nfs-kernel-server and adjust
>
> RPCNFSDCOUNT
>
> Afterwards:
>
> $ /etc/init.d/nfs-kernel-server restart
>
> I'd say try 32 threads to start with and increase in steps of 32 until
> things look better. Of course, it might also turn out that NFS is not up
> to the job at all, which would mean looking at Lustre etc. It really
> depends on what your applications do.
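>
> Concretely, that first step would look something like this (illustrative
> value; adjust to taste):
>
> # in /etc/default/nfs-kernel-server
> RPCNFSDCOUNT=32
>
> and after the restart you can confirm the running thread count with
>
> $ cat /proc/fs/nfsd/threads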
>
> As an Ubuntu fan, you might find Qlustar of interest.
>
> Best,
>
> Roland
>
> ----
> Roland Fehrenbacher, PhD
> Founder/CEO
> Q-Leap Networks GmbH
> Tel. : +49(0)7034/277620
> EMail: rf at q-leap.com
> http://www.q-leap.com / http://qlustar.com
>
>     Joseph> *Joseph De Nicolo* *Systems & Data Administrator* *Center
>     Joseph> for Complex Network Research <http://www.barabasilab.com>*
>
>
>     Joseph> *Northeastern University*
>
>
>     Joseph> On Wed, Jan 29, 2014 at 2:43 PM, Michael Jennings
>     Joseph> <mej at lbl.gov> wrote:
>
>     >> On Tue, Jan 28, 2014 at 9:30 AM, Joseph De Nicolo
>     >> <j.denicolo at neu.edu> wrote:
>     >>
>     >> > Thanks for all the tips on how to get to the bottom of the
>     >> > issues. Here is just a trivial test and one of the errors we are
>     >> > receiving with the cluster:
>     >> >
>     >> > echo "abc" > abc.txt | qsub -l
>     >> > nodes=mobs-child04,walltime=24:00:00
>     >> >
>     >> > The file abc.txt was correctly written;
>     >>
>     >> Yes, because your command created it.  echo "abc" > abc.txt
>     >> creates abc.txt immediately.  The fact that the file exists has
>     >> nothing to do with TORQUE or whether or not your job ran.
>     >>
>     >> What you probably intended was:
>     >>
>     >> echo 'echo "abc" > abc.txt' | qsub -l
>     >> nodes=mobs-child04,walltime=24:00:00
>     >>
>     >> > This was reported by one of my users. I just ran the same exact
>     >> > test on a different node:
>     >> > echo "xyz" > test.txt | qsub -l
>     >> > nodes=mobs-child01,walltime=24:00:00
>     >> >
>     >> > the file test.txt was correctly written with contents "xyz" but
>     >> > my job is still listed in "qstat" with a "Q" state as if it was
>     >> > never run. I did not receive any STDIN output files either.
>     >>
>     >> Again, this is because the file creation is happening when you
>     >> run the command.  Nothing to do with the queuing system.
>     >>
>     >> Unfortunately, that also means nothing here is relevant to
>     >> addressing whatever problems you're having with TORQUE.  But from
>     >> the sounds of it, your jobs are overloading the nodes.  Try a
>     >> sleep command instead (for, say, 86400 seconds).
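>     >>
>     >> For example (a concrete version of that suggestion, reusing the
>     >> node and walltime spec from your earlier test):
>     >>
>     >> echo 'sleep 86400' | qsub -l nodes=mobs-child04,walltime=24:00:00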
>     >>
>     >> Michael
>     >>
>     >> --
>     >> Michael Jennings <mej at lbl.gov>
>     >> Senior HPC Systems Engineer
>     >> High-Performance Computing Services
>     >> Lawrence Berkeley National Laboratory
>     >> Bldg 50B-3209E   W: 510-495-2687
>     >> MS 050B-3209     F: 510-486-8615
>     >>
>
>
> --
> ----
> Dr. Roland Fehrenbacher
> Managing Director
>
> Q-Leap Networks GmbH
> Königstrasse 17/3
> D-71139 Ehningen
> Tel. : +49(0)7034/277620
> Fax  : +49(0)7034/652836
> EMail: rf at q-leap.de
> http://www.q-leap.de
>
> Commercial register: Amtsgericht Stuttgart, HRB 245373
> Tax no. 56/464/05060
> VAT ID DE220607026
> Managing Director: Dr. Roland Fehrenbacher
>