[torqueusers] low network utilization

Jonathan Barber jonathan.barber at gmail.com
Thu Oct 18 03:09:23 MDT 2012


On 17 October 2012 20:32, Mahmood Naderan <nt_mahmood at yahoo.com> wrote:
> Dear all,
> I have noticed that when I submit a job on a worker node, the network speed
> is about 20 Mb/s. That is quite slow because the switch speed is 1000 Mb/s. That
> causes the processes to be in "D" state and the CPU usage is well below
> 100%.

This sounds like you are generating more IOPS than your storage system
can deliver, probably because you are doing many small random
requests.
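As a back-of-the-envelope check of this explanation, the figures from the original message (about 20 Mb/s on the wire and about 1.3k NFS requests per second) imply an average request of only about 2 KB. The sketch below assumes that "20Mb" means 20 megabits per second and that NFS accounts for essentially all of that traffic; both are assumptions, not measurements.

```python
# Back-of-the-envelope check: how big is the average NFS request?
# Assumptions (not measurements): "20Mb" in the original message means
# 20 megabits per second, and NFS accounts for essentially all of that traffic.


def avg_request_bytes(throughput_mbit_s: float, requests_per_s: float) -> float:
    """Average payload per request = (bits per second / 8) / requests per second."""
    bytes_per_s = throughput_mbit_s * 1_000_000 / 8
    return bytes_per_s / requests_per_s


observed = avg_request_bytes(20, 1300)    # figures quoted in the thread
gigabit = avg_request_bytes(1000, 1300)   # size needed to fill 1 Gb/s at the same rate

print(f"observed: ~{observed / 1024:.1f} KiB per request")
print(f"to fill 1 Gb/s: ~{gigabit / 1024:.1f} KiB per request")
```

At the same request rate, filling a 1 Gb/s link would need requests of roughly 94 KiB each, about fifty times larger. That is what you would expect from a workload limited by the number of operations rather than by bandwidth.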

You should first check that the server NIC and the switch port are
both running at 1GbE (using "ethtool" on the host and connecting to
the switch and verifying the port status).
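If you need to check the host side on many compute nodes, a scriptable alternative to ethtool on Linux is the kernel's /sys/class/net/<interface>/speed file, which reports the negotiated speed in Mb/s. The sketch below is a minimal example under that assumption; the interface names are whatever your hosts use, and the switch port still has to be checked on the switch itself.

```python
from pathlib import Path
from typing import Optional


def link_speed_mbps(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the negotiated link speed in Mb/s as reported by the Linux kernel.

    Returns None if the file is missing or unreadable (e.g. the link is down
    or the device is loopback) or if the driver reports an unknown speed (-1).
    """
    path = Path(sysfs_root) / iface / "speed"
    try:
        value = int(path.read_text().strip())
    except (OSError, ValueError):
        return None
    return value if value > 0 else None


if __name__ == "__main__":
    root = Path("/sys/class/net")
    if root.is_dir():
        for entry in sorted(root.iterdir()):
            speed = link_speed_mbps(entry.name)
            shown = "unknown" if speed is None else f"{speed} Mb/s"
            # Flag any interface that negotiated less than gigabit.
            flag = "  <-- below 1 GbE" if speed is not None and speed < 1000 else ""
            print(f"{entry.name}: {shown}{flag}")
```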

On the NFS server (assuming Linux), check the block device that backs the
NFS-exported file system with "iostat -kx 1". If the "%util" column is
close to 100%, you are limited by the storage system.
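If you want to watch this from a script rather than by eye, the sketch below flags devices whose %util is above a threshold in captured "iostat -kx" output. It assumes the sysstat layout in which each device table starts with a "Device" (or "Device:") header line; because column sets differ between sysstat versions, it looks %util up by name. The 90% threshold is an arbitrary choice for illustration. Note that iostat's first report covers the time since boot, not the current load.

```python
from typing import List, Optional


def saturated_devices(report: str, threshold: float = 90.0) -> List[str]:
    """Return devices whose %util is at or above `threshold` in an iostat -x report.

    The column layout differs between sysstat versions, so %util is located
    by name in each "Device" header line rather than by a fixed position.
    A device appears once for every interval in which it was busy.
    """
    header: Optional[List[str]] = None
    busy: List[str] = []
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            header = None          # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields
            continue
        if header is None or "%util" not in header or len(fields) != len(header):
            continue               # banner, avg-cpu lines, malformed rows
        # Some locales print decimal commas, e.g. "99,80".
        util = float(fields[header.index("%util")].replace(",", "."))
        if util >= threshold:
            busy.append(fields[0])
    return busy
```

For example, saturated_devices(subprocess.run(["iostat", "-kx", "1", "5"], capture_output=True, text=True).stdout) would collect the busy devices over five one-second reports.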

You can monitor the host network throughput with "iftop" (assuming Linux).
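iftop is interactive and breaks traffic down per connection. If you only need per-interface totals that can be logged from a script, Linux also exposes cumulative byte counters in /proc/net/dev; sampling them twice and dividing by the elapsed time gives the throughput. The sketch below is a minimal example under that assumption (Linux only, counter wraparound ignored).

```python
import time
from typing import Dict, Tuple

Counters = Dict[str, Tuple[int, int]]


def parse_net_dev(text: str) -> Counters:
    """Map interface -> (rx_bytes, tx_bytes) from the contents of /proc/net/dev."""
    counters: Counters = {}
    for line in text.splitlines()[2:]:     # first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates_mbit(before: Counters, after: Counters,
               seconds: float) -> Dict[str, Tuple[float, float]]:
    """Convert two counter snapshots into (rx, tx) rates in megabits per second."""
    rates = {}
    for iface, (rx1, tx1) in after.items():
        if iface in before:
            rx0, tx0 = before[iface]
            rates[iface] = ((rx1 - rx0) * 8 / seconds / 1e6,
                            (tx1 - tx0) * 8 / seconds / 1e6)
    return rates


def sample(interval: float = 1.0,
           path: str = "/proc/net/dev") -> Dict[str, Tuple[float, float]]:
    """Measure per-interface throughput over roughly `interval` seconds."""
    with open(path) as f:
        before = parse_net_dev(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = parse_net_dev(f.read())
    return rates_mbit(before, after, time.monotonic() - start)
```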

You can get a crude idea of your baseline NFS performance by using dd on
the client to write and then read back large files, larger than the
memory available to either the server or the client, so that caching
does not inflate the result.
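The same kind of crude sequential test can be sketched in Python, roughly what dd does when copying zeros in 1 MiB blocks. This is an illustration rather than a replacement for dd; the path you pass would be a file on the NFS mount (for example a hypothetical /mnt/nfs/ddtest.bin), and the size should exceed the memory of both machines, as described above.

```python
import os
import time

MIB = 1024 * 1024


def write_throughput(path: str, total_bytes: int, block_size: int = MIB) -> float:
    """Write `total_bytes` of zeros to `path` sequentially; return MB/s.

    fsync() is called before the clock stops so that data still sitting in
    the client's page cache is not counted as written.
    """
    block = memoryview(b"\0" * block_size)
    written = 0
    start = time.monotonic()
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            written += f.write(block[:n])   # raw writes may be partial
        os.fsync(f.fileno())
    elapsed = max(time.monotonic() - start, 1e-9)
    return written / elapsed / 1e6


def read_throughput(path: str, block_size: int = MIB) -> float:
    """Read `path` sequentially in `block_size` chunks; return MB/s."""
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.monotonic() - start, 1e-9)
    return total / elapsed / 1e6
```

Reading the file straight after writing it will mostly be served from the client's cache unless the file really is larger than the client's memory, which is why the size matters.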

For better measurements, I suggest fio:
http://freecode.com/projects/fio

although its results are a lot more complicated to interpret.
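To see why small random requests matter, here is a much simpler stand-in for a fio random-read job (fio's rw=randread with a small bs): it issues block-aligned 4 KiB reads at random offsets for a fixed time and reports operations per second. It is single-threaded with one request in flight, so it measures latency-bound IOPS rather than what the storage can do under load, and the page cache will inflate the result unless the file is larger than memory. Treat it as a sketch of the idea, not a substitute for fio.

```python
import os
import random
import time


def random_read_iops(path: str, block_size: int = 4096, duration: float = 5.0,
                     seed: int = 0) -> float:
    """Issue block-aligned random reads of `block_size` bytes for `duration` s.

    Single-threaded with one request in flight at a time, so this measures
    latency-bound IOPS, not the peak a storage system can reach with many
    outstanding requests. Uses os.pread (Unix only).
    """
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        blocks = os.fstat(fd).st_size // block_size
        if blocks == 0:
            raise ValueError("file is smaller than one block")
        ops = 0
        start = time.monotonic()
        deadline = start + duration
        while time.monotonic() < deadline:
            os.pread(fd, block_size, rng.randrange(blocks) * block_size)
            ops += 1
        elapsed = max(time.monotonic() - start, 1e-9)
    finally:
        os.close(fd)
    return ops / elapsed
```

Multiplying the resulting IOPS by the block size gives the throughput such a workload can reach; at a few thousand 4 KiB operations per second that is far below what a gigabit link can carry.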

Cheers

> I thought there was a problem with NFS, however the stats show about 1.3k
> requests per second, which is not really high.

> Maybe Torque transfers data (from worker to server which has disks) quickly.
>
> How can I investigate more?
>
> Regards,
> Mahmood



-- 
Jonathan Barber <jonathan.barber at gmail.com>

