[torqueusers] low network utilization
Mahmood Naderan
nt_mahmood at yahoo.com
Sat Oct 20 06:56:43 MDT 2012
>This sounds like you are generating more IOPS than your storage system
>can deliver, probably because you are doing many small random
>requests.
The cluster is diskless so all IO operations are done on the server. I run
"iostat 1" on the server before running the application on the compute node.
As you can see, the average user cpu usage is 0%, then it goes to 23%
and then goes to 0% which means I terminate the application on the node.
Thing is, the read/write operations per second is almost zero during the
application run. So I wonder why cpu user on server is 20%.
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 4.00 0.00 68.00 0 68
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.19 0.00 0.00 99.31
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.81 0.00 2.89 0.00 0.00 75.30
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
20.93 0.00 4.32 0.00 0.00 74.75
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.97 0.00 3.20 0.00 0.00 74.83
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.82 0.00 3.39 0.00 0.00 74.80
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
22.49 0.00 2.82 0.00 0.00 74.69
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.89 0.00 3.26 0.25 0.00 74.59
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 4.00 0.00 88.00 0 88
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.29 0.00 4.01 0.00 0.00 74.70
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.96 0.00 3.20 0.00 0.00 74.84
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
22.07 0.00 3.13 0.00 0.00 74.80
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
22.35 0.00 2.82 0.00 0.00 74.83
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
22.15 0.00 3.01 0.00 0.00 74.84
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.88 0.00 3.39 0.00 0.00 74.73
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.97 0.00 3.14 0.00 0.00 74.89
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.87 0.00 3.38 0.00 0.00 74.75
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
22.13 0.00 3.07 0.00 0.00 74.80
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.63 0.00 0.69 0.00 0.00 98.68
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 0.00 0.00 0.00 99.94
Regards,
Mahmood
________________________________
From: Jonathan Barber <jonathan.barber at gmail.com>
To: Mahmood Naderan <nt_mahmood at yahoo.com>; Torque Users Mailing List <torqueusers at supercluster.org>
Sent: Thursday, October 18, 2012 11:09 AM
Subject: Re: [torqueusers] low network utilization
On 17 October 2012 20:32, Mahmood Naderan <nt_mahmood at yahoo.com> wrote:
> Dear all,
> I have noticed that when I submit a job on a working node, the network speed
> is about 20Mb. That is quite slow because the switch speed is 1000Mb. That
> causes the processes to be in "D" state and the cpu usages are much below
> 100%.
This sounds like you are generating more IOPS than your storage system
can deliver, probably because you are doing many small random
requests.
You should first check that the server NIC and the switch port are
both running at 1GbE (using "ethtool" on the host and connecting to
the switch and verifying the port status).
On the NFS server (assuming linux) check the block device that
supports the NFS exported file system with "iostat -kx 1". If you have
~100% in the "%util" column then you are limited by the storage
system.
You can monitor the host network throughput with "iftop" (assuming linux).
You can get a crude idea of your baseline NFS performance by using dd
with large (larger than the largest amount of memory available to the
server and client) files and reading / writing them from the client.
For better measurements, I suggest fio:
http://freecode.com/projects/fio
although it is a lot more complicated to interpret the results.
Cheers
> I thought there is a problem with NFS however the stats shows about 1.3k
> requests per second which is not really high.
> Maybe Torque transfers data (from worker to server which has disks) quickly.
>
> How can I investigate more?
>
> Regards,
> Mahmood
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
--
Jonathan Barber <jonathan.barber at gmail.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.supercluster.org/pipermail/torqueusers/attachments/20121020/95578993/attachment-0001.html
More information about the torqueusers
mailing list