[torqueusers] low network utilization
Mahmood Naderan
nt_mahmood at yahoo.com
Sat Oct 20 07:23:07 MDT 2012
Really sorry for the inconvenience... I made a mistake in my previous reply,
so the iostat output there was incorrect. Please ignore it.
I ran the simulation again. The actual procedure was:
1- The application runs on the compute node.
2- I ran "iostat 1" on the server. While it was printing every second,
I started the application on the compute node and later terminated it.
3- I ran "top" on the compute node.
The iostat output looks like:
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 7.00 0.00 140.00 0 140
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.13 0.00 0.00 98.87
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 29.00 0.00 128.00 0 128
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 0.25 0.00 0.00 99.69
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 0.63 0.00 0.00 99.31
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.00 0.00 0.00 99.00
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.50 0.00 0.00 98.50
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 0.75 0.38 0.00 98.81
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 9.00 0.00 164.00 0 164
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 1.32 0.00 0.00 98.62
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 1.07 0.00 0.00 98.87
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.63 0.00 0.00 99.37
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 2.51 0.00 0.00 97.42
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 2.14 0.13 0.00 97.74
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 8.00 0.00 76.00 0 76
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 0.19 0.00 0.00 99.75
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
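To summarize a long "iostat 1" capture like the one above, the per-second write rate of one device can be pulled out with a few lines of Python. This is only a sketch: the function name parse_iostat and the two-report sample are illustrative, and it assumes the basic (non-extended) report layout shown in this message.

```python
def parse_iostat(text, device="sda"):
    """Return the kB_wrtn/s value of `device` for each report in `text`."""
    rates = []
    col = None  # index of the kB_wrtn/s column, taken from the header row
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Older sysstat prints "Device:", newer versions print "Device".
        if fields[0].rstrip(":") == "Device" and "kB_wrtn/s" in fields:
            col = fields.index("kB_wrtn/s")
        elif fields[0] == device and col is not None:
            rates.append(float(fields[col]))
    return rates


# First two reports from the output above (sdc..dm-0 omitted for brevity).
sample = """\
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 7.00 0.00 140.00 0 140
sdb 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.13 0.00 0.00 98.87
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 29.00 0.00 128.00 0 128
sdb 0.00 0.00 0.00 0 0
"""

rates = parse_iostat(sample)
print("kB written per second:", rates, "peak:", max(rates))
```

Run over the full capture, the largest sda value is 164 kB/s, so the server disks are close to idle during the run.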
Here is also the top output on the compute node while the application was running:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10570 mahm 20 0 296m 92m 14m D 42 0.1 0:06.13 atIco
10568 mahm 20 0 298m 94m 14m R 32 0.1 0:04.02 atIco
10567 mahm 20 0 296m 92m 14m D 23 0.1 0:04.41 atIco
10569 mahm 20 0 298m 93m 14m D 21 0.1 0:03.63 atIco
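In the "S" column, "D" means uninterruptible sleep, which usually indicates a process waiting on I/O. A small sketch for counting process states in a pasted top listing follows; the function name state_counts is illustrative, and it assumes the default column header shown above.

```python
def state_counts(top_text):
    """Count processes per state letter (the 'S' column) in top output."""
    counts = {}
    state_col = None  # position of the 'S' column, read from the header row
    for line in top_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID" and "S" in fields:
            state_col = fields.index("S")
        elif state_col is not None and fields[0].isdigit() and len(fields) > state_col:
            state = fields[state_col]
            counts[state] = counts.get(state, 0) + 1
    return counts


# The four atIco processes from the listing above.
sample_top = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10570 mahm 20 0 296m 92m 14m D 42 0.1 0:06.13 atIco
10568 mahm 20 0 298m 94m 14m R 32 0.1 0:04.02 atIco
10567 mahm 20 0 296m 92m 14m D 23 0.1 0:04.41 atIco
10569 mahm 20 0 298m 93m 14m D 21 0.1 0:03.63 atIco
"""

print(state_counts(sample_top))
```

For this snapshot, three of the four processes are in state D and one is running.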
Any feedback is appreciated.
Regards,
Mahmood
________________________________
From: Mahmood Naderan <nt_mahmood at yahoo.com>
To: Jonathan Barber <jonathan.barber at gmail.com>
Cc: torque cluster <torqueusers at supercluster.org>
Sent: Saturday, October 20, 2012 2:56 PM
Subject: Re: [torqueusers] low network utilization
>This sounds like you are generating more IOPS than your storage system
>can deliver, probably because you are doing many small random
>requests.
The cluster is diskless, so all I/O operations are done on the server. I ran
"iostat 1" on the server before starting the application on the compute node.
As you can see, the average user CPU usage starts near 0%, rises to about 22%,
and then drops back to 0% when I terminate the application on the node.
The thing is, the read/write operations per second are almost zero during the
application run, so I wonder why user CPU on the server is around 20%.
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 4.00 0.00 68.00 0 68
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.19 0.00 0.00 99.31
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.81 0.00 2.89 0.00 0.00 75.30
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
20.93 0.00 4.32 0.00 0.00 74.75
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.97 0.00 3.20 0.00 0.00 74.83
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.82 0.00 3.39 0.00 0.00 74.80
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
22.49 0.00 2.82 0.00 0.00 74.69
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.89 0.00 3.26 0.25 0.00 74.59
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 4.00 0.00 88.00 0 88
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.29 0.00 4.01 0.00 0.00 74.70
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.96 0.00 3.20 0.00 0.00 74.84
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
22.07 0.00 3.13 0.00 0.00 74.80
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
22.35 0.00 2.82 0.00 0.00 74.83
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
22.15 0.00 3.01 0.00 0.00 74.84
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.88 0.00 3.39 0.00 0.00 74.73
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.97 0.00 3.14 0.00 0.00 74.89
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
21.87 0.00 3.38 0.00 0.00 74.75
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
22.13 0.00 3.07 0.00 0.00 74.80
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.63 0.00 0.69 0.00 0.00 98.68
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
dm-0 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 0.00 0.00 0.00 99.94
Regards,
Mahmood
________________________________
From: Jonathan Barber <jonathan.barber at gmail.com>
To: Mahmood Naderan <nt_mahmood at yahoo.com>; Torque Users Mailing List <torqueusers at supercluster.org>
Sent: Thursday, October 18, 2012 11:09 AM
Subject: Re: [torqueusers] low network utilization
On 17 October 2012 20:32, Mahmood Naderan <nt_mahmood at yahoo.com> wrote:
> Dear all,
> I have noticed that when I submit a job on a working node, the network speed
> is about 20Mb. That is quite slow because the switch speed is 1000Mb. That
> causes the processes to be in "D" state and the cpu usages are much below
> 100%.
This sounds like you are generating more IOPS than your storage system
can deliver, probably because you are doing many small random
requests.
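A rough back-of-the-envelope check of the "many small requests" idea, using the two figures quoted in this thread (about 20Mb of network traffic and about 1.3k NFS requests per second). It assumes "20Mb" means 20 megabits per second and that essentially all of that traffic is NFS; neither is confirmed in the thread, so treat the result as an order of magnitude only.

```python
# Figures quoted in the thread; units are assumptions (see comments).
link_mbit_s = 20          # observed traffic, assumed to be megabits per second
switch_mbit_s = 1000      # switch speed as stated in the original question
requests_per_s = 1300     # "about 1.3k requests per second" from the NFS stats

bytes_per_s = link_mbit_s * 1_000_000 / 8           # 2.5 MB/s
avg_bytes_per_request = bytes_per_s / requests_per_s
utilization = link_mbit_s / switch_mbit_s

print(f"average payload per request: {avg_bytes_per_request:.0f} bytes")
print(f"link utilization: {utilization:.0%}")
```

Under those assumptions each request moves roughly 2 kB on average, which would fit a pattern of many small transfers rather than large streaming reads or writes.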
You should first check that the server NIC and the switch port are
both running at 1GbE (using "ethtool" on the host and connecting to
the switch and verifying the port status).
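Besides "ethtool", Linux exposes the negotiated link speed in sysfs, so the host side of this check can be scripted. The sketch below reads /sys/class/net/<iface>/speed; the interface name "eth0" and the helper name link_speed_mbps are assumptions for illustration, and the switch port still has to be verified separately.

```python
from pathlib import Path


def link_speed_mbps(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated link speed in Mb/s, or None if it is unknown."""
    path = Path(sysfs_root) / iface / "speed"
    try:
        value = int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing interface, link down, or a driver that does not report speed.
        return None
    # Some drivers report -1 when the speed is unknown.
    return value if value > 0 else None


# "eth0" is a placeholder; substitute the NIC that carries the NFS traffic.
speed = link_speed_mbps("eth0")
if speed is None:
    print("link speed not available for eth0")
elif speed < 1000:
    print(f"eth0 negotiated only {speed} Mb/s, expected 1000")
else:
    print(f"eth0 is running at {speed} Mb/s")
```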
On the NFS server (assuming linux) check the block device that
supports the NFS exported file system with "iostat -kx 1". If you have
~100% in the "%util" column then you are limited by the storage
system.
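The "%util" check can also be automated over a saved capture. The following sketch looks up the "%util" column by name, since the set of extended columns differs between sysstat versions. The helper name saturated_devices, the 90% threshold, and the sample numbers are all made up for illustration; they are not measurements from this cluster.

```python
def saturated_devices(iostat_x_text, threshold=90.0):
    """Return sorted names of devices whose %util reached `threshold` in any report."""
    busy = set()
    util_col = None  # index of the %util column in the current report
    for line in iostat_x_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device" and "%util" in fields:
            util_col = fields.index("%util")
        elif fields[0].startswith("avg-cpu"):
            util_col = None  # CPU section: skip until the next Device header
        elif util_col is not None and len(fields) > util_col:
            try:
                util = float(fields[util_col])
            except ValueError:
                continue
            if util >= threshold:
                busy.add(fields[0])
    return sorted(busy)


# Invented sample in the style of "iostat -kx 1"; values are not real data.
sample_x = """\
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.00 35.00 0.00 63.50
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 2.00 0.00 4.00 0.00 68.00 34.00 0.01 1.50 1.00 0.40
sdb 0.00 0.00 310.00 12.00 1240.00 48.00 8.00 4.20 13.00 3.05 98.20
"""

print(saturated_devices(sample_x))
```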
You can monitor the host network throughput with "iftop" (assuming linux).
You can get a crude idea of your baseline NFS performance by using dd
with large (larger than the largest amount of memory available to the
server and client) files and reading / writing them from the client.
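As a rough stand-in for the dd test, sequential write and read throughput can be timed from Python. This is a sketch, not a benchmark tool: the function names are illustrative, and the small demo file below is written to a temporary directory, so it mostly measures the page cache. For a real baseline the path would have to be on the NFS mount and the file larger than the memory of both client and server, as described above.

```python
import os
import tempfile
import time


def sequential_write_mb_s(path, size_mb, block_kb=1024):
    """Write size_mb MiB of zeros to path in block_kb KiB blocks; return MiB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = size_mb * 1024 // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the time needed to push data out
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_mb / elapsed


def sequential_read_mb_s(path, block_kb=1024):
    """Read path sequentially in block_kb KiB blocks; return MiB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_kb * 1024)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / (1024 * 1024) / elapsed


# Small demo only; point the path at the NFS mount and raise size_mb for real use.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "throughput.bin")
    print(f"write: {sequential_write_mb_s(demo, size_mb=8):.1f} MiB/s")
    print(f"read:  {sequential_read_mb_s(demo):.1f} MiB/s")
```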
For better measurements, I suggest fio:
http://freecode.com/projects/fio
although it is a lot more complicated to interpret the results.
Cheers
> I thought there is a problem with NFS however the stats shows about 1.3k
> requests per second which is not really high.
> Maybe Torque transfers data (from worker to server which has disks) quickly.
>
> How can I investigate more?
>
> Regards,
> Mahmood
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
--
Jonathan Barber <jonathan.barber at gmail.com>