[torqueusers] Limit NFS IO Speed?

Jeff Anderson-Lee jonah at eecs.berkeley.edu
Thu Nov 8 14:26:29 MST 2012


Read requests are small compared to the data they return, so shaping a 
client's outbound traffic down to 1MB/s only throttles the requests 
themselves: that 1MB/s of read requests can still trigger considerably 
more return data traffic, and the replies arrive inbound, where egress 
shaping on the client never touches them.
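
If you need to cap what the clients pull in, the usual approach is to 
police on ingress (or redirect ingress to an IFB device and shape it 
there with htb). A rough, untested sketch, assuming ib0 and NFS on 
port 2049 as in your setup:

   # police NFS read replies (source port 2049) to roughly 10MB/s inbound
   tc qdisc add dev ib0 handle ffff: ingress
   tc filter add dev ib0 parent ffff: protocol ip prio 1 u32 \
       match ip sport 2049 0xffff \
       police rate 80mbit burst 256k drop flowid :1

Policing drops the excess instead of queueing it, so it is cruder than 
htb, but unlike the egress rules it does act on the read replies. I 
have not tried this on IPoIB.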

Jeff Anderson-Lee

On 11/8/2012 12:58 PM, Mike Dacre wrote:
> Hi Craig,
>
> Yes, you are right, it is outbound only.  What I have been trying to do
> is limit the traffic from the client side.  The settings below are
> from the individual slave nodes.  It is very difficult to quiet the
> system as it is being actively used; however, I have tried to apply the
> settings below with very restrictive limits - around 1MB/s per node -
> without seeing any discernible slowdown of traffic.  I also tried from
> the server side just to see if it would slow down reads (which are
> outbound from the server's perspective), also without any effect.
>
> Is there any chance the problem is that the NFS mount is async?  I know
> that is a problem for cgroups, but I didn't think it would be a problem
> here because tc/iptables filtering/mangling works at the packet level.
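>
> (I still need to double-check whether those rules are matching anything
> at all; presumably the per-rule and per-class counters would show it,
> e.g.:
>
>   iptables -t mangle -L POSTROUTING -v -n   # pkts/bytes per MARK rule
>   tc -s class show dev ib0                  # bytes sent through each htb class
>
> If the counters stay at zero while NFS traffic is flowing, the marks and
> classes are never being hit on ib0, and no rate setting would matter.)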
>
> Thanks,
>
> Mike
>
> On Thursday, November 8, 2012 at 12:34 PM, Craig Tierney wrote:
>
>> Mike,
>>
>> I can't follow all of your settings because I am not an expert in the
>> tc command; however, I have played with it. What I remember is that
>> traffic shaping is outbound only. I would quiet the system and run
>> some tests where you do some writes from multiple nodes, then reads.
>> Do they seem throttled at all? Is the performance different in each
>> direction?
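>>
>> For example, something along these lines from a couple of nodes
>> (assuming the NFS mount is at /mnt/nfs - adjust to your path), a write
>> and then a read back with O_DIRECT so the page cache doesn't hide the
>> wire speed:
>>
>>   dd if=/dev/zero of=/mnt/nfs/test.$(hostname) bs=1M count=1024 oflag=direct
>>   dd if=/mnt/nfs/test.$(hostname) of=/dev/null bs=1M iflag=direct
>>
>> If the shaping works, the writes from the clients should be capped near
>> the configured rate while the reads are not, and the other way around
>> when you shape on the server.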
>>
>> Craig
>>
>> On Thu, Nov 8, 2012 at 12:44 PM, Mike Dacre <mike.dacre at gmail.com> wrote:
>>> Hi Everyone,
>>>
>>> I am having an unusual problem: I have an Infiniband network connecting
>>> my nodes and a RAID array mounted over NFS on every node. What is
>>> happening is that the nodes are reading from and writing to the NFS
>>> mount so fast that the IO of the array is maxed out, which results in
>>> terrible performance for interactive commands (e.g. ls). I have tried
>>> traffic shaping with iptables and tc on both the server and the slave
>>> nodes with no success at all. I am not even certain those commands are
>>> working properly on an IPoIB NIC (ib0).
>>>
>>> The TC command I am trying is:
>>>
>>> $TC qdisc add dev ib0 root handle 1:0 htb
>>> $TC class add dev ib0 parent 1:0 classid 1:1 htb rate 50mbps ceil 50mbps
>>> $TC class add dev ib0 parent 1:1 classid 1:2 htb rate 10mbps ceil 20mbps
>>> $TC qdisc add dev ib0 parent 1:2 sfq
>>> $TC filter add dev ib0 parent 1:0 protocol ip u32 match ip sport 2049 0xffff flowid 1:2
>>> $TC filter add dev ib0 parent 1:0 protocol ip u32 match ip dport 2049 0xffff flowid 1:2
>>>
>>> or
>>>
>>> $TC qdisc add dev ib0 root handle 1:0 htb
>>> $TC class add dev ib0 parent 1:0 classid 1:1 htb rate 50mbps ceil 50mbps
>>> $TC class add dev ib0 parent 1:1 classid 1:2 htb rate 10mbps ceil 20mbps
>>> $TC qdisc add dev ib0 parent 1:2 sfq
>>> $TC filter add dev ib0 parent 1:0 protocol ip prio 1 handle 6 fw flowid 1:2
>>>
>>> With the following iptables:
>>>
>>> /sbin/iptables -A POSTROUTING -t mangle -o ib0 -p tcp -m multiport --sport 2049 -j MARK --set-mark 6
>>> /sbin/iptables -A POSTROUTING -t mangle -o ib0 -p tcp -m multiport --sport 2049 -j RETURN
>>> /sbin/iptables -A POSTROUTING -t mangle -o ib0 -p udp -m multiport --sport 2049 -j MARK --set-mark 6
>>> /sbin/iptables -A POSTROUTING -t mangle -o ib0 -p udp -m multiport --sport 2049 -j RETURN
>>> /sbin/iptables -A POSTROUTING -t mangle -o ib0 -p tcp -m multiport --dport 2049 -j MARK --set-mark 6
>>> /sbin/iptables -A POSTROUTING -t mangle -o ib0 -p tcp -m multiport --dport 2049 -j RETURN
>>> /sbin/iptables -A POSTROUTING -t mangle -o ib0 -p udp -m multiport --dport 2049 -j MARK --set-mark 6
>>> /sbin/iptables -A POSTROUTING -t mangle -o ib0 -p udp -m multiport --dport 2049 -j RETURN
>>>
>>> I don't want to go back to ethernet NFS and only use Infiniband for
>>> MPI because I want to maximize IO when there is a single node doing
>>> all the IO, and ethernet is just too slow for that.
>>>
>>> Any thoughts?
>>>
>>> Thanks,
>>>
>>> Mike
>>>
