Improve NAS performance

Hi guys,

Have any of you had experience with the following, and is there an explanation, or better, a way to solve it? A few days ago I finished building my new NAS server. Now I'm seeing what looks like very slow file-sharing performance that I would like to improve.

  • Locally, if I log on, I can copy/paste a 1.5Gbyte file within the same directory and reach 40Mbyte/s full duplex (40Mbyte/s read, 40Mbyte/s write). Nice. :p
  • If I download the same file using HTTP I reach a download speed of 35Mbyte/s. Nice.
  • If I copy the same file to my client's hard disk using Windows file sharing I reach a speed of 30Mbyte/s (some overhead, OK, but still nice for a NAS).
  • If I run iPerf from the client to the server in duplex mode it does 30-35Mbyte/s in both directions simultaneously (commands sketched below this list).
  • If I run NetCPS from the client to the server it does 30-40Mbyte/s.
  • Now, if I copy/paste the same 1.5Gbyte file within the same directory, but from the client, on a network drive (net use Z: \\192.168.2.1\data), I only reach 12-14Mbyte/s. Not nice. :dead:
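
For reference, the duplex iPerf test was roughly along these lines (iperf 2.x syntax; -d runs the bidirectional test; the window size and duration shown here are only example values):

server> iperf -s -w 256K
client> iperf -c 192.168.2.1 -d -w 256K -t 30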

The server and client are connected directly with a Cat5e crossover cable (properly wired for gigabit). The server uses a Supermicro PDSME motherboard, onboard Intel(R) PRO/1000 PM, 1Gbyte of memory, and an Areca 1120 array controller in RAID5 with four Seagate Barracuda 250GB disks at 7200rpm. The client is not new, but has a 2.8GHz Intel CPU and an Intel(R) PRO/1000 GT network card in a 32bit PCI slot. For now, both are running in a test setup with 2003 Server, no firewalls, no other network connections, and no antivirus. The NICs report that they successfully negotiate 1Gb FD. CPU load is 10% on the server and 35% on the client.

I've tried forcing the cards to 1Gb, making the server a DC and the client a member of the domain (to prevent credential conflicts), and changing the TcpWindowSize on both sides (additionally setting Tcp1323Opts to 0 to allow manual setting of the TcpWindowSize). None of these attempts had any effect at all. When I installed IPX (as a protocol check) the performance was slightly slower. NetBEUI boosted my network copy/paste to 14-16Mbyte/s, which is still very slow. I've read some threads about similarly slow 100Mbit FD performance where setting it to 100Mbit HD solved it, but my drivers don't allow 1000Mbit HD.
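
For anyone who wants to try the same tweaks, these are the registry values I mean; a rough sketch of setting them from the command line (the 256KB window below is only an example value, and a reboot is needed before the changes take effect):

rem both values live under Tcpip\Parameters
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpWindowSize /t REG_DWORD /d 262144 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v Tcp1323Opts /t REG_DWORD /d 0 /f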

The only out-of-the-ordinary thing I have noticed is that during my copy/paste file transfer, I see about 250 page reads/s and 0 page writes/s on the client. I would expect an equal number, either both 0 or both 250, if Windows uses paging to get the file contents into memory and back out to the network drive. Could this be the source of the problem?

Can anyone tell me:
  • Why is my file-sharing throughput so much slower than any other kind of file transfer, when there seems to be no bottleneck to account for the drop? I accept some overhead, but this seems too much (this relates to using the available bandwidth).
  • Can I verify the actual TcpWindowSize used by the file-sharing TCP connections? (related to the above)
  • How can I further boost the gigabit link in general? I know the quality of the cable matters, and iPerf currently reports somewhere around 280Mbit. The drivers report a 75% link quality, whatever cable I use. Would a Cat6, well made, double-shielded, super-twisted, laser protected, environmentally friendly and politically correct cable do the trick :D ? (this relates to reaching higher raw bandwidth)

BTW: I do realize that at simultaneous 40Mbyte/s read/write my server drives are presently a bottleneck, and that the client's PCI bus at 132Mbyte/s half duplex may become a bottleneck too, though not yet at 40Mbyte/s transfers. I intend to scale this server with more and faster disks over the next few years and grow my client along with it.
 
Hi Samstoned,

Thanks for your reply. I just completed reading the article you recommended.

What I read is that it's about Fibre Channel SANs. I built those professionally in the past (very interesting subject :rolleyes: ) but it's not what I have right here :eek: . I just have a regular server with a local disk (SATA RAID5) and a gigabit ethernet crossover cable to my client, so I don't see how the article applies (even though it's interesting to read how inefficient iSCSI is compared to FCP).

This just in from a Dutch forum over here: turning off SMB signing (I didn't even know it existed) just increased bandwidth utilization by 6Mbyte/s full duplex :) :) . The reason I wasn't seeing a bottleneck is that the redirector and the SMB client end up waiting on each other when signing is on :dead: .
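
In case anyone wants to reproduce this: as far as I can tell the switches involved are the registry values below (they can also be set through the "Digitally sign communications" security policies, and since my server is a DC the domain controller policy probably enabled signing in the first place and may re-apply it over a plain registry change):

rem server side
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters /v RequireSecuritySignature /t REG_DWORD /d 0 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters /v EnableSecuritySignature /t REG_DWORD /d 0 /f
rem client side
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /v RequireSecuritySignature /t REG_DWORD /d 0 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /v EnableSecuritySignature /t REG_DWORD /d 0 /f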

I'm now up to 18Mbyte/s full duplex but still a long way from 30-35Mbyte/s. Does anyone know of other improvements, or can anyone tell me why changing the TcpWindowSize has no effect whatsoever? :(
 
Your problem is obviously above the transport layer, since other applications using TCP/IP perform well enough. You have to look into what is different between the network-drive copy and the transfer methods that do perform well.

Maybe try some alternative file copy programs, in case the Windows shell copy is doing something dumb? Try TotalCopy, for example.

PS
A good idea is to enable jumbo frames in an all-gigabit environment.
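
Once both ends support them, a quick way to verify that jumbo frames actually pass end to end is an unfragmentable large ping (assuming a 9000-byte MTU; 8972 is 9000 minus the 28 bytes of IP/ICMP headers, and the address is the server from the earlier posts):

ping -f -l 8972 192.168.2.1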
 
nosuna said:
The only out-of-the-ordinary thing I have noticed is that during my copy/paste file transfer, I see about 250 page reads/s and 0 page writes/s on the client. I would expect an equal number, either both 0 or both 250, if Windows uses paging to get the file contents into memory and back out to the network drive. Could this be the source of the problem?
How much RAM is in the client? Page-ins without any page-outs imply pages being stolen from an idle program, or pages that were not modified (i.e. re-entrant code). The extra I/O on the origin node would impact the observed throughput somewhat.

A good background article on file copy discusses the window size

The TCP SACK option may be useful to you (RFC 2018). A shareware program addresses setting the SACK option, but I'm sure it's just another registry hack.

===
Control Selective Acknowledgement (SACK) Operation (Windows NT/2000/XP)
This parameter controls whether or not Selective ACK (SACK - RFC 2018) support is enabled. With SACK enabled (default), a packet or series of packets can be dropped, and the receiver informs the sender which data has been received, and where there may be "holes" in the data.

Open your registry and find or create the key below.

Create a new DWORD value, or modify the existing value, called "SackOpts" and set it according to the value data below.

Exit your registry, you may need to restart or log out of Windows for the change to take effect.

Registry Editor Example
| Name      | Type      | Data            |
| (Default) | REG_SZ    | (value not set) |
| SackOpts  | REG_DWORD | 0x00000001 (1)  |
Key: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters

Registry Settings
System Key: [HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters]
Value Name: SackOpts
Data Type: REG_DWORD (DWORD Value)
Value Data: (0 = disabled, 1 = enabled)
===
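
The same setting as a single command, for those who would rather skip the shareware tweaker (log off or reboot afterwards):

reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v SackOpts /t REG_DWORD /d 1 /f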

Also, MS KB328890 covers modifying the delayed ACK value (RFC 1122).
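
If I recall correctly, the value that article describes is TcpAckFrequency, and it goes under the per-interface key rather than the global Parameters key; roughly like this (the {interface-GUID} is a placeholder for the GUID of your actual adapter):

rem replace {interface-GUID} with the adapter's GUID listed under Tcpip\Parameters\Interfaces
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{interface-GUID}" /v TcpAckFrequency /t REG_DWORD /d 8 /f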

The article makes the point that current operating systems support the RFC 1323 extensions for window scaling:
• Example: Microsoft Windows provides the registry parameters TcpWindowSize and GlobalMaxTcpWindowSize
• Tcp1323Opts controls the RFC 1323 timestamp and window-scaling options:
  0 (disable RFC 1323 options)
  1 (window scaling enabled only)
  2 (timestamps enabled only)
  3 (both options enabled)
 
Hi Jobeard, thanks for your reply.

jobeard said:
How much RAM is in the client?

Client (values in KB unless noted otherwise):
Physical memory: Total 523764, Available 152440, System Cache 228468
PF Usage: 381 MB
Kernel Memory: Total 73956, Paged 34032, Nonpaged 39908

The server has 1Gbyte of memory. The figures above are the same before, during and after the file copy; the copy does not change them. Unless someone can explain otherwise, I don't see a bottleneck in these figures, but perhaps you see something I don't. I'm very interested in your view on this. Is there any specific measurement I can take to support the analysis?

jobeard said:
A good background article on file copy discusses the window size
I read it, including the other URLs you gave. I changed the delayed ACK value to 8, which gave some improvement (more on that later). When I set it to 255 (the maximum) the system became unresponsive over IP (e.g. RDP was unable to connect). I have not tested SACK yet, but will do so tomorrow.

Unfortunately my server board does not support jumbo frames (I'm still waiting for a delivery), so I've been unable to test that, but from the SACK and delayed ACK articles it's clear that limiting the number of ACK frames can greatly improve performance. What I don't understand is how this affects a full duplex environment. My understanding is that full duplex is collision free because each sender uses its own wire pairs. Is that correct?

Here's an image of what I observed initially. On the left it shows a server-to-client copy of a large file, initiated on the client, using Windows file sharing. On the right it shows a server-to-server copy of the same file, initiated on the client, using Windows file sharing (default Windows settings with SMB signing disabled). Clearly there is a significant reduction in performance. Please note that if a server-to-server copy is initiated on the server itself (from a local workstation logon), a sustained full duplex read/write performance of 40Mbyte/s is achieved on the disk array.

(click link for image)

Right, SMB signing is disabled, no jumbo frames, delayed ACK=8, TcpWindowSize=2Mbyte, Tcp1323Opts=default, the NICs' send/receive buffers are at 2048, and adaptive interframe spacing is enabled. For the result, observe the following image (the file copy starts at the right side of the graph).

(click link for image)

It's clear that during the first 30 seconds or so there is a sustained throughput of around 28Mbyte/s in both directions during a server-to-server copy initiated by the client. However, after some time the throughput collapses and begins to fluctuate wildly. My first guess would be that either the server or the client (presumably the client) is unable to keep up with the data flow (flow control=on), not so much from a flow-control perspective, but perhaps more in the area of local paging or bus contention.

Can you, or anyone else, comment on:
- The details on the paging matter you refer to.
- Possible reasons why the current sustained throughput of 28Mbyte/s full duplex collapses after some period of time? Is this related?

Any comments or references to external sites are much appreciated.
 
nosuna said:
Can you, or anyone else, comment on:
- The details on the paging matter you refer to.
The sole point being that client activity will degrade the maximum transfer rate. The big question is WHY there is paging activity in a controlled test.

nosuna said:
I read it, including the other URLs you gave. I changed the delayed ACK value to 8, which gave some improvement (more on that later). When I set it to 255 (the maximum) the system became unresponsive over IP (e.g. RDP was unable to connect). I have not tested SACK yet, but will do so tomorrow.
Even on my laptop I can see the effect of SACK w/o the Delay Ack change.
Suggestion: tweak only ONE side of the test to achieve MAX throughput, then duplicate the settings on the client as well; that should make a real impact on the outcome. PS: don't go crazy over 8 vs. 255. TCP has a history of a sharp knee, or avalanche condition, as the parameters change.

FYI: try this test from each system to a nearby site, on BOTH the client and the server.
This may reveal an effect that would not otherwise be obvious, like the bottleneck not being where you expect it.
nosuna said:
Unfortunately my server board does not support jumbo frames (I'm still waiting for a delivery), so I've been unable to test that, but from the SACK and delayed ACK articles it's clear that limiting the number of ACK frames can greatly improve performance. What I don't understand is how this affects a full duplex environment. My understanding is that full duplex is collision free because each sender uses its own wire pairs. Is that correct?
Within reason, yes. Regardless, it's still a contention network and only about 70% efficient. There will be dropped packets, and SACK + delayed ACK will make a difference.
nosuna said:
Here's an image of what I observed initially. On the left it shows a server-to-client copy of a large file, initiated on the client, using Windows file sharing. On the right it shows a server-to-server copy of the same file, initiated on the client, using Windows file sharing (default Windows settings with SMB signing disabled). Clearly there is a significant reduction in performance. Please note that if a server-to-server copy is initiated on the server itself (from a local workstation logon), a sustained full duplex read/write performance of 40Mbyte/s is achieved on the disk array.
So some component makes this the upper bound of performance on THIS system (again, I suggest running the same test on the client). The issue becomes the degradation from 40 to 28Mbyte/s when there are two systems AND a network connection involved.
nosuna said:
It's clear that during the first 30 seconds or so there is a sustained throughput of around 28Mbyte/s in both directions during a server-to-server copy initiated by the client. However, after some time the throughput collapses and begins to fluctuate wildly. My first guess would be that either the server or the client (presumably the client) is unable to keep up with the data flow (flow control=on), not so much from a flow-control perspective, but perhaps more in the area of local paging or bus contention.
There is no flow control per se in Ethernet (that's an RS232 serial term), but you've got the right ideas.

While you're in the mood to experiment, have you tried reverting to half duplex? It's a major assumption that full duplex should perform better, but it is not proven :) With only one NIC and one route between the two systems, the sum of the network I/O will run into contention quickly. By going half duplex for such massive transfers, I would expect TCP to be smart enough to do better, at least giving the client the opportunity to reply and dispose of the data while doing so. This should become more obvious when you get to bigger window-size values.
nosuna said:
Can you, or anyone else, comment on:
- Possible reasons why the current sustained throughput of 28Mbyte/s full duplex collapses after some period of time? Is this related?
 
Hi Jobeard,

Thanks for yet another suggestion and your pm also. :)

I intended to test them this weekend and post the results here. However, an unexpected extra performance (I also play in improvisational comedy theater :bounce: ) took away the time I had reserved for it.

I'll test the suggestions in the next few days and post the results after all.

Thanks so much for all the help.
 