The Story

I was trying to migrate my data from one WD MyCloud EX2 Ultra to another. This is my story of frustration, unanswered questions and a journey through bash and ssh.

TL;DR

Results

As of now, transfer speeds are consistently around 20-25 MB/s, and the bottleneck appears to be the write speed of the target drive. I am still looking for possible causes.

Takeaways

  • To check network performance, start with iperf for throughput, then ethtool for the hardware layer, and only then proceed with tuning if necessary
  • To check drive performance, use hdparm and dd for read and write speeds before looking at the transfer tools
  • In this case, there is no performance difference between scp and rsync, although for resuming an interrupted copy rsync provides an option to skip existing files (as a workaround with scp, this can be forced by removing write permissions from the files that already exist)
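As an illustration of the rsync option mentioned in the last point, here is a minimal sketch of a resume helper. The function name `resume_copy` and the `RSYNC` override variable are hypothetical names introduced for this example; `-a`, `--ignore-existing` and `--progress` are standard rsync options, and the paths in the usage line are the ones used later in this document.

```shell
# Pull files from the source, skipping anything that already exists
# on the target. Set RSYNC=echo to print the arguments instead of copying.
resume_copy() {
    src="$1"
    dest="$2"
    "${RSYNC:-rsync}" -a --ignore-existing --progress "$src" "$dest"
}

# Usage, run on the target NAS:
# resume_copy sshd@192.168.1.54:/mnt/HD/HD_a2/Marcin /mnt/HD/HD_a2/
```

One caveat: `--ignore-existing` also skips a file that exists only as a truncated leftover from an interrupted scp. Dropping that option and relying on rsync's default size and modification time comparison would re-copy such files.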

Initial conditions

| Parameter | Source | Target |
|---|---|---|
| Device | WD MyCloud EX2 Ultra | WD MyCloud EX2 Ultra |
| Firmware | 2.31.204 | 2.31.204 |
| Drives | 2 x 3 TB WDC WD30EFRX-68N32N0, FwRev=82.00A82 | 2 x 4 TB WDC WD40EFRX-68N32N0, FwRev=82.00A82 |
| RAID | RAID-1 | RAID-0 |
| Encrypted | No | Yes |
| IP address | 192.168.1.54 | 192.168.1.53 |

Connected using Cat 5E cables via TP-Link TL-SG108 Gigabit switch.

Testing

First, I enabled ssh access on both NASes via the web UI:

Settings > Network > SSH

The default user for the WD MyCloud EX2 Ultra is sshd, so to connect to the NAS I used:

ssh sshd@192.168.1.53

To avoid transferring the data more than once, I logged into the target NAS over ssh and copied directly from the source using scp:

scp -rp sshd@192.168.1.54:/mnt/HD/HD_a2/Marcin /mnt/HD/HD_a2/

However, the transfer speed oscillated between 20 and 25 MB/s, well below the 70 MB/s I expected.

First, I used ethtool to check that the network cards had indeed negotiated a 1000 Mb/s full-duplex link:

# ethtool egiga0
Settings for egiga0:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	                                     1000baseT/Full 
	Link partner advertised pause frame use: No
	Link partner advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Link detected: yes
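When repeating this check on several interfaces or hosts, only two lines of the output matter. A small sketch that pulls them out is shown here; the helper name `link_summary` is made up for this example, and it assumes the `Speed:` and `Duplex:` line format shown above.

```shell
# Print "<speed> <duplex>" from `ethtool <iface>` output read on stdin.
link_summary() {
    awk -F': ' '
        /^[[:space:]]*Speed:/  { speed = $2 }
        /^[[:space:]]*Duplex:/ { duplex = $2 }
        END { print speed, duplex }
    '
}

# Usage:
# ethtool egiga0 | link_summary
```

For the output above this would print the negotiated speed and duplex mode on one line, which is easier to compare between the two NASes than the full listing.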

I also checked that I/O and CPU were not saturated (this was already visible in the web UI, but I used iostat to confirm):

# iostat
Linux 3.10.39 (KlinkierChmurka) 	05/05/20 	_armv7l_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.23    6.16   28.23    1.36    0.00   35.02
# iostat -dx /dev/sda 5
Linux 3.10.39 (KlinkierChmurka) 	05/05/20 	_armv7l_	(2 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.55     5.39    4.01   95.83   212.68 10755.40   219.70     0.45    4.53    4.82    4.52   1.19  11.90

I then used traceroute to check that the two NASes reach each other directly over the switch, with no extra routed hops in between:

# traceroute 192.168.1.54
traceroute to 192.168.1.54 (192.168.1.54), 30 hops max, 38 byte packets
 1  192.168.1.54 (192.168.1.54)  0.428 ms  0.428 ms  0.391 ms

Then I enabled jumbo frames in the web-ui and verified they are working using ping:

# ping -s 8972 192.168.1.54
PING 192.168.1.54 (192.168.1.54): 8972 data bytes
8980 bytes from 192.168.1.54: seq=0 ttl=64 time=0.898 ms
8980 bytes from 192.168.1.54: seq=1 ttl=64 time=2.904 ms
8980 bytes from 192.168.1.54: seq=2 ttl=64 time=0.786 ms
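The payload size of 8972 bytes is not arbitrary. Assuming the jumbo frame MTU was set to 9000 (the value is not stated above, so this is an assumption), it is the largest ICMP echo payload that fits into a single IPv4 packet. The helper name below is made up for this sketch.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 9000   # 8972, the value passed to ping -s above
icmp_payload_for_mtu 1500   # 1472, the equivalent for a standard MTU
```

The "8980 bytes" in the replies is the same payload plus the 8-byte ICMP header. Note that a large ping can also succeed through IP fragmentation on a standard 1500-byte MTU, so this test is only conclusive if fragmentation is forbidden. The iputils version of ping offers `-M do` for that; the ping build on the NAS may not.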

To run a performance test, I suspended scp by pressing Ctrl-Z (which lets me resume it later using bg or fg) and ran a network throughput test using iperf:

Target (192.168.1.53), running the iperf server:

# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.53 port 5001 connected with 192.168.1.54 port 41261
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.16 GBytes   991 Mbits/sec

Source (192.168.1.54), running the iperf client:

# iperf -c 192.168.1.53
------------------------------------------------------------
Client connecting to 192.168.1.53, TCP port 5001
TCP window size: 93.3 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.54 port 41261 connected with 192.168.1.53 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.16 GBytes   992 Mbits/sec
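To compare the iperf figure with the MB/s values that scp reports, it helps to convert bits to bytes. A rough sketch follows; the function name is made up, and the difference between decimal and binary megabytes is ignored here.

```shell
# Convert a bandwidth in Mbit/s to MB/s (1 byte = 8 bits).
mbits_to_mbytes() {
    awk -v mbits="$1" 'BEGIN { printf "%.1f\n", mbits / 8 }'
}

mbits_to_mbytes 991   # about 123.9 MB/s available on the wire
```

That is roughly five times the 20-25 MB/s seen during the copy.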

The iperf results rule out the network as the bottleneck. Next, I decided to test with a single large file using both scp and rsync, since rsync can sometimes outperform scp.

# cd /mnt/HD/HD_a2/Marcin/
# dd if=/dev/zero of=1GB_TEST_FILE bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.0GB) copied, 37.631790 seconds, 27.2MB/s

Starting with scp. The speed is consistent with what I observe with real data:

# scp -rp sshd@192.168.1.54:/mnt/HD/HD_a2/Marcin/1GB_TEST_FILE /mnt/HD/HD_a2/Marcin/1GB_TEST_FILE
sshd@192.168.1.54's password: 
1GB_TEST_FILE                 100% 1024MB  22.2MB/s   00:46 

Now rsync. Unfortunately, no improvement there:

# rsync -a sshd@192.168.1.54:/mnt/HD/HD_a2/Marcin/1GB_TEST_FILE /mnt/HD/HD_a2/Marcin/1GB_TEST_FILE --progress
sshd@192.168.1.54's password: 
receiving incremental file list
1GB_TEST_FILE
  1073741824 100%   21.57MB/s    0:00:47 (xfer#1, to-check=0/1)

sent 30 bytes  received 1073872980 bytes  20851903.11 bytes/sec
total size is 1073741824  speedup is 1.00

But wait, didn't dd create the file on the source NAS at only around 27 MB/s? Time to benchmark the drives.

Source read speed:

# hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 538 MB in  3.01 seconds = 178.73 MB/sec

Target write speed:

# dd if=/dev/zero of=1GB_TEST_FILE bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.0GB) copied, 41.367580 seconds, 24.8MB/s
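The rates printed by dd can be reproduced from the byte count and the elapsed time. A small sketch, with a made-up function name, is shown here. The results match dd's output only when 1 MB is taken as 1048576 bytes, which suggests that this dd build reports binary megabytes.

```shell
# Throughput in MiB/s (1 MiB = 1048576 bytes) from a byte count and seconds.
rate_mib() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1048576 }'
}

rate_mib 1073741824 37.631790   # 27.2, the dd run on the source
rate_mib 1073741824 41.367580   # 24.8, the dd run on the target
```

Both local write rates are in the same range as the 22.2 MB/s reported by scp, so the copy is running at about the speed the NAS manages for a plain local write.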

OK, so it seems that drive write speed is at fault. But why? SMB transfers to the drive are faster, at 50-70 MB/s, close to the advertised speed. And it seems I am not the only one with this problem (SCP slow, SMB fast):

However, benchmarks around the web seem to find scp and rsync much faster than SMB:

Having exhausted my theories, I decided to ask the WD community:

https://community.wd.com/t/wd-mycloud-ex2-ultra-2x4tb-slow-write-speeds-over-ssh/250988