
# The Story

I was trying to migrate my data from one WD MyCloud EX2 Ultra to another. This is my story of frustration, unanswered questions and a journey through bash + ssh.

## TL;DR

### Results

As of now, transfer speeds are consistently around 20-25 MB/s, with the bottleneck being the write speed of the target drive. I am still looking for the cause.

### Takeaways

* To check network performance, start with `iperf` for throughput, then `ethtool` for the hardware layer, and only then proceed with tuning if necessary
* To check drive performance, use `hdparm` and `dd` for read and write speeds before looking at the transfer tools
* In this case there is no performance difference between `scp` and `rsync`, although `rsync` makes it easier to resume an interrupted copy because it can skip existing files (as a workaround with `scp`, the same can be forced by removing write permissions from the files that already exist)
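
The two resume strategies from the last takeaway can be sketched as follows. `lock_existing` is a hypothetical helper name, not something shipped with the NAS. Keep in mind that root ignores permission bits, so the `scp` workaround should only have an effect when the copy runs as an unprivileged user; whether that holds for the `sshd` account is an assumption to verify.

```shell
# Hypothetical helper: make every regular file under a directory read-only,
# so a later scp by a non-root user fails to overwrite it and moves on.
lock_existing() {
    find "$1" -type f -exec chmod a-w {} +
}

# Usage on the target NAS (paths as used in this write-up):
#   lock_existing /mnt/HD/HD_a2/Marcin
#   scp -rp sshd@192.168.1.54:/mnt/HD/HD_a2/Marcin /mnt/HD/HD_a2/
#
# rsync alternative, no permission changes needed:
#   rsync -a --ignore-existing sshd@192.168.1.54:/mnt/HD/HD_a2/Marcin /mnt/HD/HD_a2/
```

Remember to restore write permissions afterwards (for example with `chmod -R u+w`) if the files need to stay editable.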

## Initial conditions

| Parameter | Source | Target |
|-|-|-|
| Device | WD MyCloud EX2 Ultra | WD MyCloud EX2 Ultra |
| Firmware | 2.31.204 | 2.31.204 |
| Drives | 2 x 3 TB WDC WD30EFRX-68N32N0, FwRev=82.00A82 | 2 x 4 TB WDC WD40EFRX-68N32N0, FwRev=82.00A82 |
| RAID | RAID 1 | RAID 0 |
| Encrypted | No | Yes |
| IP address | 192.168.1.54 | 192.168.1.53 |

Connected using Cat 5E cables via TP-Link TL-SG108 Gigabit switch.

## Testing

First, I enabled ssh access on both NASes via the web UI:

```Settings > Network > SSH```

The default user for the WD MyCloud EX2 Ultra is `sshd`, so to connect to the NAS I used the command:

```ssh sshd@192.168.1.53```

To avoid sending the data through a third machine, I logged into the target over ssh and pulled the data from the source using `scp`:

```
scp -rp sshd@192.168.1.54:/mnt/HD/HD_a2/Marcin /mnt/HD/HD_a2/
```

However, the transfer speed oscillated between 20 and 25 MB/s. This is way below the expected 70 MB/s.
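
To put that gap in perspective, here is a rough estimate of how long a copy takes at each rate. The 3 TB figure is purely an assumption for illustration (a completely full source volume, counted as 3,000,000 MB); the actual amount of data was smaller.

```shell
# eta_hours SIZE_MB RATE_MB_PER_S -> estimated transfer time in hours
eta_hours() {
    awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate / 3600 }'
}

eta_hours 3000000 22   # observed scp rate  -> 37.9
eta_hours 3000000 70   # expected rate      -> 11.9
```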

First I checked with `ethtool` whether the network cards had indeed negotiated a 1000 Mbps full-duplex link:

```
# ethtool egiga0
Settings for egiga0:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Half 1000baseT/Full
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Link partner advertised link modes:  10baseT/Half 10baseT/Full
	                                     100baseT/Half 100baseT/Full
	                                     1000baseT/Full
	Link partner advertised pause frame use: No
	Link partner advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Link detected: yes
```
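
Only the `Speed` and `Duplex` lines matter for this check, so it can be scripted. A minimal sketch, assuming the output format shown above; `link_is_gigabit` is a made-up function name.

```shell
# link_is_gigabit: read `ethtool <iface>` output on stdin and succeed only
# if the negotiated link is 1000Mb/s full duplex.
link_is_gigabit() {
    awk -F': *' '
        /^[[:space:]]*Speed:/  { speed = $2 }
        /^[[:space:]]*Duplex:/ { duplex = $2 }
        END { exit !(speed == "1000Mb/s" && duplex == "Full") }
    '
}

# Usage: ethtool egiga0 | link_is_gigabit && echo "gigabit, full duplex"
```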

Then I checked that IO and CPU were not saturated (this was already visible in the web UI, but I confirmed it with `iostat`):

```
# iostat
Linux 3.10.39 (KlinkierChmurka) 	05/05/20 	_armv7l_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.23    6.16   28.23    1.36    0.00   35.02
```

```
# iostat -dx /dev/sda 5
Linux 3.10.39 (KlinkierChmurka) 	05/05/20 	_armv7l_	(2 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.55     5.39    4.01   95.83   212.68 10755.40   219.70     0.45    4.53    4.82    4.52   1.19  11.90
```
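
Note that the first report `iostat` prints covers the time since boot, so the later samples are the ones that reflect the running copy. A small sketch for pulling just the `%util` column out of several samples; `disk_util` is a hypothetical helper and it assumes `%util` is the last column, as in the output above.

```shell
# disk_util DEVICE: read `iostat -dx` output on stdin and print the %util
# column (the last field) for every report line of DEVICE.
disk_util() {
    awk -v dev="$1" '$1 == dev { print $NF }'
}

# Usage: three samples, five seconds apart
#   iostat -dx /dev/sda 5 3 | disk_util sda
```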

I then checked with `traceroute` that the two NASes reach each other directly through the switch, with no extra hops:

```
# traceroute 192.168.1.54
traceroute to 192.168.1.54 (192.168.1.54), 30 hops max, 38 byte packets
 1  192.168.1.54 (192.168.1.54)  0.428 ms  0.428 ms  0.391 ms
```

Then I enabled jumbo frames in the web-ui and verified they are working using `ping`:

```
# ping -s 8972 192.168.1.54
PING 192.168.1.54 (192.168.1.54): 8972 data bytes
8980 bytes from 192.168.1.54: seq=0 ttl=64 time=0.898 ms
8980 bytes from 192.168.1.54: seq=1 ttl=64 time=2.904 ms
8980 bytes from 192.168.1.54: seq=2 ttl=64 time=0.786 ms
```
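
The payload size of 8972 is not arbitrary: it is the 9000-byte jumbo MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header (the reply line shows 8980 bytes because it counts the ICMP header). A small sketch of that arithmetic, with `icmp_payload` as an illustrative name:

```shell
# icmp_payload MTU -> largest ICMP echo payload that fits in one IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) and 8 bytes of ICMP header.
icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload 9000   # 8972, jumbo frames
icmp_payload 1500   # 1472, standard Ethernet
```

A reply alone does not strictly prove that jumbo frames are in use, since a large ping can also succeed through IP fragmentation. On systems with the iputils `ping`, adding `-M do` forbids fragmentation; whether the `ping` on the NAS supports that option is something I have not checked.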

To run a performance test, I suspended the `scp` by pressing `Ctrl-Z` (which lets me resume it later using `bg` or `fg`) and ran a network throughput test using `iperf`:

Target (server, 192.168.1.53):
```
# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.53 port 5001 connected with 192.168.1.54 port 41261
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.16 GBytes   991 Mbits/sec
```

Source (client, 192.168.1.54):
```
# iperf -c 192.168.1.53
------------------------------------------------------------
Client connecting to 192.168.1.53, TCP port 5001
TCP window size: 93.3 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.54 port 41261 connected with 192.168.1.53 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.16 GBytes   992 Mbits/sec
```
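
`iperf` reports megabits per second while `scp` reports megabytes per second, so the two need converting before they can be compared. A quick sketch, ignoring protocol overhead and the 1000 vs 1024 distinction; `mbit_to_mbyte` is an illustrative name.

```shell
# mbit_to_mbyte MBIT_PER_S -> the same rate in megabytes per second
mbit_to_mbyte() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 8 }'
}

mbit_to_mbyte 991   # 123.9
```

So the link can carry roughly five times what `scp` was achieving.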

This rules out the network as the bottleneck. I decided to run a test on one large file using both `scp` and `rsync`, which can sometimes outperform `scp`.

```
# cd /mnt/HD/HD_a2/Marcin/
# dd if=/dev/zero of=1GB_TEST_FILE bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.0GB) copied, 37.631790 seconds, 27.2MB/s
```
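
The `dd` run above also reports how fast the file was written. A variant with a smaller block size avoids asking `dd` for a single 1 GB buffer, and an explicit flush keeps the page cache from flattering the result. This is only a sketch: `write_bench` and its scratch file name are made up, and it assumes the `dd` on the NAS supports `conv=fsync`.

```shell
# write_bench DIR [MIB]: write MIB mebibytes of zeros to a scratch file in
# DIR, flush it to disk before dd reports its timing, then delete the file.
write_bench() {
    dir=$1
    mib=${2:-1024}
    file="$dir/.write_bench.$$"   # hypothetical scratch file name
    dd if=/dev/zero of="$file" bs=1M count="$mib" conv=fsync
    status=$?
    rm -f "$file"
    return "$status"
}

# Usage: write_bench /mnt/HD/HD_a2 1024
```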

Starting with `scp`. The speed is consistent with what I observe with real data:
```
# scp -rp sshd@192.168.1.54:/mnt/HD/HD_a2/Marcin/1GB_TEST_FILE /mnt/HD/HD_a2/Marcin/1GB_TEST_FILE
sshd@192.168.1.54's password:
1GB_TEST_FILE                 100% 1024MB  22.2MB/s   00:46
```

Now `rsync`. Unfortunately, no improvement there:

```
# rsync -a sshd@192.168.1.54:/mnt/HD/HD_a2/Marcin/1GB_TEST_FILE /mnt/HD/HD_a2/Marcin/1GB_TEST_FILE --progress
sshd@192.168.1.54's password:
receiving incremental file list
1GB_TEST_FILE
  1073741824 100%   21.57MB/s    0:00:47 (xfer#1, to-check=0/1)

sent 30 bytes  received 1073872980 bytes  20851903.11 bytes/sec
total size is 1073741824  speedup is 1.00
```

But wait, didn't `dd` create the file on the source NAS at only around 27 MB/s? Time to benchmark the drives.

Source read speed:

```
# hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 538 MB in  3.01 seconds = 178.73 MB/sec
```

Target write speed:

```
# dd if=/dev/zero of=1GB_TEST_FILE bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.0GB) copied, 41.367580 seconds, 24.8MB/s
```
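
Putting the measurements side by side makes the slowest stage obvious. The sketch below uses the figures collected so far, all in MB/s; the network value is the `iperf` result of 991 Mbits/sec divided by 8, and `slowest` is just an illustrative helper.

```shell
# slowest: read "stage rate" pairs on stdin and print the stage with the
# lowest rate.
slowest() {
    awk 'NR == 1 || $2 + 0 < min { min = $2 + 0; stage = $1 }
         END { print stage, min }'
}

slowest <<'EOF'
network      123.9
source_read  178.73
target_write 24.8
EOF
# prints: target_write 24.8
```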

OK, so it seems the drive write speed is at fault. But why? Maybe indexing is to blame. There is a known issue [1] where the indexing daemons choke on certain files, causing endless indexing and bringing NAS performance to its knees.

Let's stop them for the time being.

```
/etc/init.d/wdmcserverd stop
/etc/init.d/wdphotodbmergerd stop
```
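
The same thing as a loop that skips daemons whose init script is missing, which may be useful if another firmware version names them differently. `stop_daemons` is a hypothetical helper; the init directory is a parameter only to keep the sketch easy to try out.

```shell
# stop_daemons INIT_DIR NAME...: run "<INIT_DIR>/<NAME> stop" for every init
# script that exists and is executable; report the ones that are missing.
stop_daemons() {
    init_dir=$1
    shift
    for name in "$@"; do
        if [ -x "$init_dir/$name" ]; then
            "$init_dir/$name" stop
        else
            echo "skipped: $name (no init script)" >&2
        fi
    done
}

# Usage: stop_daemons /etc/init.d wdmcserverd wdphotodbmergerd
```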

To do this permanently, the following commands might be used:

```
update-rc.d wdphotodbmergerd disable
update-rc.d wdmcserverd disable
```

But to no avail: the problem remains.

So here we are. SMB transfers to the drive are faster, 50-70 MB/s, close to the advertised speed. And it seems I am not the only one with this problem (SCP slow, SMB fast):

* https://forums.freebsd.org/threads/slow-nfs-smb-afp-but-fast-scp-read-performance.68077/#post-410296

However, benchmarks around the web seem to find `scp` and `rsync` much faster than SMB:

* https://squarism.com/2010/02/12/scp-vs-rsync-vs-smb-vs-ftp/

Having exhausted my theories, I decided to ask the WD community:

https://community.wd.com/t/wd-mycloud-ex2-ultra-2x4tb-slow-write-speeds-over-ssh/250988