@robi - I wonder if it still does any buffering when writing to /dev/null, since that's what the dd read command above does. In any case, this suggestion caused me to revisit the direct_io option. The docs say it disables the kernel page cache, which does seem to give the most consistent performance improvement.
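On the /dev/null question: the kernel discards writes to /dev/null, so the write side of the pipeline is essentially free and dd ends up timing only the reads. A self-contained sketch of that kind of read benchmark (the path and sizes here are made up for illustration, not the exact command from the thread):

```shell
# Create a small test file, then time a sequential read of it.
# Writing to /dev/null discards the data, so only the read is measured.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=10 2>/dev/null
dd if=/tmp/ddtest of=/dev/null bs=1M 2>&1 | tail -n 1
rm -f /tmp/ddtest
```

The last line of dd's stderr reports bytes copied, elapsed time, and throughput.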
Yet Another Data Point - I did a lot more testing today, and I think I've taken this as far as I can. The good news: I can consistently get 16-25 MB/s read speeds.
TL;DR: using this command gives me the best read performance (2x-3x improvement): nice -n -10 sshfs -s -o direct_io,compression=no
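Spelled out with a host and mount point (placeholders, not from my setup), the full invocation would look something like:

```shell
# Negative nice values need root (or CAP_SYS_NICE); drop the nice prefix
# to run unprivileged.
sudo nice -n -10 sshfs -s -o direct_io,compression=no \
    user@host:/remote/path /mnt/remote
```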
Why I'm using these options:
direct_io
direct_io disables caching, and had quite an interesting effect on reads.
Using the -f -d options I was able to watch the packets going through. I was wrong before about the writes being bigger than the reads; they're not. But the writes are pipelined much more aggressively than the reads.
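For anyone reproducing this: the request/reply lines below come from sshfs's debug output. An invocation along these lines (host and paths are placeholders) keeps sshfs in the foreground and prints one line per SFTP message:

```shell
# -f keeps sshfs in the foreground; -d prints debug/SFTP traffic
# (and implies -f).
sshfs -f -d user@host:/remote/path /mnt/remote
```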
Before direct_io:
[01315] READ
[01308] DATA 32781bytes (31ms)
[01309] DATA 32781bytes (31ms)
[01310] DATA 32781bytes (31ms)
[01311] DATA 32781bytes (31ms)
[01316] READ
[01317] READ
[01318] READ
[01319] READ
[01312] DATA 32781bytes (31ms)
[01313] DATA 32781bytes (31ms)
[01314] DATA 32781bytes (31ms)
[01315] DATA 32781bytes (31ms)
READ requests 4 chunks at a time, waits for them, and then requests 4 more.
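That pattern puts a hard ceiling on throughput: window size divided by round-trip time. Plugging in the numbers from the trace above (4 chunks of 32,781 bytes per ~31 ms round trip):

```shell
# Back-of-the-envelope throughput ceiling: 4 in-flight chunks of
# 32,781 bytes, 31 ms round-trip latency.
awk 'BEGIN { printf "%.1f MB/s\n", (4 * 32781) / 0.031 / 1e6 }'
```

That prints 4.2 MB/s, the same order of magnitude as the slow reads I was seeing; read-ahead overlap presumably explains landing somewhat above the naive ceiling.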
[05895] WRITE
[05827] STATUS 28bytes (34ms)
[05828] STATUS 28bytes (34ms)
[05829] STATUS 28bytes (35ms)
[05830] STATUS 28bytes (35ms)
[05831] STATUS 28bytes (35ms)
[05832] STATUS 28bytes (34ms)
[05833] STATUS 28bytes (34ms)
[05834] STATUS 28bytes (34ms)
[05835] STATUS 28bytes (34ms)
[05896] WRITE
[05897] WRITE
WRITE requests at least 60 chunks at a time, and sometimes I saw over 100 chunks pending.
After turning on direct_io, the reads look more like the writes:
[06342] READ
[06343] READ
[06344] READ
[06313] DATA 32781bytes (31ms)
[06314] DATA 32781bytes (31ms)
[06315] DATA 32781bytes (31ms)
[06316] DATA 32781bytes (31ms)
[06317] DATA 32781bytes (32ms)
[06318] DATA 32781bytes (32ms)
[06319] DATA 32781bytes (32ms)
[06320] DATA 32781bytes (32ms)
[06321] DATA 32781bytes (33ms)
[06322] DATA 32781bytes (35ms)
[06323] DATA 32781bytes (35ms)
[06324] DATA 32781bytes (36ms)
[06325] DATA 32781bytes (36ms)
[06326] DATA 32781bytes (36ms)
[06327] DATA 32781bytes (37ms)
Note the difference between the READ and DATA chunk IDs: it's now allowing up to about 31 chunks to be in flight before requesting more.
I think this is the primary reason for the speed increase.
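The same back-of-the-envelope math supports that: with ~31 chunks of 32,781 bytes in flight over a ~31 ms round trip, the ceiling lands right around the observed speeds.

```shell
# Throughput ceiling with ~31 in-flight 32,781-byte chunks at 31 ms RTT.
awk 'BEGIN { printf "%.1f MB/s\n", (31 * 32781) / 0.031 / 1e6 }'
```

That prints 32.8 MB/s, which matches the 25 MB/s reads I'm seeing once per-request overhead is accounted for.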
-s for single threading
I noticed that running on a single thread made the degradation of repeated file reads less pronounced. Instead of dropping back to 8 MB/s after a few reads, it holds 25 MB/s for at least 5-6 reads (500-600 MB) before dropping to 16 MB/s. It also recovers back to 25 MB/s over time, whereas with multiple threads I needed to restart the SSHFS connection to get 25 MB/s again.
nice
Since there seems to be an element of CPU bottleneck (suggested by the improvement from running in a single process), I also wanted to give this process higher priority. It seems to help the session get more 25 MB/s reads before slowing down.
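One detail worth noting: only negative niceness (higher priority) requires root; any user can lower a process's priority. A generic demo, unrelated to the sshfs setup itself:

```shell
# Run a child shell at niceness 10 and have it report its own nice value
# via ps (ni= prints the niceness column with no header).
nice -n 10 sh -c 'ps -o ni= -p $$'
```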
compression=no
Because we're now on one thread and hogging a lot of CPU time, I disabled compression. I didn't notice a difference in throughput with it on, but turning it off helps reduce CPU load.
Next Steps:
I will run this test a few more times, and probably even adjust my mount for the volume manually to see if it helps performance.
There is definitely some element of throttling / filling up, because repeated reads in the same session can get slower, and starting a new session can help the speed go back up. I'm not sure if this is on the client side or the server side. Any insights would be greatly appreciated.
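One way to narrow down which side is responsible (host and paths below are placeholders): compare a raw read on the server with a raw read over just the SSH pipe. If both stay fast across repeated runs while the sshfs read degrades, the slowdown is in the client-side FUSE/sshfs path rather than the server or network.

```shell
# Disk read speed on the server itself:
ssh user@host "dd if=/remote/path/file of=/dev/null bs=1M"
# Same data over the SSH transport, bypassing sshfs/FUSE entirely:
ssh user@host "cat /remote/path/file" | dd of=/dev/null bs=1M
```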
Even though I wish there was a clearer answer, I'll be happy if the 2x boost to read speed works.
P.S. - I even tried a "high performance SSH" binary, hpnssh, and it did not make a noticeable difference in my tests.