Skip to content

Consider revising behavior of SftpFileStream #194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
drieseng opened this issue Mar 9, 2017 · 12 comments
Closed

Consider revising behavior of SftpFileStream #194

drieseng opened this issue Mar 9, 2017 · 12 comments
Assignees

Comments

@drieseng
Copy link
Member

drieseng commented Mar 9, 2017

Our SftpFileStream currently performs separate householding for read and write mode.
This means - for example - that if you:

  1. open an existing file for read/write
  2. read two bytes
  3. write a single byte

The byte that you wrote in step 3 is actually written at position 0 instead of 2.

The following code fragment exposes this:

using (var client = new SftpClient(...))
{
    client.Connect();

    using (var s = client.Open(path, FileMode.CreateNew, FileAccess.Write))
    {
        s.Write(new byte[] { 5, 4, 3, 2, 1 }, 1, 3);
    }

    using (var s = client.Open(path, FileMode.Open, FileAccess.ReadWrite))
    {
        Console.WriteLine("#1: " + s.ReadByte());
        Console.WriteLine("#2: " + s.ReadByte());
        Console.WriteLine("#3: " + s.Position);

        s.WriteByte(7);
        s.Write(new byte[] {8, 9, 10, 11, 12}, 1, 3);

        Console.WriteLine("#4: " + s.Position);
    }
}
@drieseng drieseng self-assigned this Mar 9, 2017
olegkap added a commit that referenced this issue Mar 11, 2017
@drieseng
Copy link
Member Author

As I mentioned offline, I think you're moving a little too fast here by rewriting SftpFileStream from scratch. Rewriting for performance without affecting behavior is of course ok, but you're definitely changing the way SftpFileStream works for a client.

From a single(-threaded) consumer point-of-view, the current behavior does not make any sense.

From a multi-threaded point-of-view, it makes sense that advancing the position in thread A by reading should not affect the position of thread B that is writing. The original implementation was clearly ment to be thread-safe.

Before we rewrite SftpFileStream we should first discuss how we expect it to work (eg. do we want it to be thread-safe), and perhaps even involve our user base.

@olegkap
Copy link
Contributor

olegkap commented Mar 11, 2017

Well, the idea behind SftFileStream was to have same behaviour as FileStream, and for that I used Microsoft implementation of it here as a starting point (in the previous implementation I was using Mono as reference I think), and replace native functionality to access files with SSH equivalent functions.

I did leave some parts out of it, the one that has to do with async, at least for now we can restore it later.

So I guess the question is how SftpFileStream behavior now is different from FileStream and if should it be.

As far as I know FileStream is not a thread safe but perhaps we can make it different and restore locking to ensure thread safe as it was before, I just didnt want to restore it just now as it was my first attempt to rewrite it and I was worried about race conditions and locking myself out but I can take a look at it later.

I know its still need to be corrected so it will pass tests and compile for .NET Core and perhaps other environments but I guess we can address it later once we finalize behaviour.

@drieseng
Copy link
Member Author

For the next - and long overdue - release, I think we should stick with the previous behaviour.
Perhaps we could add a section to the release notes for that release to announce the change in behavior, and sollicit feedback.

What do you think?

@olegkap
Copy link
Contributor

olegkap commented Mar 13, 2017

Well, we can stick to previous behavior then, no problem.

I guess my question is how previous behavior is different from current one, and is previous behavior is correct or perhaps it needed to be changed of fixed.

@drieseng
Copy link
Member Author

Personally, I consider the current behavior broken.
If, however, a large part of SSH.NET users rely on this broken behavior then we cannot just fix the implementation.

Since you referenced this issue in your commit, I assume you somehow modified the implementation.
If you didn't modify anything, then you wouldn't have called it a complete rewrite :p

@olegkap
Copy link
Contributor

olegkap commented Mar 13, 2017

Well, the reason I called it a complete rewrite is because I used Microsoft code as point of reference instead of Mono one. So it was perhaps confusing, sorry about that. So as far as I know it fixes the issue that you mentioned while keeps Stream behavior same as Microsoft FileStream one so it should be any breaking changes or changes in behavior.

@drieseng
Copy link
Member Author

Not sure if I understood you correctly. Are you saying that there should not be any breaking changes?
If you fixed the "issue" that I mentioned, then one could consider it a breaking change.

Again: I don't like the current behavior, but changing it may break existing applications.
For the current release cycle (2016.1.0), I think we should stick with the previous behaviour and add a section to the release notes to announce the change in behavior, and sollicit feedback.

@olegkap
Copy link
Contributor

olegkap commented Mar 14, 2017

What I mean by no breaking changes is that all methods still the same and have the same signature and same behavior.

If you like, sure, we can keep previous version and revert it back so new version could be released, but then the issue that you mentioned will not be fixed.

I guess we can let people know in release notes that we have improved SftFileStream that will fix that particular issue and perhaps others and will be designed to work the same as FileStream.

@drieseng
Copy link
Member Author

The goal of this issue was to consider/discuss changing the behavior of SftpFileStream.
I'll see if I find enough time to add tests for the new implementation, and prepare a section in the release notes. If not, I propose we postpone the rewrite until after the next release.

@olegkap
Copy link
Contributor

olegkap commented Mar 14, 2017

Sure, no problem.
Sounds good.

@drieseng
Copy link
Member Author

drieseng commented Mar 18, 2017

The revision you committed breaks some test I have locally, but I do not consider it a breaking change. However, I still prefer to stick with the previous version from now.

I'd like to discuss some other things:

Read Buffer

Assume you've just opened a file, and invoked SftpFileStream.Read(byte[] buffer, int offset, int length) with a length that is less than the read buffer size. We then of course perform a SSH_FXP_READ with the read buffer size as length.

Scenario 1

The number of bytes we received from the remote server is less than or equal to the length requested by the client.

Do we:

  1. write directly to the user-provided buffer, and not write to our read buffer at all?
  2. write all bytes we obtained to the read buffer, and then copy the requested length from the read buffer to the user-provided buffer?

Answer:
We now do 1.

and do we:

  1. keep requesting data from the server in loop until we either received length number of bytes or the server signals EOF?
  2. immediately return the bytes we received, and leave it up to the user to invoke SftpFileStream.Read(byte[] buffer, int offset, int length) again?

Answer:
We now do 1, but also interrupt the read loop when the server returns less bytes than we requested (=read buffer size) as that may indicate that we've reached EOF.

Scenario 2

The number of bytes we received from the remote server is greater than than the length requested by the client.

Do we:

  1. write the requested length directly to the user-provided buffer, and the remaining bytes to our read buffer?
  2. write all bytes we obtained to the read buffer, and then copy the requested length from the read buffer to the user-provided buffer?

Remarks

For both scenarios above, the first choice avoids allocation of a read buffer and/or - fully or partly - eliminates copy from read buffer to user-provided buffer.

The price you pay for that choice is that moving around and reading again from the same area of the file will cause an exra round trip.

If the primary usage of the SftpFileStream is forward-only reading, then the second choice is obviously a little more efficient hereby offering a small performance gain for both scenarios.

Note:
Unless someone specified a very small buffer size, or a very large length as argument for the SftpFileStream.Read(byte[] buffer, int offset, int length) invocation(s), scenario 1 is likely to only happen for:

  • the last chunk of a given file, in which case the read buffer was probably already allocated by reading any of the previous chunks
  • files that are smaller than the value specified for the length argument

Thread Safety

T.B.D.

@drieseng
Copy link
Member Author

drieseng commented Jul 21, 2017

I've implemented the following changes:

  • We now lazily initialize the read and write buffer (for now, we keep these separate).
  • Read(byte[] buffer, int offset, int count) now only writes those bytes - that we received from the server - to the buffer that exceed the number of bytes requested by the caller.
  • The read loop in Read(byte[] buffer, int offset, int count) is interrupted when any of the following conditions are met:
    • we've written the number of bytes that the caller requested
    • the server returned zero bytes, hereby signaling EOF
    • the server returned less bytes than we requested (=read buffer size), as that may indicate that we've reached EOF

@drieseng drieseng added this to the 2016-1.0-beta2 milestone Jul 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants