Truncate unnecessary data before specified offset #18
Conversation
When we read a big file, the actual reads are split into several blocks. In such situations, a node is requested to read at a specific offset, but CRFS doesn't truncate the unnecessary data before that offset. As a result, big files are broken in CRFS. This commit solves the issue by truncating the unnecessary data before the specified offset when CRFS fetches the first chunk of the required range. Signed-off-by: Kohei Tokunaga <[email protected]>
crfs.go
Outdated
@@ -1060,6 +1060,10 @@ func (h *nodeHandle) Read(ctx context.Context, req *fuse.ReadRequest, resp *fuse
	if err != nil {
		return err
	}
	if nr == 0 {
		// Truncate unncessary data
		chunkData = chunkData[req.Offset:]
This doesn't seem right in general.
This is in a loop (line 1050) that will keep filling, but req.Offset should only be used for the first chunk. And even then, only the part between ce.ChunkOffset and req.Offset should probably be used.
If they wanted overall req.Offset 100 but the 2nd chunk of the file is at offset 90, then what you really want here is offset 10. You don't just want 100 each time.
This could probably use some tests, like the ones in the stargz package.
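For illustration, here is a minimal, self-contained sketch of the chunk-relative offset calculation described above; reqOffset and chunkOffset are made-up stand-ins for req.Offset and ce.ChunkOffset, not the actual crfs.go code:

    package main

    import "fmt"

    func main() {
        // Hypothetical values matching the example above: the caller asked to
        // read from file offset 100, and this chunk starts at file offset 90.
        reqOffset := int64(100)  // offset requested by the FUSE read
        chunkOffset := int64(90) // offset of this chunk within the file

        // The slice offset must be relative to the chunk, not the whole file.
        off := reqOffset - chunkOffset
        if off < 0 {
            off = 0 // later chunks in the requested range are used from their start
        }
        fmt.Println(off) // prints 10, not 100
    }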
Thank you for your review. Right. I fixed the offset calculation and added some checks in 15a2736.
Signed-off-by: Kohei Tokunaga <[email protected]>
@bradfitz Could I get any comments?
This modifies two packages and adds no new tests in either.
I'd feel much more confident in this if it had tests.
Signed-off-by: Kohei Tokunaga <[email protected]>
@bradfitz Could I get any comments?
Thanks for adding tests. I'll take a look.
stargz/stargz_test.go
Outdated
@@ -488,3 +488,72 @@ func symlink(name, target string) tarEntry {
		})
	})
}

const (
	chunkSize int64 = 4
remove int64
crfs_test.go
Outdated
)

const (
	chunkSize int64 = 4
remove int64 & string types
stargz/testutil.go
Outdated
package stargz

// Makes minimal Reader of "reg" and "chunk" without tar-related information.
Document this the normal Go way:
https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences
But also, let's rename it to not include "Stub".
But why is it exported at all? Can't you just put this in the test package where it's needed? Then naming & docs aren't quite as important. But once it's public API, the quality bar is much higher.
Thank you for your review. I came up with another implementation that doesn't export the function, and fixed it.
if !bytes.Equal(wantData, resp.Data) {
	t.Errorf("off=%d; read data = (size=%d,data=%s); want (size=%d,data=%s)",
		offset, len(resp.Data), string(resp.Data), wantN, string(wantData))
in Go tests, when a string or []byte is known bad in a failing test, format it with %q instead of %s. %q nicely shows emptiness, trailing/leading whitespace, binary chars, etc.
And you don't need the string(...) conversion either.
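A small standalone illustration of the difference (the byte slice here is made up):

    package main

    import "fmt"

    func main() {
        data := []byte("abc \t") // hypothetical failing value with trailing whitespace
        // %s hides the trailing space and tab; %q quotes and escapes them, and
        // it accepts a []byte directly, so no string(...) conversion is needed.
        fmt.Printf("got %s\n", data) // got abc
        fmt.Printf("got %q\n", data) // got "abc \t"
    }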
crfs_test.go
Outdated
for in, innero := range innerOffsetCond {
	for bo, baseo := range baseOffsetCond {
		for fn, filesize := range fileSizeCond {
			t.Run(strings.Join([]string{"reading", sn, in, bo, fn}, "_"), func(t *testing.T) {
or fmt.Sprintf(...) might be easier to read? Your call.
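For comparison, a standalone sketch of the two equivalent ways of building the subtest name; the condition labels are placeholders, not the test's actual map keys:

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        // Placeholder condition labels standing in for sn, in, bo, and fn.
        sn, in, bo, fn := "1chunk", "head", "zero", "small"

        joined := strings.Join([]string{"reading", sn, in, bo, fn}, "_")
        formatted := fmt.Sprintf("reading_%s_%s_%s_%s", sn, in, bo, fn)

        fmt.Println(joined)              // reading_1chunk_head_zero_small
        fmt.Println(joined == formatted) // true
    }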
and several fixes related to coding conventions. Signed-off-by: Kohei Tokunaga <[email protected]>
Thank you for your review. I'm ready for a new review.
crfs.go
Outdated
		off = size
	}
	chunkData = chunkData[off:]
}
Can't this all just be:
n := copy(resp.Data[nr:], chunkData[offset+int64(nr)-ce.ChunkOffset:])
With that change, all your tests still pass.
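A standalone sketch of how that single copy behaves, with hypothetical values standing in for resp.Data, the chunk, and the offsets from crfs.go's read loop:

    package main

    import "fmt"

    func main() {
        // Hypothetical stand-ins: respData is the response buffer, chunkData one
        // decompressed chunk, offset the file offset the caller asked for,
        // chunkOffset the chunk's position within the file, and nr the number of
        // bytes already copied into respData.
        respData := make([]byte, 4)
        chunkData := []byte("0123456789ab") // chunk covering file offsets 90..101
        offset := int64(96)                 // read request starts at file offset 96
        chunkOffset := int64(90)
        nr := 0

        // Slice the chunk at the request offset translated into chunk-relative
        // coordinates; copy clamps to whatever fits in the response buffer.
        n := copy(respData[nr:], chunkData[offset+int64(nr)-chunkOffset:])
        nr += n

        fmt.Printf("%q %d\n", respData, nr) // "6789" 4
    }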
Thank you for your idea. I think it's better too. I fixed it.
Signed-off-by: Kohei Tokunaga <[email protected]>
Fixes: #17
Sometimes, big files are broken in CRFS.
When we read a big file, the actual reads are split into several blocks. In
such situations, a node is requested to read at a specific offset, but CRFS
doesn't truncate the unnecessary data before that offset.
This commit solves the issue by truncating the unnecessary data before the
specified offset when CRFS fetches the first chunk of the required range.