Skip to content

Truncate unnecessary data before specified offset #18

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 17, 2019

Conversation

ktock
Copy link
Contributor

@ktock ktock commented Sep 13, 2019

Fixes: #17
Sometimes, big files are broken in CRFS.

When we read a big file, the actual readings are separated to several blocks. In
such situations, a node is requested to read at specific offset, but CRFS
doesn't truncate unnecessary data before the offset.

This commit solve this issue by truncating unnecessary data before specified
offset when CRFS fetch first chunk of required range.

When we read big file, the actual readings are separated to several blocks. In
such situations, a node is requested to read at specific offset, but CRFS
doesn't truncate unnecessary data before the offset.

As a result, big files are broken in CRFS.

This commit solve this issue by truncating unnecessary data before specified
offset when CRFS fetch first chunk of required range.

Signed-off-by: Kohei Tokunaga <[email protected]>
@googlebot googlebot added the cla: yes Signed a CLA label Sep 13, 2019
crfs.go Outdated
@@ -1060,6 +1060,10 @@ func (h *nodeHandle) Read(ctx context.Context, req *fuse.ReadRequest, resp *fuse
if err != nil {
return err
}
if nr == 0 {
// Truncate unncessary data
chunkData = chunkData[req.Offset:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right in general.

This is in a loop (line 1050) is will keep filling, but req.Offset should only be used for the first chunk. And even then, only the part between ce.ChunkOffset and req.Offset should probably be used.

If they wanted overall req.Offset 100 but the 2nd chunk of the file is at offset 90, then what you really want here is offset 10. You don't just want 100 each time.

This could probably use some tests, like the ones in the stargz package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review. Right. I fixed offset calculation and added some checks in 15a2736.

@ktock
Copy link
Contributor Author

ktock commented Sep 19, 2019

@bradfitz Could I get any comments?

Copy link
Contributor

@bradfitz bradfitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This modifies two packages and adds no new tests in either.

I'd feel much more confident in this if it had tests.

@ktock ktock requested a review from bradfitz September 29, 2019 06:25
@ktock
Copy link
Contributor Author

ktock commented Oct 6, 2019

@bradfitz Could I get any comments?

@bradfitz
Copy link
Contributor

Thanks for adding tests. I'll take a look.

@@ -488,3 +488,72 @@ func symlink(name, target string) tarEntry {
})
})
}

const (
chunkSize int64 = 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove int64

crfs_test.go Outdated
)

const (
chunkSize int64 = 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove int64 & string types

See https://blog.golang.org/constants


package stargz

// Makes minimal Reader of "reg" and "chunk" without tar-related information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document this the normal Go way:
https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences

But also, let's rename it to not include "Stub".

But why is it exported at all? Can't you just put this in the test package where it's needed? Then naming & docs aren't quite as important. But once it's public API, the quality bar is much higher.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review. I came up with another implementation not to export the function and fixed it.


if !bytes.Equal(wantData, resp.Data) {
t.Errorf("off=%d; read data = (size=%d,data=%s); want (size=%d,data=%s)",
offset, len(resp.Data), string(resp.Data), wantN, string(wantData))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in Go tests, when a string or []byte is known bad in a failing test, format it with %q instead of %s. %q nicely shows emptiness, trailing/leading whitespace, binary chars, etc.

And you don't need the string(...) conversion either.

crfs_test.go Outdated
for in, innero := range innerOffsetCond {
for bo, baseo := range baseOffsetCond {
for fn, filesize := range fileSizeCond {
t.Run(strings.Join([]string{"reading", sn, in, bo, fn}, "_"), func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or fmt.Sprintf(...) might be easier to read? Your call.

and several fixes related with coding conventions.

Signed-off-by: Kohei Tokunaga <[email protected]>
@ktock
Copy link
Contributor Author

ktock commented Oct 15, 2019

Thank you for your review. I'm ready for a new review.

@ktock ktock requested a review from bradfitz October 15, 2019 12:41
crfs.go Outdated
off = size
}
chunkData = chunkData[off:]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this all just be:

    n := copy(resp.Data[nr:], chunkData[offset+int64(nr)-ce.ChunkOffset:])

With that change, all your tests still pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your idea. I think it's better too. I fixed it.

@ktock ktock requested a review from bradfitz October 17, 2019 12:00
@bradfitz bradfitz merged commit 1e69efb into google:master Oct 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Signed a CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Contents of big files are broken
3 participants