Truncate unnecessary data before specified offset #18
Conversation
When we read a big file, the actual reads are split into several blocks. In such situations, a node is requested to read at a specific offset, but CRFS doesn't truncate the unnecessary data before that offset. As a result, big files are broken in CRFS. This commit solves the issue by truncating the unnecessary data before the specified offset when CRFS fetches the first chunk of the required range. Signed-off-by: Kohei Tokunaga <[email protected]>
crfs.go
Outdated
@@ -1060,6 +1060,10 @@ func (h *nodeHandle) Read(ctx context.Context, req *fuse.ReadRequest, resp *fuse
	if err != nil {
		return err
	}
	if nr == 0 {
		// Truncate unncessary data
		chunkData = chunkData[req.Offset:]
This doesn't seem right in general.
This is in a loop (line 1050) that will keep filling, but req.Offset should only be used for the first chunk. And even then, only the part between ce.ChunkOffset and req.Offset should probably be used.
If they wanted overall req.Offset 100 but the 2nd chunk of the file is at offset 90, then what you really want here is offset 10. You don't just want 100 each time.
This could probably use some tests, like the ones in the stargz package.
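For illustration, here is a minimal, self-contained sketch of the chunk-relative offset calculation described above; reqOffset and chunkOffset are made-up stand-ins for req.Offset and ce.ChunkOffset, not the actual crfs.go code:

    package main

    import "fmt"

    func main() {
        // Hypothetical values matching the example above: the caller asked to
        // read from file offset 100, and this chunk starts at file offset 90.
        reqOffset := int64(100)  // offset requested by the FUSE read
        chunkOffset := int64(90) // offset of this chunk within the file

        // The slice offset must be relative to the chunk, not the whole file.
        off := reqOffset - chunkOffset
        if off < 0 {
            off = 0 // later chunks in the requested range are used from their start
        }
        fmt.Println(off) // prints 10, not 100
    }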
Thank you for your review. Right. I fixed the offset calculation and added some checks in 15a2736.
Signed-off-by: Kohei Tokunaga <[email protected]>
@bradfitz Could I get any comments?
This modifies two packages and adds no new tests in either.
I'd feel much more confident in this if it had tests.
Signed-off-by: Kohei Tokunaga <[email protected]>
@bradfitz Could I get any comments?
Thanks for adding tests. I'll take a look.
stargz/stargz_test.go
Outdated
@@ -488,3 +488,72 @@ func symlink(name, target string) tarEntry {
		})
	})
}

const (
	chunkSize int64 = 4
remove int64
crfs_test.go
Outdated
)

const (
	chunkSize int64 = 4
remove int64 & string types
stargz/testutil.go
Outdated
package stargz

// Makes minimal Reader of "reg" and "chunk" without tar-related information.
Document this the normal Go way:
https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences
But also, let's rename it to not include "Stub".
But why is it exported at all? Can't you just put this in the test package where it's needed? Then naming & docs aren't quite as important. But once it's public API, the quality bar is much higher.
Thank you for your review. I came up with another implementation that doesn't export the function, and fixed it.
if !bytes.Equal(wantData, resp.Data) {
	t.Errorf("off=%d; read data = (size=%d,data=%s); want (size=%d,data=%s)",
		offset, len(resp.Data), string(resp.Data), wantN, string(wantData))
in Go tests, when a string or []byte is known bad in a failing test, format it with %q instead of %s. %q nicely shows emptiness, trailing/leading whitespace, binary chars, etc.
And you don't need the string(...) conversion either.
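A small standalone illustration of the difference (the byte slice here is made up):

    package main

    import "fmt"

    func main() {
        data := []byte("abc \t") // hypothetical failing value with trailing whitespace
        // %s hides the trailing space and tab; %q quotes and escapes them, and
        // it accepts a []byte directly, so no string(...) conversion is needed.
        fmt.Printf("got %s\n", data) // got abc
        fmt.Printf("got %q\n", data) // got "abc \t"
    }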
crfs_test.go
Outdated
for in, innero := range innerOffsetCond {
	for bo, baseo := range baseOffsetCond {
		for fn, filesize := range fileSizeCond {
			t.Run(strings.Join([]string{"reading", sn, in, bo, fn}, "_"), func(t *testing.T) {
or fmt.Sprintf(...) might be easier to read? Your call.
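For comparison, a standalone sketch of the two equivalent ways of building the subtest name; the condition labels are placeholders, not the test's actual map keys:

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        // Placeholder condition labels standing in for sn, in, bo, and fn.
        sn, in, bo, fn := "1chunk", "head", "zero", "small"

        joined := strings.Join([]string{"reading", sn, in, bo, fn}, "_")
        formatted := fmt.Sprintf("reading_%s_%s_%s_%s", sn, in, bo, fn)

        fmt.Println(joined)              // reading_1chunk_head_zero_small
        fmt.Println(joined == formatted) // true
    }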
and several fixes related to coding conventions. Signed-off-by: Kohei Tokunaga <[email protected]>
Thank you for your review. I'm ready for a new review.
crfs.go
Outdated
		off = size
	}
	chunkData = chunkData[off:]
}
Can't this all just be:
n := copy(resp.Data[nr:], chunkData[offset+int64(nr)-ce.ChunkOffset:])
With that change, all your tests still pass.
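A standalone sketch of how that single copy behaves, with hypothetical values standing in for resp.Data, the chunk, and the offsets from crfs.go's read loop:

    package main

    import "fmt"

    func main() {
        // Hypothetical stand-ins: respData is the response buffer, chunkData one
        // decompressed chunk, offset the file offset the caller asked for,
        // chunkOffset the chunk's position within the file, and nr the number of
        // bytes already copied into respData.
        respData := make([]byte, 4)
        chunkData := []byte("0123456789ab") // chunk covering file offsets 90..101
        offset := int64(96)                 // read request starts at file offset 96
        chunkOffset := int64(90)
        nr := 0

        // Slice the chunk at the request offset translated into chunk-relative
        // coordinates; copy clamps to whatever fits in the response buffer.
        n := copy(respData[nr:], chunkData[offset+int64(nr)-chunkOffset:])
        nr += n

        fmt.Printf("%q %d\n", respData, nr) // "6789" 4
    }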
Thank you for your idea. I think it's better too. I fixed it.
Signed-off-by: Kohei Tokunaga <[email protected]>
Fixes: #17
Sometimes, big files are broken in CRFS.
When we read a big file, the actual reads are split into several blocks. In
such situations, a node is requested to read at a specific offset, but CRFS
doesn't truncate the unnecessary data before that offset.
This commit solves the issue by truncating the unnecessary data before the
specified offset when CRFS fetches the first chunk of the required range.