Skip to content

Fix interp stack overflow with self-referencing data types #2001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 2, 2021

Conversation

aykevl
Copy link
Member

@aykevl aykevl commented Jul 15, 2021

This PR partially fixes #1848. It's relatively big: it adds object layout information to the IR which is normally optimized away but is used by the interp package to avoid problematic cases like linked list, tree structures, etc.

It also improves #1306 and #1830 but doesn't yet fix them. Somehow they're still failing.

@aykevl aykevl force-pushed the precise-alloc branch 3 times, most recently from 95f69ea to e1a1f78 Compare September 8, 2021 22:49
@aykevl
Copy link
Member Author

aykevl commented Sep 8, 2021

There appears to be one real bug in the arduino test:

error: interp: ptrtoint integer size does not equal pointer size

I'll need to look into that. Other than that, this PR passes all tests.

@dkegel-fastly
Copy link
Contributor

FWIW, after this patch, my app builds fine except that on slower systems (our CI), I get a build timeout, and have to raise the one minute timeout thusly:

sed -i 's,time.Minute,3 * time.Minute,' interp/interpreter.go

Time shows the compile took 2 minutes 25 seconds, so a 3 minute timeout should just do on the ci for this app today.

Making it adjustable from the commandline might be nice, otherwise I'll probably have to keep building from source after the release.

(And yes, I should figure out what takes so long to compile, and get rid of it... it is probably a regexp-compile-like-thing in an init function in a dependency.)

@aykevl
Copy link
Member Author

aykevl commented Sep 15, 2021

I've investigated the error in the Arduino Uno test, and I think I know what's going on. But not sure yet how to fix this.

Note to self: the problem is that runtime.funcValue has an id field that is of type uintptr, but a pointer to a global is stored in it for optimization reasons. I don't know why this triggers on the Arduino Uno test and not on WebAssembly where the same problem exists.

Comment on lines +209 to +215
if typ.StructName() == "runtime.funcValue" {
// Hack: the type runtime.funcValue contains an 'id' field which is
// of type uintptr, but before the LowerFuncValues pass it actually
// contains a pointer (ptrtoint) to a global. This trips up the
// interp package. Therefore, make the id field a pointer for now.
typ = c.ctx.StructType([]llvm.Type{c.i8ptrType, c.i8ptrType}, false)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've worked around the build failure for the Arduino Uno (which could also affect WebAssembly) using this hack. I don't like it, but I don't really see a cleaner way without some major refactoring. And this seems like a relatively small, unlikely to be problematic hack.

@dkegel-fastly
Copy link
Contributor

dkegel-fastly commented Oct 6, 2021

That's great, thanks. (Unfortunately rebasing the branch on current dev appears to have yielded a compiler we can't use because of #2101, sigh. Evidently the sands shifted on us.)

@aykevl
Copy link
Member Author

aykevl commented Oct 6, 2021

All tests are passing! Well, except for the circuitplay-express test which appears to be failing due to a hardware issue.
CircleCI looks like it's still in progress but that's only because I was messing with some CircleCI settings. All those tests have in fact long been passed.

@dkegel-fastly yeah but that must be a separate issue.

@dgryski
Copy link
Member

dgryski commented Oct 14, 2021

So, seems like this can be merged?

@dgryski
Copy link
Member

dgryski commented Oct 15, 2021

While testing golang.org/x/crypto, I think I found a bug traced to this branch:

package main

var testCases = []struct {
	msg []byte
	out [8]byte
}{
	{
		msg: []byte("Hi There"),
		out: [8]byte{
			0x87, 0xaa, 0x7c, 0xde, 0xa5, 0xef, 0x61, 0x9d,
		},
	},
}

func main() {
	println(testCases[0].out[0])
}

produces

# main
main/<init>:10:10: interp: pointer index with something other than a struct or array?
  store i8 -86, i8* %8, align 1, !dbg !35

traceback:
main/<init>:10:10:
  store i8 -86, i8* %8, align 1, !dbg !35

@deadprogram
Copy link
Member

Merge conflicts need resolution, please.

@aykevl aykevl force-pushed the precise-alloc branch 2 times, most recently from 8263623 to 54bcc8c Compare October 30, 2021 18:07
@aykevl
Copy link
Member Author

aykevl commented Oct 30, 2021

I've rebased this PR.

@dgryski Thank you! I've updated the PR to fix this issue. It was actually not directly in the PR but rather caused by this PR as a side effect. Basically, I fixed it by simplifying the code and thus eliminating the error entirely.

@deadprogram
Copy link
Member

@aykevl merge conflict needs resolution, please.

This layout parameter is currently always nil and ignored, but will
eventually contain a pointer to a memory layout.

This commit also adds module verification to the transform tests, as I
found out that it didn't (and therefore didn't initially catch all
bugs).
@aykevl
Copy link
Member Author

aykevl commented Nov 1, 2021

I have rebased the branch again.

@deadprogram
Copy link
Member

Seems like all tests failing in this branch.

This commit adds object layout information to new heap allocations. It
is not yet used anywhere: the next commit will make use of it.

Object layout information will eventually be used for a (mostly) precise
garbage collector. This is what the data is made for. However, it is
also useful in the interp package which can work better if it knows the
memory layout and thus the approximate LLVM type of heap-allocated
objects.
This is uncommon, but it does happen if the source pointer is a bitcast
of a global. For example, if a struct is cast to an i8*, it's possible
to index beyond what would appear to be the size of the pointer (i8*).
Instead of doing lots of complicated calculations to get the shortest
GEP, I'll just cast it to i8*, do the GEP, and optionally cast to the
requested type.

This currently produces ugly constant expressions, but once LLVM
switches to opaque pointer types all of this shouldn't matter anymore.
This commit will use the memory layout information for heap allocations
added in the previous commit to determine LLVM types, instead of
guessing their types based on the content. This fixes a bug in which
recursive data structures (such as doubly linked lists) would result in
a compiler stack overflow due to infinite recursion.

Not all heap allocations have a memory layout yet, but this can be
incrementally fixed in the future. So far, this commit should fix
(almost?) all cases of this stack overflow issue.
@aykevl
Copy link
Member Author

aykevl commented Nov 1, 2021

I've updated the PR, it should be fixed now.

@deadprogram
Copy link
Member

All tests are now passing, and code looks good. Thanks @aykevl for working on this and thanks @dkegel-fastly and @dgryski for all the great work testing and reviewing.

Now merging!

@deadprogram deadprogram merged commit 1869efe into dev Nov 2, 2021
@deadprogram deadprogram deleted the precise-alloc branch November 2, 2021 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants