Suggestion: simple type-checked pointer primitive #1363

Open
RReverser opened this issue Jun 27, 2020 · 34 comments

Comments

@RReverser

In many cases, I find it useful to have type-checked pointers (as opposed to just usize) to make sure that values loaded / stored in memory are indeed correct for the given pointer across the FFI boundary.

I've seen #1228, in particular the Pointer experiment referenced there (https://github.com/AssemblyScript/assemblyscript/blob/master/tests/compiler/std/pointer.ts#L3), and I think it's a nice high-level API, but perhaps unnecessarily high-level for the majority of cases.

On the other hand, the current primitives (load / store) are too low-level - they don't provide any typechecking whatsoever, and allow storing arbitrary data to arbitrary pointers.
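The following sketch makes the hazard concrete. It is plain TypeScript, not AssemblyScript. Linear memory is modeled as a DataView over an ArrayBuffer, and storeF32/loadF32/loadI32 are hypothetical stand-ins for store<f32>/load<f32>/load<i32>. Because the address is a bare number, nothing stops an f32 store from being read back as an i32:

```typescript
// Linear memory modeled as a DataView over an ArrayBuffer.
// Wasm is little-endian, so every access passes `true`.
const memory = new DataView(new ArrayBuffer(64));

// Untyped helpers in the spirit of store<f32> / load<f32> / load<i32>:
// the address is a bare number, so nothing ties it to a stored type.
function storeF32(addr: number, value: number): void {
  memory.setFloat32(addr, value, true);
}
function loadF32(addr: number): number {
  return memory.getFloat32(addr, true);
}
function loadI32(addr: number): number {
  return memory.getInt32(addr, true);
}

const addr = 8;
storeF32(addr, 2.0);
console.log(loadF32(addr)); // 2
console.log(loadI32(addr)); // 1073741824, the raw bits of 2.0f (0x40000000)
```

The second read does not fail. It silently returns the IEEE-754 bit pattern of 2.0.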

I propose a solution somewhere in between these two that I've come to use myself - a primitive for opaque pointers:

//@ts-ignore: decorators
@final
class ptr<T> {
  deref: T;
}

This doesn't provide nice high-level APIs for pointer arithmetic, but in many cases they're not necessary. Often enough, as a user, you simply want to accept an opaque type-checked pointer (represented as usize under the hood), read or write data through it, and be done with it. This is what such a class provides.

A class instance is already represented by AssemblyScript as a pointer to its data, and in this case the data is just the T field. So you can read and write data through such a helper simply like this:

export function doSomethingWithFloat(f: ptr<f32>): void {
  // read float
  let float = f.deref;
  // ...do something
  // write float
  f.deref = float * float;
}

Compare this to a "raw" solution:

export function doSomethingWithFloat2(f: usize): void {
  let float = load<f32>(f);
  // ...do something
  // write float
  store<f32>(f, float * float);
}

Both compile to precisely the same code (in fact, Binaryen merges them if they are in the same file). In the latter case, however, it's too easy for different accesses through the same pointer to get out of sync in terms of types, and the public API is less clear about what f is supposed to point to.
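For comparison, here is a plain-TypeScript model of the typed version. It is not AssemblyScript: the Ptr class, its kind tag, and the DataView-backed memory are assumptions of this sketch. The element kind is part of the pointer's type, so passing an i32 pointer where an f32 pointer is expected becomes a compile-time error rather than a silent reinterpretation:

```typescript
// Scalar element kinds supported by this sketch.
type Kind = "f32" | "i32";

// Linear memory modeled as a little-endian DataView.
const memory = new DataView(new ArrayBuffer(64));

// A typed pointer: an address plus its element kind. Because K appears in
// the `kind` field, Ptr<"f32"> and Ptr<"i32"> are incompatible types.
class Ptr<K extends Kind> {
  readonly addr: number;
  readonly kind: K;
  constructor(addr: number, kind: K) {
    this.addr = addr;
    this.kind = kind;
  }

  // TypeScript erases generics, so the accessors pick the DataView method
  // from the runtime tag; AssemblyScript would resolve this at compile time.
  get deref(): number {
    return this.kind === "f32"
      ? memory.getFloat32(this.addr, true)
      : memory.getInt32(this.addr, true);
  }
  set deref(value: number) {
    if (this.kind === "f32") memory.setFloat32(this.addr, value, true);
    else memory.setInt32(this.addr, value, true);
  }
}

// Mirrors doSomethingWithFloat from the issue: the signature states what f points to.
function doSomethingWithFloat(f: Ptr<"f32">): void {
  const float = f.deref;
  f.deref = float * float;
}

const p = new Ptr(16, "f32"); // inferred as Ptr<"f32">
p.deref = 3;
doSomethingWithFloat(p);
console.log(p.deref); // 9

// Rejected by the type checker: Ptr<"i32"> is not assignable to Ptr<"f32">.
// doSomethingWithFloat(new Ptr(16, "i32"));
```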


Anyway, just thought I'd post this and would love to hear your thoughts.

@MaxGraey
Member

MaxGraey commented Jun 27, 2020

Btw, if you want a truly zero-cost pointer abstraction, it's better to use something like this:

@final class ptr<T> {
  @inline constructor(offset: usize = 0) {
    return changetype<ptr<T>>(offset);
  }
  @inline get deref(): T {
    return load<T>(changetype<usize>(this));
  }
  @inline set deref(value: T) {
    store<T>(changetype<usize>(this), value);
  }
}

let fooPtr = new ptr<i32>(0x200);
let foo = fooPtr.deref;

@RReverser
Author

RReverser commented Jun 27, 2020

The one in the issue description is already zero-cost, doesn't require changetype to bypass the type checker, and allows writing as well.

You could simulate the same via inlined getters/setters, but IMO it's pointless and further from zero-cost, when you can just use a field :)

@MaxGraey
Member

It's zero-cost only when you store to / load from deref. When you create it via new ptr<...>(...), it allocates memory for a class instance and also uses ARC to manage this object.

@RReverser
Author

but when you create it via new ptr<...>(...) it alloc memory for class instance and use also ARC for handling this managed object

I didn't suggest creating pointers. For the constructor, I was thinking of one of three options: making it private (to make it truly opaque and leave it up to people to use changetype), providing a similar conversion, or just leaving the default as-is in case people actually want to create managed pointers.

That's probably worth a separate discussion, my main focus is the deref bit.
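A plain-TypeScript analogy for the opaque, non-allocating variant is a branded number. Ptr, asPtr, loadF32 and storeF32 are hypothetical names used only in this sketch. At runtime the pointer is just an address, like a changetype'd ptr<T>, while the brand keeps pointers of different element types apart in the type checker:

```typescript
// Brand key used only at the type level; it never exists at runtime.
declare const ptrBrand: unique symbol;

// A typed pointer that is a plain number at runtime: the brand records the
// element type T for the checker without allocating anything.
type Ptr<T> = number & { readonly [ptrBrand]: T };

// Plays the role of changetype<ptr<T>>(addr): a pure cast, no allocation.
function asPtr<T>(addr: number): Ptr<T> {
  return addr as Ptr<T>;
}

// Linear memory modeled as a little-endian DataView.
const memory = new DataView(new ArrayBuffer(64));

function loadF32(p: Ptr<"f32">): number {
  return memory.getFloat32(p, true);
}
function storeF32(p: Ptr<"f32">, value: number): void {
  memory.setFloat32(p, value, true);
}

const p = asPtr<"f32">(24);
storeF32(p, 1.5);
console.log(loadF32(p)); // 1.5
console.log(typeof p); // number: still just an address

// Rejected by the type checker: a Ptr<"i32"> cannot stand in for a Ptr<"f32">.
// loadF32(asPtr<"i32">(24));
```

Because asPtr is only a cast, nothing is allocated and no reference counting is involved. The trade-off is that any number can be cast into a pointer, which is the same escape hatch changetype provides.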

@RReverser
Author

RReverser commented Jun 27, 2020

Btw, your get/set code above doesn't compile in a couple of places:

@final class ptr<T> {
  @inline constructor(offset: usize = 0) {
    return changetype<ptr<T>>(offset);
  }
  @inline get deref(): T {
    return changetype<T>(this); // Type 'main/ptr<f32>' cannot be changed to type 'f32'.
  }
  @inline set deref(value: T): void { // A 'set' accessor cannot have a return type annotation.
    store<T>(changetype<usize>(this), value);
  }
}

It's not hard to fix, like this:

@final class ptr<T> {
  @inline constructor(offset: usize = 0) {
    return changetype<ptr<T>>(offset);
  }
  @inline get deref(): T {
    return load<T>(changetype<usize>(this));
  }
  @inline set deref(value: T) {
    store<T>(changetype<usize>(this), value);
  }
}

but it just shows yet another reason to avoid bypassing the type checker and to keep code simple.

After all, static types are the strong suit of TypeScript / AssemblyScript; it's best to use them directly as much as possible, without changetype / load / store.

@MaxGraey
Member

Fixed. Here's a fiddle: https://webassembly.studio/?f=b1m6z7wfv5h

@RReverser
Author

Yeah, I was fixing it up in a fiddle too.

@RReverser
Author

RReverser commented Jun 27, 2020

But, again, compare the complexity of those conversions with the plain deref: T; from my issue. The latter feels much more like a "primitive", and it's properly type-checked without any chance of an invalid cast.

What you're designing looks more like the Pointer class I referenced in the issue, which uses similar logic in its constructor and getters/setters:

Pointer experiment reference there (https://github.com/AssemblyScript/assemblyscript/blob/master/tests/compiler/std/pointer.ts#L3)

which is precisely what I wanted to avoid by suggesting a lower-level primitive.

@MaxGraey
Member

MaxGraey commented Jun 27, 2020

But without those changes, ptr<T> looks more like Box<T> in Rust (a pointer-to-heap abstraction) =)

Anyway, this suggestion makes sense, but in my opinion it should be a built-in abstraction type, both to keep it efficient and to allow more powerful semantic checking.

cc @dcodeIO

@RReverser
Author

RReverser commented Jun 27, 2020

pointer to heap abstraction

Only if you create it. If the goal is just to receive pointers via FFI, it doesn't matter. But suppose you want to return pointers from a function to JS, so that JS can store them and pass them back later, like:

export function foo(): ptr<f32> {
  let p = new ptr<f32>();
  p.deref = 1.2;
  return p;
}

In that case the default heap-based constructor makes sense IMO, since any other pointer wouldn't survive. So yeah, the constructor is Box-like, but when you don't use new, such a class is compatible with arbitrary pointers as well.
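The following rough model shows why a heap allocation is needed when a pointer is handed to JS and used later. It is plain TypeScript, and the bump allocator (alloc, heapTop) is hypothetical, not AssemblyScript's runtime. Each value lives in a region that stays reserved after foo returns, so the address JS holds remains valid:

```typescript
// Linear memory modeled as a little-endian DataView.
const memory = new DataView(new ArrayBuffer(1024));

// Hypothetical bump allocator: hands out aligned, never-freed regions.
let heapTop = 8; // leave the first bytes unused
function alloc(size: number, align = 4): number {
  heapTop = (heapTop + align - 1) & ~(align - 1); // round up to `align`
  const addr = heapTop;
  heapTop += size;
  if (heapTop > memory.byteLength) throw new RangeError("out of memory");
  return addr;
}

// Analogue of `export function foo(): ptr<f32>`: the f32 lives in its own
// allocation, so the returned address stays valid after foo returns.
function foo(): number {
  const addr = alloc(4);
  memory.setFloat32(addr, 1.2, true);
  return addr;
}

// The "JS side" keeps the opaque addresses and passes them back later.
const a = foo();
const b = foo();
console.log(a, b); // 8 12
console.log(memory.getFloat32(a, true)); // 1.2000000476837158 (1.2 rounded to f32)
```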

@dcodeIO
Member

dcodeIO commented Jun 27, 2020

Perhaps both concepts can be merged into one, so we get the simplicity of the .deref field (maybe name it .value), plus the utility of the pointer class?

@RReverser
Author

maybe name it .value

I kinda prefer deref because it's more explicit about what it is. value is a very generic name, and ptr.value might be ambiguous for a reader: is it the value of the pointer as a number, or the dereferenced value?

plus the utility of the pointer class?

I think the Pointer class could extend the low-level primitive, adding more methods, while those who don't opt in can use ptr simply as an opaque pointer. But I don't have a strong opinion on this, as long as there is some type-checked pointer type that can be easily used across the FFI boundary.

ptr as defined in the issue description is very simple to define and well-optimised in self-hosted AssemblyScript, yet already powerful enough to cover most use cases. But if you feel that operator overloads would be commonly used too, I guess they can be added as well.

@MaxGraey
Member

MaxGraey commented Jun 27, 2020

Also, if ptr<T> were built in, we could safely call methods of the wrapped class, like:

class Foo {
   method(): i32 { ... }
}

let fooPtr = new ptr<Foo>();

fooPtr.method(); // checks the held raw pointer value against null, and only then makes the `method` call

which would actually be equivalent to:

let foo = fooPtr.deref;
if (foo !== null) foo.method();
else throw new Error('null pointer deref');

And it could even be statically checked whether this ptr was created inside the module rather than coming from outside.

@MaxGraey
Member

MaxGraey commented Jun 27, 2020

However, perhaps it's better to just make deref nullable and access members like:

let fooPtr = new ptr<Foo>(...);
fooPtr.deref?.method();
// or
fooPtr.deref!.method();
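The following plain-TypeScript sketch models the nullable-deref desugaring described above. Foo, deref and callMethod are hypothetical, and the sketch treats a stored address of 0 as null. deref reads the reference held in a slot, and either ?. or an explicit check handles the null case:

```typescript
// Linear memory modeled as a little-endian DataView.
const memory = new DataView(new ArrayBuffer(64));

// Hypothetical class whose only field (an i32) lives at `addr`.
class Foo {
  readonly addr: number;
  constructor(addr: number) {
    this.addr = addr;
  }
  method(): number {
    return memory.getInt32(this.addr, true);
  }
}

// ptr<Foo>: the address of a slot holding a Foo reference.
// This sketch treats a stored 0 as null.
function deref(slot: number): Foo | null {
  const ref = memory.getUint32(slot, true);
  return ref === 0 ? null : new Foo(ref);
}

// The explicit form: check for null, then call.
function callMethod(slot: number): number {
  const foo = deref(slot);
  if (foo !== null) return foo.method();
  throw new Error("null pointer deref");
}

memory.setInt32(16, 7, true); // Foo's field at 16
memory.setUint32(8, 16, true); // slot at 8 refers to the Foo at 16; slot at 4 stays 0

console.log(deref(8)?.method()); // 7
console.log(deref(4)?.method()); // undefined
console.log(callMethod(8)); // 7
```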

@RReverser
Author

@MaxGraey

  1. ptr<Foo> is not necessary, since classes are already represented by pointers, so you're basically defining the analogue of **Foo from C, not *Foo. There are some cases where that's useful, but it won't help with common method calls.

  2. Even if this did work, in WebAssembly address 0 is as valid as any other. C / C++ / Rust target other systems where it isn't, and so they usually restrict pointers to non-null, but IMO there's no actual reason to do this in AssemblyScript or any other Wasm-oriented language.

@MaxGraey
Member

MaxGraey commented Jun 27, 2020

Hmm, maybe I don't understand the main goals of the proposal. I was under the impression that ptr<T> is just a safer version of the type-unsound usize, mostly needed for handling external C interfaces like WASI.

Even if this did work, in WebAssembly, 0 adddress is as valid as any other

Yes, but in Rust, C++ and AS at least the first 4 bytes are reserved for the null pointer. However, you can write to this memory location without any restriction, that's true.

Let me try to build analogies with C pointers:

| C/C++ | AS |
| --- | --- |
| `Foo* p` | `let p: ptr<Foo>` |
| `Foo foo = *p` | `let foo = p.deref` |
| `Foo* p = &foo` | `let p = ptr.from(foo)` |
| `Foo* p = (Foo*)0x80` | `let p = new ptr<Foo>(0x80)` |

Does that make sense?

@RReverser
Author

I was under impression that ptr<T> is just more safer version of type unsound usize and mostly needed for handling external C interfaces like WASI

Yeah, but such wrapping is not necessary for classes, because they are already represented as pointers. It's mostly primitives that are a problem (e.g. size_t *len would be represented with ptr<usize>, not just usize).

Well, it will also help in some cases where the target expects a SomeStruct ** (for the function to store its own pointer into). There ptr<SomeStruct> would still make sense, and the default heap-based ptr constructor is still useful. But that's a less frequent use case anyway.
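The two out-parameter shapes mentioned above can be modeled in plain TypeScript. hostFillBytes and hostCreateStruct are hypothetical host functions in the style of WASI's out-parameters, and all addresses are arbitrary. One function writes a count through a size_t*, and the other writes a struct's address into a SomeStruct ** slot. In AssemblyScript these parameters would be typed ptr<usize> and ptr<SomeStruct> instead of bare numbers:

```typescript
// Linear memory modeled as a little-endian DataView.
const memory = new DataView(new ArrayBuffer(256));

// Hypothetical host function taking a size_t* out-parameter (ptr<usize> in
// AssemblyScript): fills up to 5 bytes and writes the count through nwrittenPtr.
function hostFillBytes(bufPtr: number, bufLen: number, nwrittenPtr: number): number {
  const n = Math.min(bufLen, 5);
  for (let i = 0; i < n; i++) memory.setUint8(bufPtr + i, 0x61 + i); // 'a', 'b', ...
  memory.setUint32(nwrittenPtr, n, true); // *nwritten = n
  return 0; // errno-style success code
}

// Hypothetical host function taking SomeStruct** (ptr<SomeStruct> in
// AssemblyScript): places a struct in memory and stores its address in *out.
function hostCreateStruct(outPtr: number): number {
  const structAddr = 128;
  memory.setInt32(structAddr, 7, true); // the struct's first field
  memory.setUint32(outPtr, structAddr, true); // *out = structAddr
  return 0;
}

const buf = 16, nwritten = 8, out = 12;
hostFillBytes(buf, 3, nwritten);
console.log(memory.getUint32(nwritten, true)); // 3
hostCreateStruct(out);
const s = memory.getUint32(out, true);
console.log(s, memory.getInt32(s, true)); // 128 7
```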

@MaxGraey
Member

MaxGraey commented Jun 28, 2020

Oh, it seems you're suggesting box<T>, which is exactly wrapping (boxing) non-heap (stack-based) primitives into a heap-based managed wrapper (object).
[Screenshot, 2020-06-28 at 17:08:47]

@RReverser
Author

Not really; Box in Rust (assuming that's what you're referring to) is specifically a heap object, while ptr<T> in general doesn't make any assumptions about where the data is stored, and can be either on heap or stack or a static variable or anything - it's literally just a generic pointer type.

Just like T* in C / C++ can point to any memory location, similarly ptr<T> can point to any memory location where the actual T is stored.

@jtenner
Contributor

jtenner commented Jun 29, 2020

@RReverser, except that in WebAssembly the stack is virtual: you can't point to something on the stack. Box<T> and ptr<T> are equivalent in functionality, at least in WebAssembly.

Edit:

In fact, when Rust compiles to WebAssembly and a value needs to become a ptr, I bet it effectively becomes boxed.

@jtenner
Contributor

jtenner commented Jun 29, 2020

I believe I've raised complaints about the ergonomics of changetype<T>() before, but after lots of soul-searching and consideration, I've found it to be quite useful.

@MaxGraey
Member

MaxGraey commented Jun 29, 2020

Not really; Box in Rust (assuming that's what you're referring to) is specifically a heap object, while ptr in general doesn't make any assumptions about where the data is stored, and can be either on heap or stack or a static variable or anything - it's literally just a generic pointer type.

Let's look at another language with managed types that had boxing long before Rust: C#. It supports "call-by-value" and "call-by-reference" by putting "out" or "ref" modifiers before formal and actual parameters, like:

void CallByRef(ref int num) {
    num = 111;
}

int res = 0;
CallByRef(ref res);
// res has 111 value now

The same thing could be simulated by boxing / unboxing res (to object), and that's actually exactly how it works in C# under the hood.
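The boxing/unboxing approach described above can be sketched in TypeScript. Box is a hypothetical one-field wrapper, not a library type, and the sketch illustrates the idea rather than C#'s actual lowering. The callee mutates a shared heap object, and the caller then unboxes it:

```typescript
// Hypothetical one-field wrapper standing in for C#'s boxed object.
interface Box<T> {
  value: T;
}

// Analogue of `void CallByRef(ref int num) { num = 111; }`:
// the callee writes through the shared heap object.
function callByRef(num: Box<number>): void {
  num.value = 111;
}

let res = 0;
const boxed: Box<number> = { value: res }; // box
callByRef(boxed);
res = boxed.value; // unbox
console.log(res); // 111
```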

And the Rust variants:

pub fn call_by_ref(num: &mut i32) {
   *num = 111;
}

pub fn call_by_ref_via_box(mut num: Box<i32>) {
   *num = 111;
}

But in Rust it works differently under the hood. Rust has a shadow stack, so it doesn't need implicit boxing for &mut i32, whereas in managed runtimes (without a shadow stack) boxing is the only possible way, I guess (unless you simulate a shadow stack).
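The shadow-stack approach can be modeled in plain TypeScript. The sp variable and the 16-byte frame size are assumptions of this sketch. LLVM-based toolchains targeting wasm keep address-taken locals in a stack region inside linear memory, tracked by a stack-pointer global. That lets a callee receive an ordinary address to a caller's local without any boxing:

```typescript
// Linear memory modeled as a little-endian DataView.
const memory = new DataView(new ArrayBuffer(1024));

// Hypothetical shadow-stack pointer; the stack grows down from the top.
let sp = memory.byteLength;

// Analogue of `fn call_by_ref(num: &mut i32) { *num = 111; }`:
// receives a plain address into linear memory.
function callByRef(numAddr: number): void {
  memory.setInt32(numAddr, 111, true);
}

function caller(): number {
  sp -= 16; // push a 16-byte frame holding `res`
  const resAddr = sp;
  memory.setInt32(resAddr, 0, true); // let mut res = 0;
  callByRef(resAddr); // pass the slot's address, no boxing
  const res = memory.getInt32(resAddr, true);
  sp += 16; // pop the frame
  return res;
}

console.log(caller()); // 111
console.log(sp); // 1024: the frame was released
```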

@RReverser
Author

@MaxGraey TBH, I don't know where this discussion is going or what its purpose is; it seems to have become a bit too long and drifted into language theory :)

Let's focus on practical side: are you trying to figure out how this wrapper is represented or how it's going to be used or ..? I want to understand how I can improve my explanations above in case they weren't clear.

@RReverser
Author

except in Web Assembly the stack is virtual. You can't point to something on the stack. Box<T> and ptr<T> are equivalent in functionality. At least in web assembly.

No, this is not true. On native machines, the stack and heap also technically live in the same memory, but the language-level semantics of Box vs general pointers still matter. One is guaranteed to be on the heap; the other is a general-purpose address that doesn't care where the data is stored.

All in all, I feel that the Box discussion is derailing us from the original purpose of the proposal.

@MaxGraey
Member

Let's focus on practical side: are you trying to figure out how this wrapper is represented or how it's going to be used or ..? I want to understand how I can improve my explanations above in case they weren't clear.

I'm just trying to understand all aspects and find a more generalized solution, because this proposal looks too specific to support at the language or runtime level.

@dcodeIO
Member

dcodeIO commented Jun 29, 2020

Looks like the confusion here comes from ptr<T> being incomplete, in that it doesn't handle struct-likes in an intuitive way (leading to the **Foo situation mentioned above). The Pointer experiment tries to tackle that by handling references like structs, since in the reference case, we'd most likely be dealing with an @unmanaged class representing the layout of a C-like struct. Hence my suggestion to look into merging both into one to make it a more or less complete concept suitable for most situations.

Regarding Box: Such a pointer and a Box<T> are similar ofc, but the important difference is that Box<T> is a managed object (is it?), hence reference counted with a GC header and whatnot, making it incompatible with changetypeing, which is necessary on the boundary. Language-wise I think that an unmanaged pointer interface makes more sense than a box.

@dcodeIO
Member

dcodeIO commented Jun 29, 2020

Regarding making this a complete concept: perhaps we should go and rename @unmanaged to @struct to give it proper meaning (I've been considering this for a while already), with the following cases supported by Pointer:

  1. Value: Pointer<valtype> is the address of a basic value in memory
  2. Class: Pointer<Class> is the address of a pointer to a managed class - this is somewhat uncommon
  3. Struct: Pointer<Struct> is the address of a C-like struct in memory - this is common but somewhat special

Unfortunately this conflicts with just a .deref field because the struct case needs special inlined logic.
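To see why the struct case needs its own logic, here is a plain-TypeScript model of the three cases. derefValue, derefClass, derefStruct and the PointStruct layout are hypothetical. The wrapper object in case 3 stands in for what would be a zero-cost changetype in AssemblyScript. A value pointer loads the value, a class pointer loads another address, and a struct pointer loads nothing at all:

```typescript
// Linear memory modeled as a little-endian DataView.
const memory = new DataView(new ArrayBuffer(256));

// 1. Value: Pointer<i32> is the address of an i32; deref loads the value.
function derefValue(addr: number): number {
  return memory.getInt32(addr, true);
}

// 2. Class: Pointer<Class> is the address of a slot holding a reference,
// so deref loads another address (one extra level of indirection).
function derefClass(addr: number): number {
  return memory.getUint32(addr, true);
}

// 3. Struct: Pointer<Struct> is the address of the struct itself, so deref
// loads nothing and just views the same address through the struct layout.
class PointStruct { // hypothetical C-like layout { x: i32; y: i32 }
  readonly addr: number;
  constructor(addr: number) {
    this.addr = addr;
  }
  get x(): number { return memory.getInt32(this.addr, true); }
  get y(): number { return memory.getInt32(this.addr + 4, true); }
}
function derefStruct(addr: number): PointStruct {
  return new PointStruct(addr);
}

memory.setInt32(8, 42, true); // an i32 at 8
memory.setInt32(16, 3, true); // struct { x: 3, y: 4 } at 16
memory.setInt32(20, 4, true);
memory.setUint32(32, 16, true); // a slot at 32 referring to address 16

console.log(derefValue(8)); // 42
console.log(derefClass(32)); // 16
const pt = derefStruct(16);
console.log(pt.x, pt.y); // 3 4
```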

@MaxGraey
Member

Perhaps we should go and rename @unmanaged to @struct

It could lead to the wrong assumption that the class is allocated on the stack and passed by value, which is not true.

@dcodeIO
Member

dcodeIO commented Jun 29, 2020

The renaming is not ultimately necessary if you think that'd be confusing. Making this a complete concept would be more important when we want to add it to stdlib.

@RReverser
Author

@dcodeIO I like your suggestion, although I'm not sure special handling for structs/classes is worth the complexity. To me, it feels a bit magical and I'd rather teach users that objects are already pointers on their own. I'm not strongly opposed to the special handling, just not sure whether it's worth it.

@RReverser
Author

Regarding Box: Such a pointer and a Box<T> are similar ofc, but the important difference is that Box<T> is a managed object (is it?), hence reference counted with a GC header and whatnot, making it incompatible with changetypeing, which is necessary on the boundary. Language-wise I think that an unmanaged pointer interface makes more sense than a box.

This is a good summary too, by the way. ptr<T> is not a managed pointer, although it can of course be constructed from one, just like Box / unique_ptr in other languages can be turned into a raw pointer.

@stale

stale bot commented Jul 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 29, 2020
@stale stale bot closed this as completed Aug 6, 2020
@RReverser
Author

That seems like a short timeout for stale issues 😅

@dcodeIO dcodeIO added enhancement and removed stale labels Aug 6, 2020
@dcodeIO dcodeIO reopened this Aug 6, 2020
@dcodeIO
Member

dcodeIO commented Aug 6, 2020

Whoops, forgot to add a label :)
