Skip to content

Type safe zero-runtime-cost sum types. #3273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
modulovalue opened this issue Aug 15, 2023 · 8 comments
Closed

Type safe zero-runtime-cost sum types. #3273

modulovalue opened this issue Aug 15, 2023 · 8 comments

Comments

@modulovalue
Copy link

Consider, for example, the following model:

// Model A
class Root {}

class A implements Root {
  B b;
}

class B implements Root {
  C c;
}

class C implements Root {}

There is an obvious optimization for representing Model A as an explicit graph that avoids the overhead of an object hierarchy:

// Model B
class Domain {
  // Each node is assigned a unique id from "0" to "amountOfNodes - 1"
  int amountOfNodes; // total amount of 'A's, 'B's and 'C's

  // Each node is assigned a list of nodes that they point to.
  List<List<int>> children;
}

Unfortunately, given such an efficient representation of Model A as Model B, there is no way to restore type safety without wrapper objects.

That is, to strongly type each node in Model B (to, for example, expose an efficient, safe and minimal API), we'd need a class hierarchy to model the sum type inherent to our domain. Users would be forced to pay the price of the overhead that comes with having objects.

For cases where each node can be modeled by a sum type with only one variant, extension types appear to be able to help. However, extension types do not appear to address the case where a sum type of one variant is not enough.

I think that supporting "inline classes/extension types" over domains that can only be modeled by sum types with more than one variant is important and I hope the language team will consider that for the design of extension types, or at least consider possible extensions to extension types in the future, that will allow that language feature to be extended to support modeling sum types with more than one variant.

@eernstg
Copy link
Member

eernstg commented Aug 15, 2023

It seems plausible to claim that the hierarchy Root, A, B, C can be modeled equivalently by the following enum:

enum ModelA { a, b, c }

The values ModelA.a corresponds to A(B(C())) (assuming suitable constructors), ModelA.b corresponds to B(C()), and ModelA.c corresponds to C(). There is no information (as far as I can see) which is not fully represented by the enum, and vice versa. That is, if you have one then you can reconstruct the corresponding one, with no loss of information. That's basically everything you can ever do with Root (assuming that it isn't intended to have a bunch of subtypes, perhaps declared by others).

So maybe you need to have a more powerful Root hierarchy in order to illustrate what it is that you are modeling which isn't handled just as well by a simple enum?

@lrhn
Copy link
Member

lrhn commented Aug 15, 2023

The enum doesn't represent the different identities of the instances of A, B and C. There is a distinction between

var b1 = B(A());
var b2 = B(A());

and

var b1 = B(A());
var b2 = B(b1.a);

(Assume proper constructors, and then we might as well assume some useful APIs too.)

Still, there is no way to restore type safety without something ensuring type safety at runtime.
And it's not clear what that means.

There is nothing to make an integer that Domain considers representing a B different from one representing an A, unless entry into the A or B types can be guarded by user-written code, at runtime. Code which can actually tell whether it's supposed to be an A or a B, because the integer itself doesn't know.
Which again means the A and B types need to exist, at least as concepts, at runtime, which the extension-type types do not. (Deliberately.)
And that any attempt to convert an integer to an A or B must check whether that's actually allowed.

So lets consider a "reified extension type", one which is still skin-deep (has a representation type and value that is the actual object), but which exists in the runtime type system as well, as a type separate from the representation type.
So, reifiedextensiontype A(int x) {} would be such a type. It's a subtype of Object (maybe) and not much else.
A List<A> is different from a List<int>, and neither is assignable to the other. A List<int> cannot be cast to List<A> because A is not a supertype of int.

That leaves us to figure out what happens if you do 1 as A.

It has to decide whether it's allowed or not. Which means running user code.

So we allow reified extension type declarations to declare a bool operator A.is(RepresentationType _); which the reified extension type can implement to answer the "is this integer a valid A". When you do 1 is A, it calls that static method, and if it returns true, the cast succeeds, otherwise it throws. The static type of A (with representation type int) has been validated.
That will also allow subtyping, where the type represents only some values of an existing type. There can be a reified extension type EvenInt which only allows, well, even integers.
(The constructor-like notation A.is is chosen because this is a class method, not a static method. It has access to the type parameters of the surrounding declaration. Probably just confusing.)

The is operator is used by both is and as checks, try/on checks, pattern type checks, and any other runtime check which needs to determine if a value is an instance of a type.
(And it's a little worrisome that the result can change, but probably not something that won't just result in runtime errors. A promotion from Root to B based on Root x = ...; if (x is B) ... should mean that x can be used as a B. If the domain changes before we do x.bMethod(), so is B would now return false, ... well, don't do that. (Patters will cache the result, and not ask again about the same type.)

Now, let's go back to the hierarchy. If we make A, B, C and Root be reified extension types, with operator .is which checks against some bespoke Domain object to figure out whether an given integer is any of the above, then one can write:

enum _RootKind {a, b, c}
class _Domain {
  static int nextId = 0;
  static final Map<int, (_RootKind kind, int? childId)> idMap = {}
  A newA(B b) {
    var id = nextId++;
    idMap[id] = (_RootKind.a, b._id);
    return id as A;
  }
  B newB(C c) {
    var id = nextId++;
    idMap[id] = (_RootKind.b, c._id);
    return id as B;
  }
  C newC() {
    var id = nextId++;
    idMap[id] = (_RootKind.c, -1);
    return id as C;
  }
}
abstract reified extension type Root._(int _id) { // `abstract` just means cannot use constructor.
  bool operator Root.is(int id) => 0 <= id && id < _Domain.nextId;
}
reified extension type A._(int _id) extends Root {
  factory A(B b) {
    var id = _Domain.nextId++;
    _Domain.idMap[id] = (_RootKind.a, b._id);
    return A._(id);
  }
  bool operator A.is(int id) => 
    switch (_Domain.mapId[id]) {(_RootKind.a, _) => true, _ => false};
  B get b => switch (_Domain.mapId[_id]) {(_RootKind.a, var bId) => B._(bId), _ => throw StateError("Not an A")};
}
// Same for `B`, and `C` without the field getter.

Then I can write code like:

Root? childNode(Root node) =>
  switch(node) {
     A a => a.b,
     B b => b.c,
     _ => null;
  }

and it will actually work at runtime.

This is a complicated feature. Lots of ways it can mess up performance, just by existing. There is a reason extension type doesn't affect runtime type checking.
It's not clear that you'll get any performance benefits at all, if every type check is going to run user code.

Honestly, just create a class if you want encapsulation and runtime type safety. It's not like classes are the antithesis of performance, we've gotten this far with nothing but classes!

@modulovalue
Copy link
Author

I apologize, it seems that my description wasn't clear enough. Here is a concrete example that should hopefully clarify things a bit:

The following example constructs a graph that contains 4 instances of Root. (The first node is an instance of A pointing to the second node which is an instance of B which points to a node that is a C and there's a fourth node that is a C that points to nothing.)

void main() {
  final nodes = const Store(
    [Store.aTag, Store.bTag, Store.cTag, Store.cTag],
    [
      // A at 0 points to 1.
      [1],
      // B at 1 points to 2.
      [2],
      // C at 2 points to nothing.
      [],
      // C at 3 points to nothing.
      [],
    ],
  ).all();
  print(nodes);
}

class Store {
  static const int aTag = 0;
  static const int bTag = 1;
  static const int cTag = 2;
  final List<int> type;
  final List<List<int>> children;

  const Store(
    this.type,
    this.children,
  );

  Root _read(
    final int nodeId,
  ) {
    switch (type[nodeId]) {
      case aTag:
        return A(nodeId, this);
      case bTag:
        return B(nodeId, this);
      case cTag:
        return const C();
      default:
        throw Exception("Unknown Tag");
    }
  }

  Iterable<Root> all() sync* {
    for (int i = 0; i < type.length; i++) {
      yield _read(i);
    }
  }
}

abstract class Root {}

class A implements Root {
  final int _nodeId;
  final Store _store;

  const A(
    this._nodeId,
    this._store,
  );

  B get b => _store._read(_nodeId) as B;
}

class B implements Root {
  final int _nodeId;
  final Store _store;

  const B(
    this._nodeId,
    this._store,
  );

  C get c => _store._read(_nodeId) as C;
}

class C implements Root {
  const C();
}

To me, it doesn't look like the Root hierarchy from the example above could be replaced by an enum, because all variants of enums share the same members. That is, a ModelA.a could not have a b member without ModelA.b also having a b member.

But even if enums could be used here, the enum wrapper around _nodeId and _store would still cause a new allocation to be performed during runtime and introduce Object-related overhead? Whereas if we only had a single Root (without A B and C) we could make it an "extension type" over a record containing a nodeId and a store, which would remove the Root wrapper overhead and be zero-cost (assuming the runtime can optimize records in ways that matter here).


In pseudo-code, what I hope to be able to do with extension types would be something like the following:

extension type Root(int _nodeId, Store _store) = A {
  B get b => _store._read(_nodeId) as B;
} | B {
  C get c => _store._read(_nodeId) as C;
} | C {
  // There are no members.
}

That is, to have an extension type over a product type, that contains a nodeId and a store, where the extension type models a sum type of three variants, all with their own members.

For future reference: The fact that a store would need to be carried around is inconvenient, the following issue was meant to focus on making that more convenient: #3169

@modulovalue
Copy link
Author

modulovalue commented Aug 15, 2023

Thank you for the detailed analysis, Lasse. I agree with most of what you said.

I'm closing this issue because it seems that a completely different approach would be needed here.

@eernstg
Copy link
Member

eernstg commented Aug 16, 2023

@lrhn wrote:

The enum doesn't represent the different identities of the instances of A, B and C.

That's true. I noted that Domain doesn't seem to represent identities either, and jumped to the conclusion that they weren't intended to matter.

@modulovalue wrote:

a completely different approach would be needed here.

Are you sure? ;-)

Here is a variant of the example that maintains a (presumably cheap?) representation of a set of trees with type tags and children lists, and allows for passing around identifications of those trees in two shapes: An unreified form which is just the index and a reference to the store, and a reified form which is the actual tree:

class Store {
  static const int aTag = 0;
  static const int bTag = 1;
  static const int cTag = 2;
  final List<int> type;
  final List<List<int>> children;

  const Store(
    this.type,
    this.children,
  );

  Root reify(
    final int nodeId,
  ) {
    switch (type[nodeId]) {
      case aTag:
        return A(reify(children[nodeId][0]) as B);
      case bTag:
        return B(reify(children[nodeId][0]) as C);
      case cTag:
        return const C();
      default:
        throw Exception("Unknown Tag");
    }
  }

  Iterable<Root> all() sync* {
    for (int i = 0; i < type.length; i++) yield reify(i);
  }
}

// Reified hierarchy.

abstract class Root {}

class A implements Root {
  final B b;
  const A(this.b);
}

class B implements Root {
  final C c;
  const B(this.c);
}

class C implements Root {
  const C();
}

// Non-reified representation using extension type.

typedef RootRecord = (int nodeId, Store store);

extension type const NrRoot<X extends Root>._(RootRecord it) {
  NrRoot(int nodeId, Store store): it = (nodeId, store);
  int get _nodeId => it.$1;
  Store get _store => it.$2;
  X get reify => _store.reify(_nodeId) as X;
}

// Usage example.

const store = Store(
  [Store.aTag, Store.bTag, Store.cTag, Store.cTag],
  [
    // A at 0 points to 1.
    [1],
    // B at 1 points to 2.
    [2],
    // C at 2 points to nothing.
    [],
    // C at 3 points to nothing.
    [],
  ],
);

void main() {
  // We can play safe.
  var nrRoot = NrRoot(1, store); // Type is `NrRoot<Root>`.
  print(nrRoot); // '(1, Instance of 'Store')'.

  var root = nrRoot.reify; // Static type `Root`
  if (root is B) print(root.c); // "Instance of 'C'".

  // When an `NrRoot` is created, the type argument `B` is just a manual
  // promise that it will be a `B`. The type system can't check that.
  var nrRoot2 = NrRoot<B>(1, store);
  print(nrRoot2); // '(1, Instance of 'Store')'.

  var b = nrRoot2.reify; // Static type `B`. Checked dynamically.
  print(b.c); // "Instance of 'C'".

  // We still have the `all` method.
  print(store.all());
}

@modulovalue modulovalue reopened this Aug 16, 2023
@modulovalue
Copy link
Author

Thank you for the hint @eernstg. I have replaced your reification approach with phantom types + extension-based members and now this is pretty much what I was looking for.

// This uses the retired inline classes experiment instead of extension types because inline classes are available on DartPad.
void main() {
  const Store(
    [Store.aTag, Store.bTag, Store.cTag, Store.cTag],
    [
      // A at 0 points to 1.
      [1],
      // B at 1 points to 2.
      [2],
      // C at 2 points to nothing.
      [],
      // C at 3 points to nothing.
      [],
    ],
  ).use(
    (final store) {
      String rootToString(
        final Node<RootBrand> n,
      ) {
        return n.match(
          a: (final n) => "A",
          b: (final n) => "B",
          c: (final n) => "C",
        );
      }

      for (final node in store.all()) {
        print(node.match(
          a: (final n) => [rootToString(n), "->", rootToString(n.b)],
          b: (final n) => [rootToString(n), "->", rootToString(n.c)],
          c: (final n) => [rootToString(n), "with no children"],
        ).join(" "));
      }
    },
  );
}

class Store {
  static Store? __storeInstance;
  static Store get _storeInstance => __storeInstance!;
  static const int aTag = 0;
  static const int bTag = 1;
  static const int cTag = 2;
  final List<int> type;
  final List<List<int>> children;

  const Store(
    this.type,
    this.children,
  );

  void use(
    final void Function(Store store) fn,
  ) {
    __storeInstance = this;
    fn(this);
    __storeInstance = null;
  }
  
  Node<RootBrand> _child(
    final int nodeId,
  ) {
    return _read(children[nodeId].single);
  }

  Node<RootBrand> _read(
    final int nodeId,
  ) {
    switch (type[nodeId]) {
      case aTag:
        return Node<ABrand>._(nodeId);
      case bTag:
        return Node<BBrand>._(nodeId);
      case cTag:
        return Node<CBrand>._(nodeId);
      default:
        throw Exception("Unknown Tag");
    }
  }

  Iterable<Node<RootBrand>> all() sync* {
    for (int i = 0; i < type.length; i++) {
      yield _read(i);
    }
  }
}

inline class Node<X extends RootBrand> {
  final int id;

  const Node._(this.id);
}

abstract class RootBrand {}

extension RootBrandExtension<X extends RootBrand> on Node<X> {
  R match<R>({
    required R Function(Node<ABrand>) a,
    required R Function(Node<BBrand>) b,
    required R Function(Node<CBrand>) c,
  }) {
    switch (Store._storeInstance.type[id]) {
      case Store.aTag:
        return a(this as Node<ABrand>);
      case Store.bTag:
        return b(this as Node<BBrand>);
      case Store.cTag:
        return c(this as Node<CBrand>);
      default:
        throw Exception("Unknown Tag.");
    }
  }
}

abstract class ABrand implements RootBrand {}

extension ABrandExtension<X extends ABrand> on Node<X> {
  Node<BBrand> get b => Store._storeInstance._child(id) as Node<BBrand>;
}

abstract class BBrand implements RootBrand {}

extension BBrandExtension<X extends BBrand> on Node<X> {
  Node<CBrand> get c => Store._storeInstance._child(id) as Node<CBrand>;
}

abstract class CBrand implements RootBrand {}

extension CBrandExtension<X extends CBrand> on Node<X> {
  // Empty.
}

@eernstg
Copy link
Member

eernstg commented Aug 19, 2023

That's a very interesting technique, @modulovalue! I've simplified a few things (during the time where I was looking at the code in order to understand what was going on ;-). In particular, I've removed the typing of nodes as Node<XBrand> for some X in some locations where that type is immediately eliminated again by an upcast.

void main() {
  const Store(
    [Store.aTag, Store.bTag, Store.cTag, Store.cTag],
    [
      // A at 0 points to 1.
      [1],
      // B at 1 points to 2.
      [2],
      // C at 2 points to nothing.
      [],
      // C at 3 points to nothing.
      [],
    ],
  ).use(
    (final store) {
      String rootToString(final Node<RootBrand> n) => n.match(
            a: (final n) => "A",
            b: (final n) => "B",
            c: (final n) => "C",
          );

      for (final node in store.all) {
        print(node
            .match(
              a: (final n) => [rootToString(n), "->", rootToString(n.b)],
              b: (final n) => [rootToString(n), "->", rootToString(n.c)],
              c: (final n) => [rootToString(n), "with no children"],
            )
            .join(" "));
      }
    },
  );
}

class Store {
  static Store? __storeInstance;
  static Store get _storeInstance => __storeInstance!;
  static const int aTag = 0;
  static const int bTag = 1;
  static const int cTag = 2;
  final List<int> type;
  final List<List<int>> children;

  const Store(this.type, this.children);

  int _check(nodeId) {
    var tag = type[nodeId];
    if (tag != aTag && tag != bTag && tag != cTag) {
      throw Exception("Unknown Tag");
    }
    return nodeId;
  }

  Node<RootBrand> _child(final int nodeId) => _read(children[nodeId].single);
  Node<RootBrand> _read(final int nodeId) => Node<RootBrand>._(_check(nodeId));

  void use(final void Function(Store store) fn) {
    __storeInstance = this;
    fn(this);
    __storeInstance = null;
  }

  Iterable<Node<RootBrand>> get all =>
      [for (int i = 0; i < type.length; ++i) _read(i)];
}

extension type const Node<X extends RootBrand>._(int id) {
  R match<R>({
    required R Function(Node<ABrand>) a,
    required R Function(Node<BBrand>) b,
    required R Function(Node<CBrand>) c,
  }) => switch (Store._storeInstance.type[id]) {
    Store.aTag => a(asA),
    Store.bTag => b(asB),
    Store.cTag => c(asC),
    _ => throw Exception("Unknown Tag."),
  };

  Node<ABrand> get asA => Node<ABrand>._(id);
  Node<BBrand> get asB => Node<BBrand>._(id);
  Node<CBrand> get asC => Node<CBrand>._(id);
}

abstract class RootBrand {}

abstract class ABrand implements RootBrand {}

extension ABrandExtension<X extends ABrand> on Node<X> {
  Node<BBrand> get b => Store._storeInstance._child(id).asB;
}

abstract class BBrand implements RootBrand {}

extension BBrandExtension<X extends BBrand> on Node<X> {
  Node<CBrand> get c => Store._storeInstance._child(id).asC;
}

abstract class CBrand implements RootBrand {}

extension CBrandExtension<X extends CBrand> on Node<X> {
  // Empty
}

Here is a summary of my interpretation of this technique:

  • The technique makes it possible to work on a representation type (here: an int id).
  • We look up the intended "type" of that id in a database (Store.type[id]), which is a tag (aTag, bTag, cTag), and then work on id using the extension type Node<T> where T is ABrand when the tag is aTag, etc.
  • Extension methods on Node<ABrand>, ..., Node<CBrand> provide the actual behavior.

So the crucial step is that the underlying int is equipped with methods by being typed statically as a Node<T> where the T is chosen by looking up the type tag of that id in the store.

This is a from-scratch emulation of OO dispatch. Cool! ;-) It allows for polymorphism (we may see in the store that the child of an A is at a specific index id2, but the type tag of id2 could be BSub which is a subtype of B). It also allows for surprising things like assigning a new type to an existing object (so it could be used to model objects that change type over time, e.g., as they adopt or drop roles, or as they change state.

Of course, the type soundness of the heap is not ensured by any static analysis. For example, id2 could have a type tag of A even though it's supposed to be the child of an A, and that should be a B (or a subtype of B, which does not include A as declared).

So we can work on a (potentially large) store of objects, and each of them is just represented by an index into the Store, and references from one to the other "object" in that store, and then we glue the proper type and the proper method implementations to the "object" based on the type tag whenever needed.

I think it's useful to be aware of the fact that such a thing is possible, and it's definitely cool that OO dispatch can be emulated statically in a way that makes emulated method invocations look just like real ones. ;-)

@modulovalue
Copy link
Author

@eernstg Thank you for the summary. I agree with what you said.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants