Skip to content

Discussion of initial generics additions in 3c rewriter #631

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
8 tasks done
kyleheadley opened this issue Jun 24, 2021 · 3 comments
Closed
8 tasks done

Discussion of initial generics additions in 3c rewriter #631

kyleheadley opened this issue Jun 24, 2021 · 3 comments

Comments

@kyleheadley
Copy link
Member

kyleheadley commented Jun 24, 2021

This issue is a more in depth tracker for 2b of #358. Branch add_generics is loosely tracking code changes, and has a draft with early discussion #608.

Initial work was intended to add generics in cases like:

extern void *malloc(unsigned long);
int wrap_malloc(void *p, unsigned long s) {
  p = malloc(s);
  return p != (void*)0;
}

So that the prototype would be rewritten as: _For_any(T) int wrap_malloc(_Ptr<T> p, unsigned long s)
Unfortunately, we quickly realized that the unsafe assignment to p means that it's actually rewritten T *p if at all.
We decided that we weren't ready to mix generics with wildness, so the rewrite now only happens when pointers are checked, and the example above doesn't rewrite at all. We can rewrite simpler things like:
void viewer(void *i) { return; } -> _For_any(T) void viewer(_Ptr<T> i) { return; }
void *getNull() { return 0; } -> _For_any(T) _Ptr<T> getNull(void) { return 0; }

This code still has some bugs when dealing with more complex situations. So in order to accept a PR, we need to deal with a few issues, either fixing the problem or adjusting regression tests to ignore the problem until later. Here is a list of known issues followed by their explanations:

  • allow wild generics
    • later, if ever
  • allow itype generics
    • later, if ever
  • function pointer definition
    • later
  • function pointer assignment
    • mark all void* as wild for now (code already handles this)
  • callsite type arguments still void
    • use a generic param if called with an identifier (not an arbitrary expression)
  • safe casts
    • make exceptions for generics and malloc, etc., constrain other void* to wild
  • newly required bounds annotations
    • later, file issue
  • type parameter propagation inside function bodies
    • later

The function pointer issues are related of course. Fp definition would be similar to ordinary function definition, but I haven't yet researched how to rewrite This is something we could easily save for later, as code should be rare. FP assignment done right would require constraints, probably new ones, created for the assignment and solved. Otherwise, we can constrain void* parameters to wild (as we may do now) and not rewrite functions that interact with fps. Our test test/globalschecked.c has this code that needs to be dealt with[edit: this was a bug that removed the wildness constraint. The example is, correctly for now, not rewritten]:

int foo(void *x) {
  int *p = (int *)x;
  return *p;
}

static int (*F)(void *) = foo;

The type argument adder is good, but it can't currently handle rewritten generics. The issue hasn't been looked into further than this. We see an example in our test/hash.c test which has this code after conversion:

...
_For_any(T) void hash_free_entry(_Ptr<struct hash> p_hash, _Array_ptr<T> p_key)
{
  _Ptr<struct hash_node> p_node = hash_get_node_by_key<void>(p_hash, p_key);
...
_For_any(T) _Ptr<struct hash_node> hash_get_node_by_key(_Ptr<struct hash> p_hash, _Array_ptr<T> p_key)
...

both the functions are rewritten as generic, but the type param is void rather than T. A complication is the order of code here. Functions are marked as potentially generic, than the type arguments are determined, than after constraint propagation, functions with wild parameters are not rewritten as generic. We need to have the T looked up at least and have constraints at most. If we leave this unfixed, it may be fine, as the functions were originally called with void parameters anyway. Testing will need to determine if checkedc allows this and if it is still safe or we need to constrain such instances as wild.

Casting needs to be adjusted. Currently, 3c allows casting from void* to other types (possibly with known circumstances), to facilitate common memory manipulation code like malloc. We would like to discontinue this practice and use generics properly instead. Unfortunately, it is very common to cast the return value of such functions during assignment to variables. The cast is not generic-aware, and breaks tests like test/basic.c that contains int *ptr = (int *)malloc(n * sizeof(int)); The code needs to be updated with generic awareness before disallowing casts from void*.

Bounds annotations are not required on void * params, because they are unsafe anyway. Once we add generics and mark some of them as safe, they will fail clang's bounds requirement. Such is the case in test/liberal_itypes_ptr.c where the following code converts correctly (mostly as _Array_ptr<T>) but the call can't satisfy the existing bounds:

void bounds_fn(void *b : byte_count(1));
void bounds_call(void *p) {
  bounds_fn(p);
}

Since this is an -alltypes problem, it might be ok to leave it, since we don't require compilation with alltypes yet.

Finally, we have generics propagation inside expressions. The hope is that we can use our existing constraint propagation code to propagate the index of a generic type, which corresponds to the params in _For_any(T,U), with T as 0 and U as 1. All variables declared void would get an index and when they interact we constrain them equal to each other with the lowest value. This would likely be a new kind of constraint and atom. And we have to be careful not to conflate the T of one generic function with that of another, even when we're constraining them to be of the same type for example a callsite might have

_For_any(T) sample1(_Ptr<T> t) {...}
_For_any(T,U) sample2(_Ptr<T> t, _Ptr<U> u) {
  ...
  sample1<U>(u);
}

This is called "unification" and may need a more complex algorithm to propagate properly, or we can keep it simple and define what 3c can and can't do. Without any generics propagation 3c will be limited to the trivial examples of the current capability shown above.

The goal in writing this is to determine which of these we need to add to the inital PR, which ones can be follow-ups, and which we need to delay support for. Comments and discussion are welcome, especially on what parts of this would be helpful in the current porting samples.

@mwhicks1
Copy link
Member

Some comments:

The cast is not generic-aware, and breaks tests like test/basic.c that contains int *ptr = (int )malloc(n * sizeof(int)); The code needs to be updated with generic awareness before disallowing casts from void.

For this code, we should be rewriting the cast. That is, when we have int *p = (int *)malloc(sizeof(int)); it should already be the case that this gets rewritten to _Ptr<int> p = (_Ptr<int>)malloc<int>(sizeof(int)); That is, the cast is updated, and the instantiation is made. Perhaps your code has broken this? Note that I don't see how the situation you've given relates to generic inference, though, as opposed to instantiation. If you had something like void **p = (void **)malloc(sizeof(void *)) you could not rewrite this as _Ptr<T> = (_Ptr<T>)malloc(sizeof(T)) I don't think, since we don't actually know how big T is?

Once we add generics and mark some of them as safe, they will fail clang's bounds requirement.

I think it's OK to punt this one for now, and file an issue to fix later.

Finally, we have generics propagation inside expressions.

I don't follow this issue. The goal here is never to have more than one generic parameter in a function. So how would we end up with Forall(T,U), which has two? Maybe you are thinking of the case that you want to infer a void * to be a generic for a function that already has another generic? That's another one I would punt -- just don't attempt to infer generics for these.

@kyleheadley
Copy link
Member Author

For this code, we should be rewriting the cast. That is, when we have int *p = (int *)malloc(sizeof(int)); it should already be the case that this gets rewritten to _Ptr<int> p = (_Ptr<int>)malloc<int>(sizeof(int));

This is correct, and it is also correct that my update broke it. We decided that casting from void* was unsafe, so I made the minimal change to implement that. That code doesn't understand that malloc is generic and doesn't need a cast at all, instead marking it as wild because malloc returns void* which is cast to int * "unsafely".

I assume from the reaction that this is something we need to fix in the initial PR. The other option is to continue to allow casts from void* for the time being. That would cause code like:

static void one_process_start(void* p_arg) {
  struct vsf_session* p_sess = (struct vsf_session*) p_arg;
    …
  }
}

to be converted to generics incorrectly. I assume this is not acceptable.

I'll start working on allowing malloc code to rewrite/compile while disallowing other casts from void*

@kyleheadley
Copy link
Member Author

As checked in the opening post, this discussion has served its purpose and generated our actions for each side issue. Function pointers, bounds annotations, and generic index propagation will get their own issues got the future. I'm closing this with the opened PR #639.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants