Skip to content

Check return value bounds#1150

Merged
kkjeer merged 23 commits into
masterfrom
check-return-bounds
Aug 27, 2021
Merged

Check return value bounds#1150
kkjeer merged 23 commits into
masterfrom
check-return-bounds

Conversation

@kkjeer

@kkjeer kkjeer commented Aug 10, 2021

Copy link
Copy Markdown
Contributor

This PR checks that the inferred bounds of a return value expression imply the declared return bounds (if any) for the enclosing function.

This PR does not check modifications to variables or other lvalue expressions that are used in return bounds. For example:

_Array_ptr<int> f(int size) : count(size) {
  size = 3; // modify the size variable used in the declared return bounds
  return 0;
}

At the assignment size = 3, the bounds checker will not emit any diagnostic messages even though the modified variable size is used in the declared return bounds. This can be done in a separate PR.

Test updates:

  1. Disable two 3C tests: functionDeclEnd.c and itype_nt_arr_cast due to functions returning expressions with invalid bounds (see issue 3C test failures when checking return bounds #1147).
  2. CheckedC/dump-dataflow-facts.c: two functions returned expressions with invalid bounds. These functions now return 0.
  3. CheckedC/static-checking/bounds-decl-checking.c: four functions returned expressions with invalid bounds. These functions now return expressions with valid bounds.
  4. CheckedC/static-checking/return-bounds.c: add a new test file that tests the return bounds checking behavior introduced in this PR.
  5. checkedc/465: add expected errors to thirteen function in checkedc tests that return expressions with invalid bounds.
  6. checkedc-llvm-test-suite/114: update one function in the LLVM test suite that returned an expression with unknown bounds.

kakje added 16 commits August 6, 2021 15:59
…ons, and set the ReturnBounds member to the expanded return bounds
…FAIL

See checkedc-clang issue 1147. These tests should be updated to avoid the compiler errors that arise due to functions in these test files return expressions with unknown bounds. Once these test files are updated, they should no longer be marked as XFAIL.
… using the set of expressions that produce the same value as RetExpr
FD->getReturnType(),
BoundsValueExpr::Kind::Return);
ReturnBounds =
BoundsUtil::ExpandToRange(S, ReturnVal, FD->getBoundsExpr());

@sulekhark sulekhark Aug 12, 2021

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The function ExpandToRange only expands count (or, byte_count) bounds expressions. Does this mean that in a range bounds expression like bounds(_Return_value, _Return_value + 2), the abstract place holder _Return_value will not be replaced by the concrete value ReturnVal?

  2. It will be great if a positive test case (no errors or warnings) with similar return bounds as in the test case f10 in clang/test/CheckedC/static-checking/return-bounds.c is added.

@kkjeer kkjeer Aug 12, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReturnVal is a _Return_value expression (a BoundsValueExpr with the Kind Return). If a function has declared bounds of count(size), ExpandToRange will return bounds(_Return_value, _Return_value + size). If a function has declared bounds of bounds(_Return_value, _Return_value + size), ExpandToRange will return bounds(_Return_value, _Return_value + size).

I've added a test case for this (function f7 with no errors or warnings) to return-bounds.c.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the clarification.

@sulekhark sulekhark left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@mattmccutchen-cci

Copy link
Copy Markdown
Member

Is it intentional that this change applies to return values with bounds-safe interfaces? Consider the following example:

_Array_ptr<char> test(void) : count(1) {
  char *x;
  return x;
}

char *test_itype(void) : count(1) {
  char *x;
  return x;
}

Both functions compile without error on master. With this PR, both functions generate errors:

itype_return.c:3:10: error: return value has unknown bounds, bounds expected because the function 'test' has bounds
  return x;
  ~~~~~~~^
itype_return.c:1:18: note: (expanded) declared return bounds are 'bounds(_Return_value, _Return_value + 1)'
_Array_ptr<char> test(void) : count(1) {
                 ^
itype_return.c:8:10: error: return value has unknown bounds, bounds expected because the function 'test_itype' has bounds
  return x;
  ~~~~~~~^
itype_return.c:6:7: note: (expanded) declared return bounds are 'bounds(_Return_value, _Return_value + 1)'
char *test_itype(void) : count(1) {
      ^

For test, this makes sense: the lack of bounds checking at a function return was inconsistent with all other contexts, which IIUC is the bug being fixed in this PR. However, for test_itype, the error seems to violate this statement from section 6.3.8 of the specification:

The checking of return statements (Section 4.9) includes return statements where there is a return bounds-safe interface for the enclosing function:

  • In checked contexts.
  • In unchecked contexts, when the return statement expression has been converted implicitly to an unchecked pointer type.

Since x already has an unchecked pointer type, it does not get an implicit conversion inserted per section 6.3.7, so the return statement does not meet the condition to be included in bounds checking.

3C has been assuming that a function whose return type has a bounds-safe interface can return an arbitrary unchecked pointer, so if that design is changing, we'll need to find a way to adjust the design of 3C correspondingly. That is, in essence, why the itype_nt_arr_cast.c test is failing. In functionDeclEnd.c, bounds checking of return values is not the focus of the test, so if need be, we can probably find a way to change that test to avoid triggering the new error.

@kkjeer

kkjeer commented Aug 12, 2021

Copy link
Copy Markdown
Contributor Author

The behavior in this PR (regarding bounds checking unchecked pointers in unchecked contexts with bounds-safe interfaces) is intentional with respect to the latest Checked C discussion as far as I understand. @dtarditi @sulekhark should we change this behavior so it is more consistent with the 3C design?

@mattmccutchen-cci

Copy link
Copy Markdown
Member

Perhaps more to the point: if I'm right that the "latest Checked C discussion" as reflected by this PR is no longer consistent with the specification, can you please update the specification or otherwise point us to where we can get the latest information?

@sulekhark

sulekhark commented Aug 12, 2021

Copy link
Copy Markdown
Contributor

Perhaps more to the point: if I'm right that the "latest Checked C discussion" as reflected by this PR is no longer consistent with the specification, can you please update the specification or otherwise point us to where we can get the latest information?

@mattmccutchen-cci
As you pointed out, Section 6.3.8 (middle of page 116)

The checking of return statements (Section 4.9) includes return statements where there is return bounds-safe interface for the enclosing function:
In checked contexts,
In unchecked contexts, when the return statement expression has been converted implicitly to an unchecked pointer type

While our current implementation is consistent with Section 4.9, I think this part of the spec (Section 6.3.8 - middle of page 116) is unclear and we need to modify the spec and add one more scenario to the above statement as shown below. This will make this part of the spec consistent with the previous part.

In unchecked contexts, when an unchecked pointer is returned

However, we will discuss this matter once again with David tomorrow, and resolve it (i.e. make the spec and the implementation consistent with each other).

@mattmccutchen-cci

Copy link
Copy Markdown
Member

I guess I'm still unclear on the overall design intent of bounds-safe interfaces. Clearly, one important use case is to allow both unmodified C code and fully Checked C code to call external library functions that have bounds-safe interfaces (as in the Checked C system headers). Is it also a goal to be able to add bounds-safe interfaces to plain C functions in the user's code without having to modify those implementations right away? This is essentially what 3C is trying to do.

If the answer is indeed "no", I noticed two other parts of the spec that could use clarification. In Section 6.3.1:

A function that has a bounds-safe interface and whose body does not use checked pointer or array types is compiled as though the bounds-safe interface has been stripped from its source code.

That sentence appears in a discussion of what happens when unchecked code uses a program element with a bounds-safe interface. But read in isolation, it does seem to suggest that my test_itype example above should be allowed.

Also, an assignment like the following is already rejected by the compiler before this PR (similar to the restriction this PR is adding for return values):

char *global : count(1);

void assign_global(void) {
  char *x;
  global = x;
}

But Section 6.3.8 says:

The checking that an expression statement, declaration, or bundled block implies the declared bounds of variables includes:

  • In checked contexts, all variables with bounds-safe interfaces.
  • In unchecked contexts, all variables with bounds-safe interfaces that are modified by an assignment within the statement, declaration, or bundled block, where the right-hand side expression is converted implicitly from a checked pointer type to an unchecked pointer type.

Does another scenario for an unchecked right-hand side need to be added here as well?

@mwhicks1

Copy link
Copy Markdown
Contributor

We've been discussing this issue over at Correct Computation and the summary is:

  • We are struggling to understand the goals and principles underlying the rules for itypes, especially when used on function definitions (rather than prototypes). If we understood these principles and goals, we might better understand how to judge one design vs. an alternative.

One thing I find confusing is that itypes currently used for standard library functions could not be applied to those functions themselves, in their original C. For example, in the header string_checked.h I have

char *strtok(char * restrict s1 : itype(restrict _Nt_array_ptr<char>),
             const char * restrict s2 : itype(restrict _Nt_array_ptr<const char>)) :
  itype(_Nt_array_ptr<char>);

The standard library was compiled from C code that implements this function, e.g.,

char *strtok(char * restrict s, const char * restrict sep) {
 {
	static char *p;
	if (!s && !(s = p)) return NULL;
	s += strspn(s, sep);
	if (!*s) return p = 0;
	p = s + strcspn(s, sep);
	if (*p) *p++ = 0;
	else p = 0;
	return s;
} 

I would have thought that if I can advertise in string_checked.h that this code as an itype, I could just as well have attached the itype directly to the code itself, i.e.,

char *strtok(char * restrict s : itype(restrict _Nt_array_ptr<char>),
             const char * restrict sep : itype(restrict _Nt_array_ptr<const char>)) :
  itype(_Nt_array_ptr<char>) {
  	static char *p;
	if (!s && !(s = p)) return NULL;
	s += strspn(s, sep);
	if (!*s) return p = 0;
	p = s + strcspn(s, sep);
	if (*p) *p++ = 0;
	else p = 0;
	return s;
}

It feels strange to me that this does not work, but typechecking this code with its original C type and the prototype also present with an itype does work. What principle justifies the difference in treatment?

@sulekhark

sulekhark commented Aug 14, 2021

Copy link
Copy Markdown
Contributor

As per the discussion we had with David today, we agree that if the itypes used in a function declaration are applied to the function definition, and the function definition is compiled in an unchecked scope, and the function body does not use any checked pointers, then the compiler must not issue any bounds-checking errors or warnings.

For the strtok code show below, compiling it with the master branch :

#include <string.h>
char *strtok(char * restrict s : itype(restrict _Nt_array_ptr<char>),
             const char * restrict sep : itype(restrict _Nt_array_ptr<const char>)) :
  itype(_Nt_array_ptr<char>) {
  	static char *p;
	if (!s && !(s = p)) return NULL;
	s += strspn(s, sep);
	if (!*s) return p = 0;
	p = s + strcspn(s, sep);
	if (*p) *p++ = 0;
	else p = 0;
	return s;
}

currently gives two errors (I have elided the notes):

 t.c:7:14: error: inferred bounds for 's' are unknown after assignment
        if (!s && !(s = p)) return 0;

t.c:8:2: error: inferred bounds for 's' are unknown after assignment
        s += strspn(s, sep);

Going forward, these errors will be issued when the above function is analyzed in checked scope but not when analyzed in an unchecked scope. Additionally, in unchecked scope the return bounds checking (as per this PR) on the returned pointer s in the above example will not be performed.

We will update the Checked C spec and the compiler code to be consistent with this behavior.

@mwhicks1

Copy link
Copy Markdown
Contributor

Thanks for the update. When the new spec is available we will reread with a mindset toward consistency/principle, to see if we have any further thoughts about itypes; if so, we'll comment on a separate thread.

@mattmccutchen-cci

Copy link
Copy Markdown
Member

we agree that if the itypes used in a function declaration are applied to the function definition, and the function definition is compiled in an unchecked scope, and the function body does not use any checked pointers, then the compiler must not issue any bounds-checking errors or warnings

For the record: it looks like #1157, #1158, and #1159 have been filed for this work. Thanks!

@sulekhark sulekhark left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

The purpose of this function was to test function declaration rewriting
for itype array pointers declared with bounds but without an explicit
itype. This is equally well tested when a correct bound is inferred for
the parameter array.

(Matt: Remove the XFAIL now that the test is fixed.)

(cherry picked from commit 57d31d2)

@mgrang mgrang left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants