Avoid undefined C behavior for array_ptrs, integer arithmetic, and bounds checks #168

@dtarditi

The LLVM IR definition lets code generators control whether various kinds of undefined behavior arise. For example, in C, signed integer overflow can lead to undefined behavior. When generating LLVM IR, clang attaches flags and attributes (such as nsw on signed arithmetic and inbounds on getelementptr) that tell LLVM which kinds of undefined C behavior it may assume never happen.

We've carefully defined the Checked C extension to eliminate or restrict undefined behavior that can cause problems. We need to make sure that we modify clang code generation to avoid undefined behavior that Checked C either restricts or does not allow. Areas that I know about include:

  • Checked C allows out-of-bounds pointer arithmetic for array_ptrs and does not consider it to be undefined behavior.
  • Checked C does not consider signed integer overflow undefined behavior. Rather, it must map to some integer deterministically.
  • Checked C does not allow adding integers to null pointers.
  • Checked C requires checking array_ptr pointer arithmetic for overflow.
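
To make the second and fourth items concrete, here is a minimal sketch in plain C of what "deterministic" signed addition and overflow-checked address arithmetic could look like. The helper names wrapping_add_i32 and checked_addr_add are hypothetical and are not part of Checked C or clang. The sketch works on integer addresses rather than real pointers and assumes uintptr_t, ptrdiff_t, and size_t have the same width, which holds on common platforms but is not guaranteed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: signed add that wraps modulo 2^32 instead of
   invoking undefined behavior. The sum is computed in unsigned arithmetic,
   where wraparound is defined, and the bits are copied back. int32_t is
   required to be two's complement, so the copy yields the wrapped value. */
int32_t wrapping_add_i32(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    int32_t result;
    memcpy(&result, &sum, sizeof result);
    return result;
}

/* Hypothetical helper: computes base + index * elem_size on integer
   addresses and reports failure instead of wrapping around.
   Assumption: uintptr_t, ptrdiff_t, and size_t have the same width. */
bool checked_addr_add(uintptr_t base, ptrdiff_t index, size_t elem_size,
                      uintptr_t *out) {
    /* Magnitude of index, computed without negating a signed value. */
    uintptr_t mag = index < 0 ? (uintptr_t)0 - (uintptr_t)index
                              : (uintptr_t)index;

    /* Reject a byte offset that does not fit in an address. */
    if (elem_size != 0 && mag > UINTPTR_MAX / elem_size) {
        return false;
    }
    uintptr_t offset = mag * elem_size;

    if (index < 0) {
        if (offset > base) {
            return false;               /* would wrap below address 0 */
        }
        *out = base - offset;
    } else {
        if (offset > UINTPTR_MAX - base) {
            return false;               /* would wrap past the top */
        }
        *out = base + offset;
    }
    return true;
}
```

For example, wrapping_add_i32(INT32_MAX, 1) yields INT32_MIN on every conforming implementation, and checked_addr_add(UINTPTR_MAX - 3, 1, 4, &r) returns false rather than producing a wrapped address.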

The LLVM optimizer takes advantage of undefined behavior in ways that cause problems from a security perspective. If the LLVM optimizer can prove that undefined behavior occurs, it is allowed to do things such as delete code that relies on the undefined behavior. It can make sense to allow an optimizer to ignore undefined behavior (for example, assume that integer overflow never occurs). It is another thing to allow an optimizer to delete code because it proved that integer overflow occurred. This can lead to safety-checking code being deleted.
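
A well-known illustration of safety-checking code that an optimizer is allowed to remove is an overflow test written in terms of the overflowing result. The sketch below contrasts such a test with one that never performs the overflowing addition. Both function names are made up for this example; whether a given compiler actually folds the fragile test depends on the compiler version and optimization level.

```c
#include <limits.h>
#include <stdbool.h>

/* Fragile: detects overflow by inspecting the result of a + b.
   In standard C the overflowing add is undefined, so for b > 0 an
   optimizer may assume a + b >= a and fold this test to false.
   Only call this with operands that do not overflow. */
bool add_overflows_fragile(int a, int b) {
    return b > 0 && a + b < a;
}

/* Robust: compares against the limits before adding, so no
   overflowing operation is ever evaluated and nothing can be
   assumed away. */
bool add_overflows(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;
    }
    if (b < 0) {
        return a < INT_MIN - b;
    }
    return false;
}
```

If signed overflow is defined to wrap, as this issue proposes for Checked C, the fragile form becomes a legitimate test and the optimizer can no longer delete it.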

Checked C will insert bounds checks and null checks before all memory accesses via pointers derived from array_ptr, so the LLVM optimizer should never "see" incorrect memory accesses from array_ptr pointers that lead to undefined behavior. Checked C also requires overflow checking for array_ptr pointer arithmetic.
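
As a rough picture of the checks described above, the following plain C sketch performs a null check and a bounds check before a load. It is only an approximation under stated assumptions: checked_load is a hypothetical helper, the real checks are inserted by the compiler rather than written by hand, and a failed check is reported here by returning false rather than by whatever failure mechanism the implementation uses. The comparisons are done on uintptr_t values because relational comparison of pointers that do not point into the same object is itself undefined in standard C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: loads *p only if p is non-null and a whole int
   fits inside [lower, upper). Returns false without touching memory
   when a check fails. */
bool checked_load(const int *p, const int *lower, const int *upper,
                  int *out) {
    if (p == NULL) {
        return false;                       /* null check */
    }

    /* Compare integer addresses, not pointers, so the check itself
       does not depend on undefined pointer comparisons. */
    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)lower;
    uintptr_t hi = (uintptr_t)upper;
    if (addr < lo || addr >= hi || hi - addr < sizeof *p) {
        return false;                       /* bounds check */
    }

    *out = *p;                              /* access is now known safe */
    return true;
}
```

Because both checks run before the dereference, the only memory access the optimizer sees on the success path is one that is within bounds.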

We still have to defend against subtle problems that can arise from specifying behavior to be undefined:

  1. Pointer arithmetic that goes out of bounds, leading the result to be undefined, causing a bounds check to be deleted.
  2. A signed integer that is guaranteed to overflow being added to a pointer that is later bounds checked, tainting the resulting pointer as "undefined". This could lead to the bounds check being eliminated.
  3. An unchecked pointer being provably fetched from out of bounds of an array, causing downstream bounds checks to be deleted. Those bounds checks might defend against this bogus value being dereferenced.
  4. An unchecked pointer going provably out-of-bounds, causing a downstream bounds check to be deleted. This bounds check might defend against this bogus value being dereferenced.
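
The first and fourth situations can be sketched in plain C as follows. In the fragile version the out-of-bounds pointer is formed before the check, so standard C already considers the program undefined by the time the check runs, and a compiler may reason that the check can never fail. The second version validates the index before any pointer arithmetic happens. The function names and the fallback parameter are invented for this illustration, and whether a particular compiler removes the fragile check is not guaranteed either way.

```c
#include <stddef.h>

/* Fragile: buf + i is undefined in standard C when i is outside
   [0, len], so an optimizer may treat the comparisons below as
   always false and drop the check. Only call this with a valid i. */
int read_at_fragile(const int *buf, size_t len, ptrdiff_t i, int fallback) {
    const int *p = buf + i;
    if (p < buf || p >= buf + len) {
        return fallback;
    }
    return *p;
}

/* Robust: the index is checked as an integer first, so no
   out-of-bounds pointer is ever formed and the check cannot be
   justified away by undefined pointer arithmetic. */
int read_at(const int *buf, size_t len, ptrdiff_t i, int fallback) {
    if (i < 0 || (size_t)i >= len) {
        return fallback;
    }
    return buf[i];
}
```

Checked C takes a different route from the robust version: it defines out-of-bounds array_ptr arithmetic instead of forbidding it, which is why clang code generation must not mark that arithmetic as undefined.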

Undefined C behavior is dangerous from a security perspective. See this blog item from Chris Lattner to see what can go wrong, and this thread involving Linus Torvalds giving his perspective on using undefined behavior in the compiler.
