Skip to content

Extend clang IR with bounds expressions and parse bounds expressions for parameters. #8

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 30, 2016

Conversation

dtarditi
Copy link
Member

This change extends the clang IR to represent Checked C bounds expressions and optional bounds expressions for variable declarations. It also adds support for parsing bounds expressions and modifies parsing of function parameter lists to parse optional bounds expressions.

Bounds expressions are represented in the IR by adding a new abstract class BoundsExpr and subclassing it for count bounds expressions (count(e1) and byte_count(e1)), range bounds expressions (bounds(e1, e2)), and nullary bounds expressions(bounds(none)). AST printing, serialization, traversal, and tree transformations are extended handle the new expressions.

Bounds expressions are attached to variable declarations by adding an additional member to VarDecls. Many VarDecls will not have bounds expressions, so this adds extra space overhead to the representation of VarDecls. We can revisit this later if it becomes an issue.

To test the new bounds expressions, we add parsing of bounds expressions for function parameter lists and attach the parsed bounds expressions to the VarDecls for the parameters.

Bounds expressions for parameters need to be processed in a scope with all the parameters available. They are currently being processed in a scope that contains the parameters seen so far. This is a little complicated to implement in clang. You have to delay parsing of the bounds expressions. I will come back to this after getting basic parsing of bounds expressions working. I've opened issue #7 to track this.

Testing:

  • This passes the current test baseline for this snapshot of clang:
  • Wrote new feature tests of parsing of parameters with bounds declarations. There will be a separate pull request to the Github CheckedC repo for these tests.
  • Passes the existing Checked C tests.
  Expected Passes    : 8942
  Expected Failures  : 21
  Unsupported Tests  : 206
  Unexpected Failures: 3

dtarditi added 8 commits June 20, 2016 17:31
Fix a few typos in Setup-and-Build.md. Add a missing
hyperlink for how to update to newer versions of the
LLVM/clang sources.
Checkpoint the work before refining the BoundsExpr class into multiple
subclasses (the style used in the clang AST).  The code compiles, but
no instances of the class are allocated.   Code also needs to be added to
Sema.h and SemaExpr.cpp to allocate instanceo of the class.
Instead of just having BoundsExpr, create subclasses for each type of
BoundsExpr.   There are three subclasses: CountBoundsExpr, NullaryBoundsExpr,
and RangeBoundsExpr.

Testing: as expected, passes current clang test baseline.  The new classes are
not used yet.
Add code for parsing bounds expressions.   The code compiles, but has not
been tested yet.
This change adds parsing of bounds declarations for parameters.
- It improve the error handling and recovery in the prior code for parsing
  of bounds expressions.
- It removes parsing of the bounds(any) bounds expression, which we plan to
  remove from the source language level in the Checked C specification.
- It adds the basic semantic actions for constructing bounds expressions.
  There is no type checking in those semantic actions; that  will be added
  later.
- Another issue is that bounds expressions need to be processed in a scope
  with all the parameters available.  They are currently being processed in
  a scope that contains the parameters seen so far.  This is a little
  complicated to implement in clang.  You have to delay parsing of the bounds
  expressions. I will come back to this after getting basic parsing of
  bounds expressios working.
This change fixes coding convention issues, cleans up some code, and fills in
missing function bodies in TreeTransform.h.
* We want to stick to coding conventions that are used in the files that we are
  working in:
- Rename the property getters for subexpressions in CountBoundsExpr and
  RangeBoundsExpr to be suffixed with Expr: getCount -> getCountExpr,
  getLower -> getLowerExpr, and getUpper -> getUpperExpr.
- Capitalize the first letters of arguments.
- Delete some unnecessary white space.
- Fix some line ending issues for DiagnosticParserKinds.td.
* Remove support code for treating "any" as a contextual keyword. bounds(any)
will not be allowed at the source level, so there is no need for this code.
* A few function bodies in TreeTransform.h were stubbed out. Fill them in now
 that we have code for constructing the bounds expressions.
* Add property getters and setters for the optional bounds expression.
* Modify printing of VarDecls to include the bounds expression.
@msftclas
Copy link

Hi @dtarditi, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!


It looks like you're a Microsoft contributor (David Tarditi). If you're full-time, we DON'T require a Contribution License Agreement. If you are a vendor, please DO sign the electronic Contribution License Agreement. It will take 2 minutes and there's no faxing! https://cla.microsoft.com.

TTYL, MSBOT;

/// \brief - Return true if this variable has bounds declared for it.
bool hasBoundsExpr() const;

const Expr *getBoundsExpr() const {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a header comment too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment


if (Tok.getKind() == tok::identifier) {
IdentifierInfo *NextIdent = Tok.getIdentifierInfo();
if (NextIdent == Ident_none) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none is a reserved keyword inside bounds expressions, right? So we don't have to worry about a range-based bounds expression referring to a variable named 'none'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None is a contextual keyword. You can't write bounds(none, y). But you could write bounds(y, none), and none will be treated as a variable name.

@reubeno
Copy link

reubeno commented Jun 29, 2016

I finished a first pass over the pull request; please pardon any naive questions that are more about clang and less about this change--but I'm very eager to learn the answers to those questions if you have the time to answer them :)

@dtarditi
Copy link
Member Author

dtarditi commented Jun 30, 2016

I've addressed your current round of feedback on the pull request. I've pushed a new commit to the branch in my personal repo that addresses the code changes.

One open issue is whether we want to avoid using the cast operator. As far as I can tell, it only asserts in debug builds. We could use the dyn_cast operator instead, which returns null if the cast fails. I'll look at existing code some more to see what it is doing and whether one still is favored or not.

Retype some code to avoid possibly unsafe cast operations.  The retyping
prevents misuse of the ActOnBoundsExpr and ActOnInvalidBoundsExpr methods.
if (Bounds.isInvalid()) {
SkipUntil(tok::comma, tok::r_paren, StopAtSemi | StopBeforeMatch);
Actions.ActOnInvalidBoundsExpr(Param);
} else Actions.ActOnBoundsExpr(Param, cast<BoundsExpr>(Bounds.get()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: IMHO, this would be more readable if the body of the else were at least on its own line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I looked at the code and else statements have on the same line either nothing, {, or an if-control statement:

else
   ...

or

else { 
  ...
 }

or

else if (cond) 

@reubeno
Copy link

reubeno commented Jun 30, 2016

Changes look good to me; I made a couple more style comments, but apart from those I'd say we're good to go with this :)

@dtarditi dtarditi merged commit 6d4d831 into checkedc:master Jun 30, 2016
@dtarditi
Copy link
Member Author

Thanks! I updated the formatting and completed the pull request.

@dtarditi dtarditi deleted the bounds-exprs branch July 9, 2016 00:10
mgrang pushed a commit that referenced this pull request Feb 18, 2020
Introduction
============

This patch added intial support for bpf program compile once
and run everywhere (CO-RE).

The main motivation is for bpf program which depends on
kernel headers which may vary between different kernel versions.
The initial discussion can be found at https://lwn.net/Articles/773198/.

Currently, bpf program accesses kernel internal data structure
through bpf_probe_read() helper. The idea is to capture the
kernel data structure to be accessed through bpf_probe_read()
and relocate them on different kernel versions.

On each host, right before bpf program load, the bpfloader
will look at the types of the native linux through vmlinux BTF,
calculates proper access offset and patch the instruction.

To accommodate this, three intrinsic functions
   preserve_{array,union,struct}_access_index
are introduced which in clang will preserve the base pointer,
struct/union/array access_index and struct/union debuginfo type
information. Later, bpf IR pass can reconstruct the whole gep
access chains without looking at gep itself.

This patch did the following:
  . An IR pass is added to convert preserve_*_access_index to
    global variable who name encodes the getelementptr
    access pattern. The global variable has metadata
    attached to describe the corresponding struct/union
    debuginfo type.
  . An SimplifyPatchable MachineInstruction pass is added
    to remove unnecessary loads.
  . The BTF output pass is enhanced to generate relocation
    records located in .BTF.ext section.

Typical CO-RE also needs support of global variables which can
be assigned to different values to different hosts. For example,
kernel version can be used to guard different versions of codes.
This patch added the support for patchable externals as well.

Example
=======

The following is an example.

  struct pt_regs {
    long arg1;
    long arg2;
  };
  struct sk_buff {
    int i;
    struct net_device *dev;
  };

  #define _(x) (__builtin_preserve_access_index(x))
  static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) =
          (void *) 4;
  extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version;
  int bpf_prog(struct pt_regs *ctx) {
    struct net_device *dev = 0;

    // ctx->arg* does not need bpf_probe_read
    if (__kernel_version >= 41608)
      bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev));
    else
      bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev));
    return dev != 0;
  }

In the above, we want to translate the third argument of
bpf_probe_read() as relocations.

  -bash-4.4$ clang -target bpf -O2 -g -S trace.c

The compiler will generate two new subsections in .BTF.ext,
OffsetReloc and ExternReloc.
OffsetReloc is to record the structure member offset operations,
and ExternalReloc is to record the external globals where
only u8, u16, u32 and u64 are supported.

   BPFOffsetReloc Size
   struct SecLOffsetReloc for ELF section #1
   A number of struct BPFOffsetReloc for ELF section #1
   struct SecOffsetReloc for ELF section #2
   A number of struct BPFOffsetReloc for ELF section #2
   ...
   BPFExternReloc Size
   struct SecExternReloc for ELF section #1
   A number of struct BPFExternReloc for ELF section #1
   struct SecExternReloc for ELF section #2
   A number of struct BPFExternReloc for ELF section #2

  struct BPFOffsetReloc {
    uint32_t InsnOffset;    ///< Byte offset in this section
    uint32_t TypeID;        ///< TypeID for the relocation
    uint32_t OffsetNameOff; ///< The string to traverse types
  };

  struct BPFExternReloc {
    uint32_t InsnOffset;    ///< Byte offset in this section
    uint32_t ExternNameOff; ///< The string for external variable
  };

Note that only externs with attribute section ".BPF.patchable_externs"
are considered for Extern Reloc which will be patched by bpf loader
right before the load.

For the above test case, two offset records and one extern record
will be generated:
  OffsetReloc records:
        .long   .Ltmp12                 # Insn Offset
        .long   7                       # TypeId
        .long   242                     # Type Decode String
        .long   .Ltmp18                 # Insn Offset
        .long   7                       # TypeId
        .long   242                     # Type Decode String

  ExternReloc record:
        .long   .Ltmp5                  # Insn Offset
        .long   165                     # External Variable

  In string table:
        .ascii  "0:1"                   # string offset=242
        .ascii  "__kernel_version"      # string offset=165

The default member offset can be calculated as
    the 2nd member offset (0 representing the 1st member) of struct "sk_buff".

The asm code:
    .Ltmp5:
    .Ltmp6:
            r2 = 0
            r3 = 41608
    .Ltmp7:
    .Ltmp8:
            .loc    1 18 9 is_stmt 0        # t.c:18:9
    .Ltmp9:
            if r3 > r2 goto LBB0_2
    .Ltmp10:
    .Ltmp11:
            .loc    1 0 9                   # t.c:0:9
    .Ltmp12:
            r2 = 8
    .Ltmp13:
            .loc    1 19 66 is_stmt 1       # t.c:19:66
    .Ltmp14:
    .Ltmp15:
            r3 = *(u64 *)(r1 + 0)
            goto LBB0_3
    .Ltmp16:
    .Ltmp17:
    LBB0_2:
            .loc    1 0 66 is_stmt 0        # t.c:0:66
    .Ltmp18:
            r2 = 8
            .loc    1 21 66 is_stmt 1       # t.c:21:66
    .Ltmp19:
            r3 = *(u64 *)(r1 + 8)
    .Ltmp20:
    .Ltmp21:
    LBB0_3:
            .loc    1 0 66 is_stmt 0        # t.c:0:66
            r3 += r2
            r1 = r10
    .Ltmp22:
    .Ltmp23:
    .Ltmp24:
            r1 += -8
            r2 = 8
            call 4

For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number
8 is the structure offset based on the current BTF.
Loader needs to adjust it if it changes on the host.

For instruction .Ltmp5, "r2 = 0", the external variable
got a default value 0, loader needs to supply an appropriate
value for the particular host.

Compiling to generate object code and disassemble:
   0000000000000000 bpf_prog:
           0:       b7 02 00 00 00 00 00 00         r2 = 0
           1:       7b 2a f8 ff 00 00 00 00         *(u64 *)(r10 - 8) = r2
           2:       b7 02 00 00 00 00 00 00         r2 = 0
           3:       b7 03 00 00 88 a2 00 00         r3 = 41608
           4:       2d 23 03 00 00 00 00 00         if r3 > r2 goto +3 <LBB0_2>
           5:       b7 02 00 00 08 00 00 00         r2 = 8
           6:       79 13 00 00 00 00 00 00         r3 = *(u64 *)(r1 + 0)
           7:       05 00 02 00 00 00 00 00         goto +2 <LBB0_3>

    0000000000000040 LBB0_2:
           8:       b7 02 00 00 08 00 00 00         r2 = 8
           9:       79 13 08 00 00 00 00 00         r3 = *(u64 *)(r1 + 8)

    0000000000000050 LBB0_3:
          10:       0f 23 00 00 00 00 00 00         r3 += r2
          11:       bf a1 00 00 00 00 00 00         r1 = r10
          12:       07 01 00 00 f8 ff ff ff         r1 += -8
          13:       b7 02 00 00 08 00 00 00         r2 = 8
          14:       85 00 00 00 04 00 00 00         call 4

Instructions #2, #5 and #8 need relocation resoutions from the loader.

Signed-off-by: Yonghong Song <[email protected]>

Differential Revision: https://reviews.llvm.org/D61524

llvm-svn: 365503
kkjeer pushed a commit that referenced this pull request Sep 23, 2020
When `Target::GetEntryPointAddress()` calls `exe_module->GetObjectFile()->GetEntryPointAddress()`, and the returned
`entry_addr` is valid, it can immediately be returned.

However, just before that, an `llvm::Error` value has been setup, but in this case it is not consumed before returning, like is done further below in the function.

In https://bugs.freebsd.org/248745 we got a bug report for this, where a very simple test case aborts and dumps core:

```
* thread #1, name = 'testcase', stop reason = breakpoint 1.1
    frame #0: 0x00000000002018d4 testcase`main(argc=1, argv=0x00007fffffffea18) at testcase.c:3:5
   1	int main(int argc, char *argv[])
   2	{
-> 3	    return 0;
   4	}
(lldb) p argc
Program aborted due to an unhandled Error:
Error value was Success. (Note: Success values must still be checked prior to being destroyed).

Thread 1 received signal SIGABRT, Aborted.
thr_kill () at thr_kill.S:3
3	thr_kill.S: No such file or directory.
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x00000008049a0004 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x0000000804916229 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x000000000451b5f5 in fatalUncheckedError () at /usr/src/contrib/llvm-project/llvm/lib/Support/Error.cpp:112
#4  0x00000000019cf008 in GetEntryPointAddress () at /usr/src/contrib/llvm-project/llvm/include/llvm/Support/Error.h:267
#5  0x0000000001bccbd8 in ConstructorSetup () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:67
#6  0x0000000001bcd2c0 in ThreadPlanCallFunction () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:114
#7  0x00000000020076d4 in InferiorCallMmap () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp:97
#8  0x0000000001f4be33 in DoAllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/FreeBSD/ProcessFreeBSD.cpp:604
#9  0x0000000001fe51b9 in AllocatePage () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:347
#10 0x0000000001fe5385 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:383
#11 0x0000000001974da2 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2301
#12 CanJIT () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2331
#13 0x0000000001a1bf3d in Evaluate () at /usr/src/contrib/llvm-project/lldb/source/Expression/UserExpression.cpp:190
#14 0x00000000019ce7a2 in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Target/Target.cpp:2372
#15 0x0000000001ad784c in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:414
#16 0x0000000001ad86ae in DoExecute () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:646
#17 0x0000000001a5e3ed in Execute () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandObject.cpp:1003
#18 0x0000000001a6c4a3 in HandleCommand () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:1762
#19 0x0000000001a6f98c in IOHandlerInputComplete () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2760
#20 0x0000000001a90b08 in Run () at /usr/src/contrib/llvm-project/lldb/source/Core/IOHandler.cpp:548
#21 0x00000000019a6c6a in ExecuteIOHandlers () at /usr/src/contrib/llvm-project/lldb/source/Core/Debugger.cpp:903
#22 0x0000000001a70337 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2946
#23 0x0000000001d9d812 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/API/SBDebugger.cpp:1169
#24 0x0000000001918be8 in MainLoop () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:675
#25 0x000000000191a114 in main () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:890```

Fix the incorrect error catch by only instantiating an `Error` object if it is necessary.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D86355

(cherry picked from commit 1ce07cd)
sulekhark pushed a commit that referenced this pull request Jul 21, 2021
Andrei Matei reported a llvm11 core dump for his bpf program
   https://bugs.llvm.org/show_bug.cgi?id=48578
The core dump happens in LiveVariables analysis phase.
  #4 0x00007fce54356bb0 __restore_rt
  #5 0x00007fce4d51785e llvm::LiveVariables::HandleVirtRegUse(unsigned int,
      llvm::MachineBasicBlock*, llvm::MachineInstr&)
  #6 0x00007fce4d519abe llvm::LiveVariables::runOnInstr(llvm::MachineInstr&,
      llvm::SmallVectorImpl<unsigned int>&)
  #7 0x00007fce4d519ec6 llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock*, unsigned int)
  #8 0x00007fce4d51a4bf llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&)
The bug can be reproduced with llvm12 and latest trunk as well.

Futher analysis shows that there is a bug in BPF peephole
TRUNC elimination optimization, which tries to remove
unnecessary TRUNC operations (a <<= 32; a >>= 32).
Specifically, the compiler did wrong transformation for the
following patterns:
   %1 = LDW ...
   %2 = SLL_ri %1, 32
   %3 = SRL_ri %2, 32
   ... %3 ...
   %4 = SRA_ri %2, 32
   ... %4 ...

The current transformation did not check how many uses of %2
and did transformation like
   %1 = LDW ...
   ... %1 ...
   %4 = SRL_ri %2, 32
   ... %4 ...
and pseudo register %2 is used by not defined and
caused LiveVariables analysis core dump.

To fix the issue, when traversing back from SRL_ri to SLL_ri,
check to ensure SLL_ri has only one use. Otherwise, don't
do transformation.

Differential Revision: https://reviews.llvm.org/D97792

(cherry picked from commit 51cdb78)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants