Skip to content

PersistentSourceLocations are ambiguous #18

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rchyena opened this issue Aug 12, 2019 · 5 comments
Closed

PersistentSourceLocations are ambiguous #18

rchyena opened this issue Aug 12, 2019 · 5 comments
Assignees
Labels
bug Something isn't working macro typedef

Comments

@rchyena
Copy link

rchyena commented Aug 12, 2019

Projects may use the C preprocessor to achieve polymorphic definitions. For example, consider this macro in safestack.h from OpenSSL:

# define SKM_DEFINE_STACK_OF(t1, t2, t3) \
    STACK_OF(t1); \
    typedef int (*sk_##t1##_compfunc)(const t3 * const *a, const t3 *const *b); \
    typedef void (*sk_##t1##_freefunc)(t3 *a); \
    typedef t3 * (*sk_##t1##_copyfunc)(const t3 *a); \
    static ossl_unused ossl_inline int sk_##t1##_num(const STACK_OF(t1) *sk) \
    { \
        return OPENSSL_sk_num((const OPENSSL_STACK *)sk); \
    } \
    <... snip ...>

This causes a problem for how we locate ConstraintVariables. We map PersistentSourceLocations to ConstraintVariables inside class ProgramInfo, but Clang places the spelling location of such definitions inside a scratch space, which can easily lead to namespace collisions. Testing against Icecast on Fedora 29, the following four (completely distinct) functions are associated with <scratch space>:12:1:

  1. int i2d_PKCS7_SIGNED(PKCS7_SIGNED *a, unsigned char **out)
  2. static inline int sk_OPENSSL_CSTRING_unshift(struct stack_st_OPENSSL_CSTRING *sk, const char *ptr) __attribute__((unused))
  3. static inline struct stack_st_void *sk_void_new_null() __attribute__((unused))
  4. void ASN1_IA5STRING_free(ASN1_IA5STRING *a)

Even if the compiler avoided the use of a scratch space for such functions, these collisions would continue to occur in the original file.

@rchyena rchyena self-assigned this Aug 12, 2019
@rchyena rchyena added the bug Something isn't working label Aug 12, 2019
@rchyena rchyena changed the title PersistentSourceLocation are ambiguous PersistentSourceLocations are ambiguous Aug 12, 2019
rchyena added a commit that referenced this issue Aug 12, 2019
In certain cases, the file/line/column information stored in the
PersistentSourceLocation class is not enough to uniquely disambiguate
definitions from one another.  To support these cases, this commit
adds a "reproducible" AST node identifier as a distinguishing feature.

According to the "Clang Front End for LLVM Developers' List":

    "If you want something that's more stable across runs (pointers
    would obviously change every time), you can have a look at the
    getID() method in various AST nodes, which produces an integer
    that doesn't change across runs but does change across host
    machines."

Which should work fine for our purposes.

This fixes #18.
@Machiry
Copy link
Collaborator

Machiry commented Aug 22, 2019

It looks like the fix committed for this (ad94795) is causing regressions.
We see different AST ids for the same field of a structure :/

Example:

426278 {"line":"/home/machiry/checkedc/icecast_typedebugging/checkedc-icecast/src/common/httpp/httpp.h:83:12:37538","Variables":[{"PointerVar":{"Vars":["q_164350","q_164351"], "name":"value"}}]},
426279 {"line":"/home/machiry/checkedc/icecast_typedebugging/checkedc-icecast/src/common/httpp/httpp.h:83:12:66084","Variables":[{"PointerVar":{"Vars":["q_194595","q_194596"], "name":"value"}}]},
426280 {"line":"/home/machiry/checkedc/icecast_typedebugging/checkedc-icecast/src/common/httpp/httpp.h:83:12:86162","Variables":[{"PointerVar":{"Vars":["q_70368","q_70369"], "name":"value"}}]},
426281 {"line":"/home/machiry/checkedc/icecast_typedebugging/checkedc-icecast/src/common/httpp/httpp.h:83:12:86508","Variables":[{"PointerVar":{"Vars":["q_7399","q_7400"], "name":"value"}}]},
426282 {"line":"/home/machiry/checkedc/icecast_typedebugging/checkedc-icecast/src/common/httpp/httpp.h:83:12:90330","Variables":[{"PointerVar":{"Vars":["q_126661","q_126662"], "name":"value"}}]},
426283 {"line":"/home/machiry/checkedc/icecast_typedebugging/checkedc-icecast/src/common/httpp/httpp.h:83:12:93814","Variables":[{"PointerVar":{"Vars":["q_81379","q_81380"], "name":"value"}}]},

We hit this, while trying to convert the icecast source from: https://github.com/plum-umd/checkedc-icecast

Maybe we should use ids only for functions?

Attached is the zip file containing constraint_output.json generated while converting icecast code.

Ref: @hasantouma

icecast_constraint_output.json.zip

rchyena added a commit that referenced this issue Aug 28, 2019
This should address the regression reported by @Machiry and discovered
by @hasantouma, and resolve #18.
rchyena added a commit that referenced this issue Jan 7, 2020
In certain cases, the file/line/column information stored in the
PersistentSourceLocation class is not enough to uniquely disambiguate
definitions from one another.  To support these cases, this commit
adds a "reproducible" AST node identifier as a distinguishing feature.

According to the "Clang Front End for LLVM Developers' List":

    "If you want something that's more stable across runs (pointers
    would obviously change every time), you can have a look at the
    getID() method in various AST nodes, which produces an integer
    that doesn't change across runs but does change across host
    machines."

Which should work fine for our purposes.

This fixes #18.

(cherry picked from commit ad94795)
rchyena added a commit that referenced this issue Jan 7, 2020
This should address the regression reported by @Machiry and discovered
by @hasantouma, and resolve #18.

(cherry picked from commit 1ff694a)
Machiry pushed a commit that referenced this issue May 17, 2020
Cherry-picked from commit 2d090f18a30386e4bb2b8dbfc1e2558d77198f4b

    Fix the generation of llvm-lit.py so that llvm-lit can be used to run Checked C tests in the Checked C repo from the command line. Several arguments were missing from configure_lit_site_cfg in the CMakeLists.txt file for the checkedc-wrapper directory.  This fixes Checked C issue checkedc/checkedc#340.

    Also fix some warnings about missing tools when running the tests.  The tools are clang-specific test executables and tools not used by the Checked C repo tests.  Delete them from the list of required tools.
@john-h-kastner
Copy link
Collaborator

john-h-kastner commented Sep 21, 2020

PR #266 improved how we deal with collisions between ambiguous source locations, but the root of the problem still exists. Source locations are still ambiguous. Ideally we can change the PersistentSourceLocation class in the future so that it uniquely identifies a point in the source.

@kyleheadley
Copy link
Member

Are PersistantSourceLocations better understood now? If this is not a specific issue, we can close it or incorporate it into a larger plan to deal with other source location ambiguities.

@mwhicks1
Copy link
Member

mwhicks1 commented Dec 8, 2020

@jackastner I think this is now a non-issue? That is, the specific problem that came up with here is now fixed, even though rewriting has other problems? If so, please close. If not, please say what we still need to do.

@john-h-kastner
Copy link
Collaborator

The broader issue here is still relevant: a PersistentSourceLocationobject does not actually uniquely identify a location in the source code. We have, however, improved how we handle the ambiguity so that we get correct conversion.

I think this issue can closed, and it should be incorporated into an issue for emitting warnings when 3C can't rewrite a pointer that we think should be checked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working macro typedef
Projects
None yet
Development

No branches or pull requests

5 participants