-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[analyzer] Refine invalidation caused by fread
#93408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change enables more accurate modeling of the write effects of `fread`. In particular, instead of invalidating the whole buffer, in a best-effort basis, we would try to invalidate the actually accesses elements of the buffer. This preserves the previous value of the buffer of the unaffected slots. As a result, diagnose more uninitialized buffer uses for example. Currently, this refined invalidation only triggers for `fread` if and only if the `count` parameter and the buffer pointer's index component are concrete or perfectly-constrained symbols. Additionally, if the `fread` would read more than 64 elements, the whole buffer is invalidated as before. This is to have safeguards against performance issues. Refer to the comments of the assertions in the following example to see the changes in the diagnostics: ```c++ void demo() { FILE *fp = fopen("/home/test", "rb+"); if (!fp) return; int buffer[10]; // uninitialized int read_items = fread(buffer+1, sizeof(int), 5, fp); if (5 == read_items) { int v1 = buffer[1]; // Unknown value but not garbage. clang_analyzer_isTainted(v1); // expected-warning {{YES}} <-- Would be "NO" without this patch. clang_analyzer_dump(v1); // expected-warning {{conj_}} <-- Not a "derived" symbol, so it's directly invalidated now. int v0 = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}} <-- Had no report here before. (void)(v1 + v0); } else { // If 'fread' had an error. int v0 = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}} <-- Had no report here before. (void)v0; } fclose(fp); } ``` [CPP-3247](https://sonarsource.atlassian.net/browse/CPP-3247) Patch by Marco Borgeaud (marco-antognini-sonarsource)
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: Balazs Benics (steakhal) ChangesThis change enables more accurate modeling of the write effects of Currently, this refined invalidation only triggers for Refer to the comments of the assertions in the following example to see the changes in the diagnostics: void demo() {
FILE *fp = fopen("/home/test", "rb+");
if (!fp) return;
int buffer[10]; // uninitialized
int read_items = fread(buffer+1, sizeof(int), 5, fp);
if (5 == read_items) {
int v1 = buffer[1]; // Unknown value but not garbage.
clang_analyzer_isTainted(v1); // expected-warning {{YES}} <-- Would be "NO" without this patch.
clang_analyzer_dump(v1); // expected-warning {{conj_}} <-- Not a "derived" symbol, so it's directly invalidated now.
int v0 = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}} <-- Had no report here before.
(void)(v1 + v0);
} else {
// If 'fread' had an error.
int v0 = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}} <-- Had no report here before.
(void)v0;
}
fclose(fp);
} Patch by Marco Borgeaud (marco-antognini-sonarsource) Full diff: https://github.com/llvm/llvm-project/pull/93408.diff 2 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
index d4e020f7a72a0..7b42c4f72b322 100644
--- a/clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
@@ -717,18 +717,71 @@ const ExplodedNode *StreamChecker::getAcquisitionSite(const ExplodedNode *N,
return nullptr;
}
+/// Invalidate only the requested elements instead of the whole buffer.
+/// This is basically a refinement of the more generic 'escapeArgs' or
+/// the plain old 'invalidateRegions'.
+/// This only works if the \p StartIndex and \p Count are concrete or
+/// perfectly-constrained.
+static ProgramStateRef
+escapeByStartIndexAndCount(ProgramStateRef State, CheckerContext &C,
+ const CallEvent &Call, const MemRegion *Buffer,
+ QualType ElemType, SVal StartIndex, SVal Count) {
+ if (!llvm::isa_and_nonnull<SubRegion>(Buffer))
+ return State;
+
+ auto UnboxAsInt = [&C, &State](SVal V) -> std::optional<int64_t> {
+ auto &SVB = C.getSValBuilder();
+ if (const llvm::APSInt *Int = SVB.getKnownValue(State, V))
+ return Int->tryExtValue();
+ return std::nullopt;
+ };
+
+ auto StartIndexVal = UnboxAsInt(StartIndex);
+ auto CountVal = UnboxAsInt(Count);
+
+ // FIXME: Maybe we could make this more generic, and expose this by the
+ // 'invalidateRegions' API. After doing so, it might make sense to make this
+ // limit configurable.
+ constexpr int MaxInvalidatedElementsLimit = 64;
+ if (!StartIndexVal || !CountVal || *CountVal > MaxInvalidatedElementsLimit) {
+ return State->invalidateRegions({loc::MemRegionVal{Buffer}},
+ Call.getOriginExpr(), C.blockCount(),
+ C.getLocationContext(),
+ /*CausesPointerEscape=*/false);
+ }
+
+ constexpr auto DoNotInvalidateSuperRegion =
+ RegionAndSymbolInvalidationTraits::InvalidationKinds::
+ TK_DoNotInvalidateSuperRegion;
+
+ auto &RegionManager = Buffer->getMemRegionManager();
+ SmallVector<SVal> EscapingVals;
+ EscapingVals.reserve(*CountVal);
+
+ RegionAndSymbolInvalidationTraits ITraits;
+ for (auto Idx : llvm::seq(*StartIndexVal, *StartIndexVal + *CountVal)) {
+ NonLoc Index = C.getSValBuilder().makeArrayIndex(Idx);
+ const auto *Element = RegionManager.getElementRegion(
+ ElemType, Index, cast<SubRegion>(Buffer), C.getASTContext());
+ EscapingVals.push_back(loc::MemRegionVal(Element));
+ ITraits.setTrait(Element, DoNotInvalidateSuperRegion);
+ }
+ return State->invalidateRegions(EscapingVals, Call.getOriginExpr(),
+ C.blockCount(), C.getLocationContext(),
+ /*CausesPointerEscape=*/false,
+ /*InvalidatedSymbols=*/nullptr, &Call,
+ &ITraits);
+}
+
static ProgramStateRef escapeArgs(ProgramStateRef State, CheckerContext &C,
const CallEvent &Call,
ArrayRef<unsigned int> EscapingArgs) {
- const auto *CE = Call.getOriginExpr();
-
- SmallVector<SVal> EscapingVals;
- EscapingVals.reserve(EscapingArgs.size());
- for (auto EscArgIdx : EscapingArgs)
- EscapingVals.push_back(Call.getArgSVal(EscArgIdx));
- State = State->invalidateRegions(EscapingVals, CE, C.blockCount(),
- C.getLocationContext(),
- /*CausesPointerEscape=*/false);
+ auto GetArgSVal = [&Call](int Idx) { return Call.getArgSVal(Idx); };
+ auto EscapingVals = to_vector(map_range(EscapingArgs, GetArgSVal));
+ State = State->invalidateRegions(EscapingVals, Call.getOriginExpr(),
+ C.blockCount(), C.getLocationContext(),
+ /*CausesPointerEscape=*/false,
+ /*InvalidatedSymbols=*/nullptr);
return State;
}
@@ -937,8 +990,21 @@ void StreamChecker::evalFreadFwrite(const FnDescription *Desc,
// At read, invalidate the buffer in any case of error or success,
// except if EOF was already present.
- if (IsFread && !E.isStreamEof())
- State = escapeArgs(State, C, Call, {0});
+ if (IsFread && !E.isStreamEof()) {
+ // Try to invalidate the individual elements.
+ if (const auto *BufferFirstElem =
+ dyn_cast_or_null<ElementRegion>(Call.getArgSVal(0).getAsRegion())) {
+ const MemRegion *Buffer = BufferFirstElem->getSuperRegion();
+ QualType ElemTy = BufferFirstElem->getElementType();
+ SVal FirstAccessedItem = BufferFirstElem->getIndex();
+ SVal ItemCount = Call.getArgSVal(2);
+ State = escapeByStartIndexAndCount(State, C, Call, Buffer, ElemTy,
+ FirstAccessedItem, ItemCount);
+ } else {
+ // Otherwise just fall back to invalidating the whole buffer.
+ State = escapeArgs(State, C, Call, {0});
+ }
+ }
// Generate a transition for the success state.
// If we know the state to be FEOF at fread, do not add a success state.
diff --git a/clang/test/Analysis/fread.cpp b/clang/test/Analysis/fread.cpp
new file mode 100644
index 0000000000000..2bf9baefe1395
--- /dev/null
+++ b/clang/test/Analysis/fread.cpp
@@ -0,0 +1,328 @@
+// RUN: %clang_analyze_cc1 -verify %s \
+// RUN: -analyzer-checker=core,unix.Stream,alpha.security.taint \
+// RUN: -analyzer-checker=debug.ExprInspection
+
+#define EOF (-1)
+
+extern "C" {
+typedef __typeof(sizeof(int)) size_t;
+typedef struct _FILE FILE;
+
+FILE *fopen(const char *filename, const char *mode);
+int fclose(FILE *stream);
+size_t fread(void *buffer, size_t size, size_t count, FILE *stream);
+int fgetc(FILE *stream);
+void *malloc(size_t size);
+}
+
+void clang_analyzer_dump(int);
+void clang_analyzer_isTainted(int);
+void clang_analyzer_warnIfReached();
+
+// A stream is only tracked by StreamChecker if it results from a call to "fopen".
+// Otherwise, there is no specific modelling of "fread".
+void untracked_stream(FILE *fp) {
+ char c;
+ if (1 == fread(&c, 1, 1, fp)) {
+ char p = c; // Unknown value but not garbage and not modeled by checker.
+ } else {
+ char p = c; // Possibly indeterminate value but not modeled by checker.
+ }
+}
+
+void fgetc_props_taint() {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ int c = fgetc(fp); // c is tainted.
+ if (c != EOF) {
+ clang_analyzer_isTainted(c); // expected-warning{{YES}}
+ }
+ fclose(fp);
+ }
+}
+
+void fread_props_taint() {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ char buffer[10];
+ int c = fread(buffer, 1, 10, fp); // c is tainted.
+ if (c != 10) {
+ // If the read failed, then the number of bytes successfully read should be tainted.
+ clang_analyzer_isTainted(c); // expected-warning{{YES}}
+ }
+ fclose(fp);
+ }
+}
+
+void read_one_byte1() {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ char c;
+ if (1 == fread(&c, 1, 1, fp)) {
+ char p = c; // Unknown value but not garbage.
+ clang_analyzer_isTainted(p); // expected-warning{{YES}}
+ } else {
+ char p = c; // Possibly indeterminate value but not modeled by checker.
+ clang_analyzer_isTainted(p); // expected-warning{{YES}}
+ }
+ fclose(fp);
+ }
+}
+
+void read_one_byte2(char *buffer) {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ if (1 == fread(buffer, 1, 1, fp)) {
+ char p = buffer[0]; // Unknown value but not garbage.
+ clang_analyzer_isTainted(p); // expected-warning{{YES}}
+ } else {
+ char p = buffer[0]; // Possibly indeterminate value but not modeled by checker.
+ clang_analyzer_isTainted(p); // expected-warning{{YES}}
+ }
+ fclose(fp);
+ }
+}
+
+void read_one_byte3(char *buffer) {
+ buffer[1] = 10;
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ // buffer[1] is not mutated by fread and remains not tainted.
+ fread(buffer, 1, 1, fp);
+ char p = buffer[1];
+ clang_analyzer_isTainted(p); // expected-warning{{NO}}
+ clang_analyzer_dump(buffer[1]); // expected-warning{{derived_}} FIXME This should be 10.
+ fclose(fp);
+ }
+}
+
+void read_many_bytes(char *buffer) {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ if (42 == fread(buffer, 1, 42, fp)) {
+ char p = buffer[0]; // Unknown value but not garbage.
+ clang_analyzer_isTainted(p); // expected-warning{{YES}}
+ } else {
+ char p = buffer[0]; // Possibly indeterminate value but not modeled.
+ clang_analyzer_isTainted(p); // expected-warning{{YES}}
+ }
+ fclose(fp);
+ }
+}
+
+void random_access_write1(int index) {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ long c[4];
+ bool success = 2 == fread(c + 1, sizeof(long), 2, fp);
+
+ switch (index) {
+ case 0:
+ // c[0] is not mutated by fread.
+ if (success) {
+ char p = c[0]; // expected-warning {{Assigned value is garbage or undefined}} We kept the first byte intact.
+ } else {
+ char p = c[0]; // expected-warning {{Assigned value is garbage or undefined}} We kept the first byte intact.
+ }
+ break;
+
+ case 1:
+ if (success) {
+ // Unknown value but not garbage.
+ clang_analyzer_isTainted(c[1]); // expected-warning {{YES}}
+ clang_analyzer_dump(c[1]); // expected-warning {{conj_}}
+ } else {
+ // Possibly indeterminate value but not modeled.
+ clang_analyzer_isTainted(c[1]); // expected-warning {{YES}}
+ clang_analyzer_dump(c[1]); // expected-warning {{conj_}}
+ }
+ break;
+
+ case 2:
+ if (success) {
+ long p = c[2]; // Unknown value but not garbage.
+ // FIXME: Taint analysis only marks the first byte of a memory region. See getPointeeOf in GenericTaintChecker.cpp.
+ clang_analyzer_isTainted(c[2]); // expected-warning {{NO}}
+ clang_analyzer_dump(c[2]); // expected-warning {{conj_}}
+ } else {
+ // Possibly indeterminate value but not modeled.
+ clang_analyzer_isTainted(c[2]); // expected-warning {{NO}} // FIXME: See above.
+ clang_analyzer_dump(c[2]); // expected-warning {{conj_}}
+ }
+ break;
+
+ case 3:
+ // c[3] is not mutated by fread.
+ if (success) {
+ long p = c[3]; // expected-warning {{Assigned value is garbage or undefined}}
+ } else {
+ long p = c[3]; // expected-warning {{Assigned value is garbage or undefined}}
+ }
+ break;
+ }
+
+ fclose(fp);
+ }
+}
+
+void random_access_write2(bool b) {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ int buffer[10];
+ int *ptr = buffer + 2;
+ if (5 == fread(ptr - 1, sizeof(int), 5, fp)) {
+ if (b) {
+ int p = buffer[1]; // Unknown value but not garbage.
+ clang_analyzer_isTainted(p); // expected-warning {{YES}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ } else {
+ int p = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}}
+ }
+ } else {
+ int p = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}}
+ }
+ fclose(fp);
+ }
+}
+
+void random_access_write_symbolic_count(size_t count) {
+ // Cover a case that used to crash (symbolic count).
+ if (count > 2)
+ return;
+
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ long c[4];
+ fread(c + 1, sizeof(long), count, fp);
+
+ // c[0] and c[3] are never mutated by fread, but because "count" is a symbolic value, the checker doesn't know that.
+ long p = c[0];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{derived_}}
+
+ p = c[3];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{derived_}}
+
+ p = c[1];
+ clang_analyzer_isTainted(p); // expected-warning {{YES}}
+ clang_analyzer_dump(p); // expected-warning {{derived_}}
+
+ fclose(fp);
+ }
+}
+
+void dynamic_random_access_write(int startIndex) {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ long buffer[10];
+ // Cannot reason about index.
+ size_t res = fread(buffer + startIndex, sizeof(long), 5, fp);
+ if (5 == res) {
+ long p = buffer[startIndex];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ } else if (res == 4) {
+ long p = buffer[startIndex];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ p = buffer[startIndex + 1];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ p = buffer[startIndex + 2];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ p = buffer[startIndex + 3];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ p = buffer[startIndex + 4];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ p = buffer[startIndex + 5];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ p = buffer[0];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ } else {
+ long p = buffer[startIndex];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ p = buffer[0];
+ clang_analyzer_isTainted(p); // expected-warning {{NO}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ }
+ fclose(fp);
+ }
+}
+
+struct S {
+ int a;
+ long b;
+};
+
+void comopund_write1() {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ S s; // s.a is not touched by fread.
+ if (1 == fread(&s.b, sizeof(s.b), 1, fp)) {
+ long p = s.b;
+ clang_analyzer_isTainted(p); // expected-warning {{YES}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ } else {
+ long p = s.b;
+ clang_analyzer_isTainted(p); // expected-warning {{YES}}
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ }
+ fclose(fp);
+ }
+}
+
+void comopund_write2() {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ S s; // s.a is not touched by fread.
+ if (1 == fread(&s.b, sizeof(s.b), 1, fp)) {
+ long p = s.a; // FIXME: This should raise an uninitialized read.
+ clang_analyzer_isTainted(p); // expected-warning {{NO}} FIXME: This should be YES.
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ } else {
+ long p = s.a; // FIXME: This should raise an uninitialized read.
+ clang_analyzer_isTainted(p); // expected-warning {{NO}} FIXME: This should be YES.
+ clang_analyzer_dump(p); // expected-warning {{conj_}}
+ }
+ fclose(fp);
+ }
+}
+
+void var_write() {
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ int a, b; // 'a' is not touched by fread.
+ if (1 == fread(&b, sizeof(b), 1, fp)) {
+ long p = a; // expected-warning{{Assigned value is garbage or undefined}}
+ } else {
+ long p = a; // expected-warning{{Assigned value is garbage or undefined}}
+ }
+ fclose(fp);
+ }
+}
+
+// When reading a lot of data, invalidating all elements is too time-consuming.
+// Instead, the knowledge of the whole array is lost.
+#define MaxInvalidatedElementRegion 64 // See StreamChecker::evalFreadFwrite in StreamChecker.cpp.
+#define PastMaxComplexity MaxInvalidatedElementRegion + 1
+void test_large_read() {
+ int buffer[PastMaxComplexity + 1];
+ buffer[PastMaxComplexity] = 42;
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ if (buffer[PastMaxComplexity] != 42) {
+ clang_analyzer_warnIfReached(); // Unreachable.
+ }
+ if (1 == fread(buffer, sizeof(int), PastMaxComplexity, fp)) {
+ if (buffer[PastMaxComplexity] != 42) {
+ clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
+ }
+ }
+ fclose(fp);
+ }
+}
+
+void test_small_read() {
+ int buffer[10];
+ buffer[5] = 42;
+ if (FILE *fp = fopen("/home/test", "rb+")) {
+ clang_analyzer_dump(buffer[5]); // expected-warning{{42 S32b}}
+ if (1 == fread(buffer, sizeof(int), 5, fp)) {
+ clang_analyzer_dump(buffer[5]); // expected-warning{{42 S32b}}
+ }
+ fclose(fp);
+ }
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for publishing this commit, it's a nice refinement of the modelling capabilities, and mostly LGTM.
In the inline comments I added some very minor remarks and one question about the handling of a fread
that reads into the beginning of an array region.
if (!llvm::isa_and_nonnull<SubRegion>(Buffer)) | ||
return State; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd slightly prefer a dyn_cast_or_null
instead of this isa
check that's eventually followed by a bare cast
.
auto StartIndexVal = UnboxAsInt(StartIndex); | ||
auto CountVal = UnboxAsInt(Count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto StartIndexVal = UnboxAsInt(StartIndex); | |
auto CountVal = UnboxAsInt(Count); | |
std::optional<int64_t> StartIndexVal = UnboxAsInt(StartIndex); | |
std::optional<int64_t> CountVal = UnboxAsInt(Count); |
The explicitly specified type would make this code easier to read (without it, my first guess was that these are int
variables because the lambda is named ...AsInt
).
clang/test/Analysis/fread.cpp
Outdated
long b; | ||
}; | ||
|
||
void comopund_write1() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void comopund_write1() { | |
void compound_write1() { |
Just a typo.
clang/test/Analysis/fread.cpp
Outdated
} | ||
} | ||
|
||
void comopund_write2() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void comopund_write2() { | |
void compound_write2() { |
clang/test/Analysis/fread.cpp
Outdated
#define EOF (-1) | ||
|
||
extern "C" { | ||
typedef __typeof(sizeof(int)) size_t; | ||
typedef struct _FILE FILE; | ||
|
||
FILE *fopen(const char *filename, const char *mode); | ||
int fclose(FILE *stream); | ||
size_t fread(void *buffer, size_t size, size_t count, FILE *stream); | ||
int fgetc(FILE *stream); | ||
void *malloc(size_t size); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test file is written as .cpp
file, but it's testing plain C functionality and it uses fread
, malloc
etc. as if it was plain C code (under C++, <cstdio>
and <cstdlib>
declares these as std::fread
, std::malloc
...).
This doesn't cause any problems right now, but it'd be better to turn this into a .c
file and include one of the "standard" system header emulator files instead of re-declaring these functions at the top of this test file.
if (const auto *BufferFirstElem = | ||
dyn_cast_or_null<ElementRegion>(Call.getArgSVal(0).getAsRegion())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we see an ElementRegion
in the case when we're fread
ing into the beginning of an array variable?
I see that the element region layer should be there if we did pointer arithmetic or if this is a symbolic region converted to a type; but not sure that this covers the "simply read into an array" case as well. Could you add a simple testcase that validates that the individual element invalidation activates in a situation like
int arr[10];
fread(arr, sizeof(int), 5, <FILE pointer>);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case fread
reads to the beginning of a buffer, we won't have an ElementRegion
, thus the heuristic for eagerly binding the invalidated elements won't trigger. This is unfortunate, but you can think of this as we keep the previous behavior.
To circumvent this, I'd need to know the type for for the pointee.
This would imply that I should special-case TypedValueRegion
and SymbolicRegion
.
I'll think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented.
EscapingVals.reserve(*CountVal); | ||
|
||
RegionAndSymbolInvalidationTraits ITraits; | ||
for (auto Idx : llvm::seq(*StartIndexVal, *StartIndexVal + *CountVal)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop does not work if the type of the array is not the same as the "size" parameter passed to fread
.:
int buffer[100];
fread(buffer + 1, 3, 5, file);
In this case 3*5 bytes should be read by fread
into the array. If sizeof(int)==4
4 elements should be invalidated in the buffer starting from index 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a problem.
To properly invalidate element-wise, I must bind the correctly typed conjured symbols so that subsequent lookups in the store would hit the entry.
Consequently, I can't consider the buffer as a char array and invalidate each chars in the range.
I must use ElementRegions
, thus I need properly aligned item-wise iteration and invalidation.
So, I could try to convert the element size and item count into a start byte offset and length in bytes. After this I could check if the start bytes offset would land on an item boundary and if the length would cover a whole element object.
This way we could cover the case like:
int buffer[100]; // assuming 4 byte ints.
fread(buffer + 1, 2, 10, file); // Reads 20 bytes of data, which is dividable by 4, thus we can think of the 'buffer[1:6]' elements as individually invalidated.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented.
const MemRegion *Buffer = BufferFirstElem->getSuperRegion(); | ||
QualType ElemTy = BufferFirstElem->getElementType(); | ||
SVal FirstAccessedItem = BufferFirstElem->getIndex(); | ||
SVal ItemCount = Call.getArgSVal(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is some redundancy in getting the argument values, SizeVal
and NMembVal
are already available and could be used by the function (if it would take NonLoc
).
Fixed most NFC typos and suggestions. |
Checkout the new implementation and the added "weird" fread tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks relatively good, I found only smaller issues.
auto Zero = [&SVB] { | ||
BasicValueFactory &BVF = SVB.getBasicValueFactory(); | ||
return nonloc::ConcreteInt(BVF.getIntValue(0, /*isUnsigned=*/false)); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not better to store the Zero
value in a variable instead of using this lambda?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way it's lazily evaluated. I wanted to avoid sideffects inside BVF and it seemed like a plausible way achieving it.
} | ||
|
||
void test_partial_elements_read(void) { | ||
clang_analyzer_dump(sizeof(int)); // expected-warning {{4 S32b}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe some (buildbot) platforms are different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pinned the target triple in the RUN line to account for this.
clang/test/Analysis/fread.c
Outdated
if (20 == fread(buffer + 1, 3, 20, fp)) { | ||
clang_analyzer_dump(buffer[0]); // expected-warning{{1 S32b}} | ||
clang_analyzer_dump(buffer[20]); // expected-warning{{conj_}} | ||
clang_analyzer_dump(buffer[21]); // expected-warning{{1st function call argument is an uninitialized value}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these index values correct? buffer[15]
should be the last changed item, buffer[16]
should be 3, and buffer[17]
is the uninitialized value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. I made a bug when calling escapeByStartIndexAndCount
. I should have passed NumElementsRead
instead of CountVal
. Thanks for noticing this!
long buffer[10]; | ||
// Cannot reason about index. | ||
size_t res = fread(buffer + startIndex, sizeof(long), 5, fp); | ||
if (5 == res) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to explain why these checks are here in this way. Is it not enough to have only the longest if
branch for all cases if all are the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just upstreaming this hunk. I wasn't here when it was written.
My guess what these could test is the fact that we write to the buffer from a symbolic index, up to 5 long
values.
The engine should deduce from this that at the buffer[startIndex + 5]
we definitely have uninitialized values - However, as it's currently implemented right now, we can't. It is because I only handle the case when the index is concrete.
W.r.t. why we have different branches depending on the return value of read
, my guess would be that if we modeled that it only read 4 elements, then the buffer[startIndex + 4]
and subsequent elements should be uninitialized, etc.
I'll rework the test case to covey this interpretation.
clang/test/Analysis/fread.c
Outdated
if (4 == fread(asChar + 1, 1, 4, fp)) { | ||
void clang_analyzer_printState(void); | ||
clang_analyzer_printState(); | ||
clang_analyzer_dump(buffer[0]); // expected-warning{{3 S32b}} FIXME Reading a 'char' should not result in a 'S32b' value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FIXME is not understandable to me, if this is about the problem seen in the PR remark (problem with store and overlapping read) it would be better to make this comment more accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to highlight that reading a 8 bit wide char
element should never result in a value of 32 bits.
And this is casued by the current Store implementation, as: it does a lookup and it returns whatever is associated with that read start bit offset. Given that the last write the Store saw was the 32 bit wide int 3, it will just return that - no matter if that bitwidth would make sense or not.
I've added the 8 bit assumption to the FIXME. I hope it's good now.
clang/test/Analysis/fread.c
Outdated
} | ||
} | ||
|
||
void compound_write2(void) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me why some of the test functions are called "write" and some "read". It would be better to have all of them read
in the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the author thought that the read()
function call will have a write
effect to the buffer.
TBH I agree with you and I'll rename these.
bool DivisibleAccessSpan = (NumBytesRead % ElemSizeInChars) == 0; | ||
int64_t NumElementsRead = NumBytesRead / ElemSizeInChars; | ||
constexpr int MaxInvalidatedElementsLimit = 64; | ||
if (DivisibleAccessSpan && NumElementsRead <= MaxInvalidatedElementsLimit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to round the NumElementsRead
upwards instead of skipping the escape, in the not divisible case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied a patch to this code such that the last partial element is considered as a full access.
This is indeed better than doing a fallback and invalidating the full buffer.
Also modified the test to demonstrate this, but there is yet another bug in the current Store:
If we read from a byte offset of an element, we still get unknown as a result: int buffer[2] = {2,3}
then reading from ((char*)buffer)[1]
results in unknown. Compiler explorer.
Let's do another round. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks now OK at least to my knowledge.
It should be possible in |
Thanks for the green light!
Yes, that would be so nice. However, it's out of scope for me this time. |
Hi @steakhal , this change seems to have exposed a div/0 error in a very particular corner we came across after integrating this change. Could you try this case to see if can repro? The div/0 is occurring at line 1093:
This is the test source.
The crash ... |
Yea, it seems to crash. I'll fix it once I have some time. Probably early next week if I can't spare some time during the weekend. EDIT: The fix is proposed in #97199. |
We can get zero type size (thus div by zero crash) if the region is for a 'void*' pointer. In this patch, let's just override the void type with a char type to avoid the crash. Fixes #93408 (comment)
…#97199) We can get zero type size (thus div by zero crash) if the region is for a 'void*' pointer. In this patch, let's just override the void type with a char type to avoid the crash. Fixes llvm#93408 (comment)
…#97199) We can get zero type size (thus div by zero crash) if the region is for a 'void*' pointer. In this patch, let's just override the void type with a char type to avoid the crash. Fixes llvm#93408 (comment)
This change enables more accurate modeling of the write effects of `fread`. In particular, instead of invalidating the whole buffer, in a best-effort basis, we would try to invalidate the actually accesses elements of the buffer. This preserves the previous value of the buffer of the unaffected slots. As a result, diagnose more uninitialized buffer uses for example. Currently, this refined invalidation only triggers for `fread` if and only if the `count` parameter and the buffer pointer's index component are concrete or perfectly-constrained symbols. Additionally, if the `fread` would read more than 64 elements, the whole buffer is invalidated as before. This is to have safeguards against performance issues. Refer to the comments of the assertions in the following example to see the changes in the diagnostics: ```c++ void demo() { FILE *fp = fopen("/home/test", "rb+"); if (!fp) return; int buffer[10]; // uninitialized int read_items = fread(buffer+1, sizeof(int), 5, fp); if (5 == read_items) { int v1 = buffer[1]; // Unknown value but not garbage. clang_analyzer_isTainted(v1); // expected-warning {{YES}} <-- Would be "NO" without this patch. clang_analyzer_dump(v1); // expected-warning {{conj_}} <-- Not a "derived" symbol, so it's directly invalidated now. int v0 = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}} <-- Had no report here before. (void)(v1 + v0); } else { // If 'fread' had an error. int v0 = buffer[0]; // expected-warning {{Assigned value is garbage or undefined}} <-- Had no report here before. (void)v0; } fclose(fp); } ``` CPP-3247, CPP-3802 Co-authored by Marco Borgeaud (marco-antognini-sonarsource)
In #93408 69bc159 I refined how invalidation is done for `fread`. It can crash, if the "size" or "count" parameters of "fread" is a perfectly constrained negative value. In such cases, when it will try to allocate a SmallVector with a negative size, which will cause a crash. To mitigate this issue, let's just guard against negative values. CPP-3247
This change enables more accurate modeling of the write effects of
fread
. In particular, instead of invalidating the whole buffer, in a best-effort basis, we would try to invalidate the actually accesses elements of the buffer. This preserves the previous value of the buffer of the unaffected slots. As a result, diagnose more uninitialized buffer uses for example.Currently, this refined invalidation only triggers for
fread
if and only if thecount
parameter and the buffer pointer's index component are concrete or perfectly-constrained symbols.Additionally, if the
fread
would read more than 64 elements, the whole buffer is invalidated as before. This is to have safeguards against performance issues.Refer to the comments of the assertions in the following example to see the changes in the diagnostics:
CPP-3247, CPP-3802
Patch by Marco Borgeaud (marco-antognini-sonarsource)