BLD: use unsigned instead of signed for lengths, avoid build warnings#26759
BLD: use unsigned instead of signed for lengths, avoid build warnings#26759WillAyd merged 7 commits intopandas-dev:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #26759 +/- ##
==========================================
- Coverage 91.71% 91.7% -0.01%
==========================================
Files 178 178
Lines 50740 50740
==========================================
- Hits 46538 46533 -5
- Misses 4202 4207 +5
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26759 +/- ##
==========================================
- Coverage 91.97% 91.97% -0.01%
==========================================
Files 180 180
Lines 50756 50756
==========================================
- Hits 46685 46682 -3
- Misses 4071 4074 +3
Continue to review full report at Codecov.
|
WillAyd
left a comment
There was a problem hiding this comment.
On board with the concept just question around casts (applicable in a few places)
pandas/_libs/src/parser/tokenizer.c
Outdated
| } | ||
|
|
||
| if (!(self->lines <= (int64_t) self->header_end + 1) && | ||
| if (!(self->lines <= (uint64_t) self->header_end + 1) && |
There was a problem hiding this comment.
Good catch, I think it isn't.
pandas/_libs/src/parser/tokenizer.c
Outdated
| static int make_stream_space(parser_t *self, size_t nbytes) { | ||
| int64_t i, cap, length; | ||
| uint64_t i, cap; | ||
| int64_t length; |
There was a problem hiding this comment.
Does leaving this as is not introduce any new build warnings? Looks like assignments and comparisons are done with unsigned types within this function
pandas/_libs/src/parser/tokenizer.c
Outdated
| slen = self->stream_len; \ | ||
| self->state = STATE; \ | ||
| if (line_limit > 0 && self->lines == start_lines + (int64_t)line_limit) { \ | ||
| if (line_limit > 0 && self->lines == start_lines + (uint64_t)line_limit) { \ |
There was a problem hiding this comment.
Is there a reason to even have these casts? Fraught with peril to begin with using macros I think this just adds nuances / developer burden
There was a problem hiding this comment.
Updated to remove these casts.
Should we consider getting rid of the macros? That's outside my region of competence.
There was a problem hiding this comment.
Yea might be worth doing so in a separate PR. Here's the cpp take on macros:
https://isocpp.org/wiki/faq/inline-functions#inline-vs-macros
I could see this one being particular troublesome with our codebase:
https://isocpp.org/wiki/faq/misc-technical-issues#macros-with-if
I think the main downside to using inline vs macros would be support for pre-C99 compilers, but I don't think that's a problem given age and the fact that we have inline functions elsewhere in the code base
|
OK by me. Does this offer any performance improvements? Wondering if Cython optimizes for not doing a wraparound check like in this SO question: |
pandas/_libs/src/parser/tokenizer.c
Outdated
| static int make_stream_space(parser_t *self, size_t nbytes) { | ||
| int64_t i, cap, length; | ||
| uint64_t i, cap; | ||
| int64_t length; |
There was a problem hiding this comment.
Because there weren't any warnings telling us not to. Probably makes more sense unsigned liked the others.
There was a problem hiding this comment.
can you change this, otherwise lgtm.
There was a problem hiding this comment.
Changing this requires changing a few other places to match. I think its benign, but not obvious. Go for it?
|
small comment, merge master and ping on green. |
|
Ping |
|
Thanks @jbrockmendel ! |
I'm much more confident about the other two BLD PRs (#26757, #26758) than this, largely because: