Skip to content

Tracking PR: intermediate state of completed str/bytes cleaning up PRs #2348

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

YannickJadoul
Copy link
Collaborator

@YannickJadoul YannickJadoul commented Jul 31, 2020

Merged

Pending

Merged separately to master

Discarded

TODO

  • Create PR to give py::bytes a converting constructor (PYBIND11_OBJECT_CVT instead of PYBIND11_OBJECT) (@YannickJadoul)
  • Create PR to check if PYBIND11_OBJECT could call check_ (@YannickJadoul)
  • Change test_constructors() to check the changes in constructors (either converting or throwing an error on a non-bytes object)
  • Discuss PYBIND11_DISABLE_IMPLICIT_STR_FROM_BYTES (Changing pybind11::str to only hold PyUnicodeObject (NOT also bytes). #2380): do we a) discard it (keeping current behaviour), b) keep it (undocumented), c) document it (as opt-in), d) deprecate not using it?

@YannickJadoul
Copy link
Collaborator Author

(Rebased onto master, so watch out when using this branch locally, without updating.)

@rwgk rwgk force-pushed the str-bytes-cleanup branch 2 times, most recently from 65130b7 to dcc41ef Compare August 7, 2020 01:45
@rwgk rwgk force-pushed the str-bytes-cleanup branch 4 times, most recently from 1d553e4 to 0b49467 Compare August 16, 2020 17:24
rwgk and others added 7 commits August 17, 2020 10:55
* Rolling back PR #2340 change to tests/test_pytypes.py (only this one file).

The two other files changed with PR #2340 are not affected by this partial rollback.

This partial rollback enables cherry-picking a commit from PR #2380.

* test_constructors() fix for Python 2.

Preparation for changing `pybind11::str` to only hold `PyUnicodeObject` (NOT also `bytes`).

Currently test_constructors passes with Python 2 only because `pybind11::str` can also hold a Python 2 `PyStringObject` (or the equivalent `PyBytesObject` in Python 3). Changing the test to exercise conversions for `PyUnicodeObject` makes it consistent between Python 2 and 3, and removes this small obstacle to the planned `pybind11::str` change.

Tests for `bytes` conversions will be added separately.

* Adding test_constructors test for bytes, on top of cherry-picked commit from PR #2380.
…old `PyUnicodeObject` (NOT also `bytes`).

NO change in behavior. These additional tests are designed to clearly bring out behavior changes related to planned `pybind11::str` changes.
…tes`).

The corresponding behavior changes are captured by changes in the tests. A significant effort was made to keep the test diffs minimal but also comprehensive and easy to read.

Note: Unlike PR #2256 (dropped), this PR only changes exactly one behavior. The two other behavior changes discussed under PR #2256 are avoided here (1. disabling implicit decoding from bytes to unicode; 2. list_caster behavior change). Based on this PR, those can be easily implemented if and when desired.
@rwgk rwgk force-pushed the str-bytes-cleanup branch from 7ae6bb2 to 4fc3b1b Compare August 17, 2020 17:55
@rwgk
Copy link
Collaborator

rwgk commented Aug 18, 2020

I'm really sorry, but I have to abandon this branch due to severe lack of progress.

For context, I added my first comment on #2198 on Jun 13.

Following @YannickJadoul guidance, I spent an inordinate amount of time shifting the same small code fragments between branches and PRs.

More than two months after I first picked up #2198, and >2 weeks of net effort, @YannickJadoul doesn't even support merging the small test additions into master, to simply add missing coverage for existing behavior, that is meant to serve as a baseline for behavior changes. — I really have to pull the emergency brake now. The full conversation leading me to this decision is pasted below, with Yannick's permission.

Obviously, I need this work, or at least part of it, on stable, so I can actually use it in a production environment and other OSS projects. I will now move this work to a non-shared draft PR that I can locally merge on top of stable in the future. I will continue to offer my active support cherry-picking into master, to hopefully make the draft PR obsolete eventually.


Full conversation pasted here, as first suggested by me, then after some more back-and-forth requested by Yannick:

Full conversation

Ralf Grosse-Kunstleve 7:34 AM August 17, 2020
I went down a rabbit hole thinking the one and only isinstance in the Google codebase is the issue ... but no (edited)

Yannick Jadoul 7:35 AM
Huh, dang

Ralf Grosse-Kunstleve 9:09 AM
Here is the fix (code is OSS: pybind11_protobuf)

-        message->ParseFromString(str(d["serialized"]));
+        message->ParseFromString(d["serialized"].cast<bytes>());

9:10
The author accidentally put str, and it worked only because of the chimera nature.
9:10
Interesting second wrinkle: I cannot use bytes(...) because of the missing overload in stable.
9:11
Summary: fixing pybind11::str exposed a hidden bug.

Ralf Grosse-Kunstleve 10:55 AM
heads-up: I'll rebase str-bytes-cleanup on master now.

Ralf Grosse-Kunstleve 12:54 PM
uh oh ... unfortunately it looks like that error was only the tip of the tip of the iceberg 😞
12:55
I tried to submit the str-bytes-cleanup branch for global testing (millions of tests) and got kicked out in the triage step.

Yannick Jadoul 2:03 PM
Ah, amazing to hear that it finds bugs. But hmmmm, quite horrible to hear the follow up on that
2:04
So holding off on merging the next PR into str-bytes-cleanup?
2:05

Interesting second wrinkle: I cannot use bytes(...) because of the missing overload in stable.
That shouldn't have any influence on str-bytes-cleanup, though?

Ralf Grosse-Kunstleve 2:07 PM
the bytes constructor change is also in the branch

Ralf Grosse-Kunstleve 2:15 PM
I think we can go ahead merging more into the branch, but we also need to think about cherry-picking into master, especially the tests of current behavior. Ideally for any feature change or bug fix, the tests go into master first for existing behavior, followed by a second commit changing the behavior & adjusting tests. That way we can easily reason & reproduce in the future what exactly changed when.
Only visible to you

Slackbot 2:15 PM
Yannick Jadoul has paused their notifications for a bit, but they'll see your message when they're back.
If you need to get their attention right away, you can choose to notify them anyway.

Yannick Jadoul 2:17 PM
The tests for current behavior were merged into master, no? At least that one PR
2:18
But yes, would be good to finish up some stuff, take a step back and plan the rest :)

Ralf Grosse-Kunstleve 2:19 PM
Only one test (specific to raw_str).

Yannick Jadoul 2:20 PM
Ah, ok. I thought the rest was related to the changes and needed some of the changes
2:20
Like the bytes ctr
2:21
Do you still have any open PRs?

Ralf Grosse-Kunstleve 2:22 PM
About tests not on master: the change to test_constructors , to not rely on the chimera nature. And the new test_isinstance_string_types and test_pass_bytes_or_unicode_to_string_types.
2:22
No open PRs.
2:24
It would be really helpful to merge those tests (strictly against current behavior) into master, because then it's easier point to them, for current behavior. And it clearly separates them from the fixes/changes. (edited)

Yannick Jadoul 2:27 PM
I don't fully see how that would help, though? It's not really harder to point to another branch than master, I'd think?
2:27
And it's not the tests that are breaking stuff

Ralf Grosse-Kunstleve 2:29 PM
you have to point people to individual commits on the branch to see the before after; if on master you see "current" very easily
2:29
I think that's a value by itself

Yannick Jadoul 3:19 PM
Huh, I'm not getting that? How is that different from master? On master you'd also need to point to a specific commit for before/after, no?
3:20
To git, the branch name should make no difference
3:20
But maybe we should be more careful with the rebasing
3:21
(We could just probably just merge master into str-bytes-cleanup and only rebase at the very end. I can try out some things tomorrow.)

Ralf Grosse-Kunstleve 3:23 PM
picking out one easy thing: there is currently no test coverage for isinstance and isinstance. why is it not good to add test coverage to master?

Yannick Jadoul 3:29 PM
I'm not saying it couldn't work, but I don't currently understand why that's so urgent
3:29
Everything's so clean in its own branch, for now

Ralf Grosse-Kunstleve 3:32 PM
I've seen people who didn't want to write tests, but I cannot remember that anyone pushed back adding missing test coverage even when it was written already.

Yannick Jadoul 3:35 PM
Not pushing back on that; to me they're just part of the larger effort of cleaning up the str-related codebase, which I thought we were doing
3:36
And they're just as public/available as any commit on master, so as long as there's no commit being prepared, what's the point of merging it needlessly
3:37
It's in the main repo, it can be checked out, and CI runs on that branch. So what are we lacking?

Ralf Grosse-Kunstleve 11:40 AM August 18, 2020
Is it OK with you if I copy the conversation as a comment to #2348, starting from my "rabbit hole" comment? That will make it easiest for me to explain why the branch was a failed experiment.

Yannick Jadoul 12:13 PM
Euhm, sure. I'm reasonably baffled on why you're suddenly calling it a "failed experiment", but I don't think I've wrote anything I don't stand by, so go ahead.

Ralf Grosse-Kunstleve 12:14 PM
Thanks

Ralf Grosse-Kunstleve 12:46 PM
Pasting my draft comment (I decided I don't nee the whole conversation):
--------
I'm really sorry, but I have to abandon this branch due to severe lack of progress.
For context, I added my first comment on #2198 on Jun 13.
Following @YannickJadoul guidance, I spent an inordinate amount of time shifting the same small code fragments between branches and PRs.
More than two months after I first picked up #2198, and >2 weeks of net effort, @YannickJadoul doesn't even support merging the small test additions into master, to simply add missing coverage for existing behavior, that is meant to serve as a baseline for behavior changes (Yannick (with his permission): "I'm not saying it couldn't work, but I don't currently understand why that's so urgent"). I really have to pull the emergency brake now.
Obviously, I need this work, or at least part of it, on stable, so I can actually use it in a production environment and other OSS projects. I will now move this work to a non-shared draft PR that I can locally merge on top of stable in the future. I will continue to offer my active support cherry-picking into master, to hopefully make the draft PR obsolete eventually.
12:47
Please let me know if you feel I'm mis-representing the situation. I haven't posted it yet.
New

Yannick Jadoul 1:42 PM
Well, four things to that:
Yesterday, you were just talking about cherry-picking the tests. So there's no functionality in there for you to "actually use it in a production environment and other OSS projects", in our discussion of yesterday?
Second of all, I said a lot more than that. I kept on asking what part makes master so different from str-bytes-cleanup (there's https://github.com/pybind/pybind11/tree/str-bytes-cleanup and git clone [email protected]:pybind/pybind11.git -b str-bytes-cleanup as oneliners to work with that branch, rather than master) and what you couldn't achieve, and never got an answer to that. So yes, I do mind being quoted on that one line out of context; that's not with my permission. If you really want to use that quote and reasoning, I think you need to show the fuller discussion.
During those +2 months, there have been lots of improvements to the original proposal. You probably created twice as many PRs than actually got merged, and you and I still found better ways of fixing it, after we found out what actually went wrong (I found the check_ method, rather than fixing py::isinstance and creating another discrepancy; you found the cleaner #2390 after #2340 was already merged). So you can't claim a) all that time was wasted, and b) that the whole problem is so simple.
This won't be on stable until a release is made, anyway. Not up to me, but I think there's still quite a few features others want to get in to a new release? And given the fact that you are finding all these bugs in the Google code base and that this is breaking some existing code, this'll be part of a minor release bump.
1:43
That being said, I'm not stopping you from posting whatever you want. I just think you're making stuff a lot more complex than it now is (edited)
1:43
Let's not create 10 more PRs; that's where all the confusion stems from anyway
1:45
If we call this part finished, we could just draw a line under the current state, and get that in
1:45
My main concern is looking back in a year and trying to find "that one PR where xyz happened"
1:45
Which is why I tried to create a main hub

Ralf Grosse-Kunstleve 1:47 PM
Thanks Yannick, since you request it, I will add the full conversation to the post.

Yannick Jadoul 1:51 PM
That's potentially the least interesting part of everything I said, but fine, as you wish

EDIT (@YannickJadoul): put in <details> tag to take up less space.

@rwgk
Copy link
Collaborator

rwgk commented Aug 26, 2021

Cleanup in passing: This PR is completely stale, and I think all work was actually completed a long time ago.

@rwgk rwgk closed this Aug 26, 2021
@rwgk rwgk deleted the str-bytes-cleanup branch August 26, 2021 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants