ENH: Synchronize pickle with upstream #206

bashtage · 2022-08-18T11:25:34Z

Align APIs
Add tests

Tests added: Please use assert_type() to assert the type of any return value
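
For readers unfamiliar with the pattern the template asks for, here is a minimal sketch of pairing a static `assert_type()` with a runtime check. It is an illustration only: `check` is a simplified, hypothetical stand-in for the helper used in the test suite, and `dump_obj`/`load_obj` are made-up functions built on the standard-library `pickle` module so the sketch runs without pandas installed.

```python
import os
import pickle
import tempfile
from typing import Dict

try:
    from typing import assert_type  # available in Python 3.11+
except ImportError:
    def assert_type(val, typ):  # runtime stand-in: returns the value unchanged
        return val


def check(actual, klass):
    """Hypothetical runtime check: fail unless `actual` is an instance of `klass`."""
    if not isinstance(actual, klass):
        raise RuntimeError(f"Expected {klass!r} but got {type(actual)!r}")
    return actual


def dump_obj(obj: Dict[str, int], path: str) -> None:
    """Made-up writer standing in for a function that returns None."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_obj(path: str) -> Dict[str, int]:
    """Made-up reader standing in for a function that returns a value."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "obj.pkl")
    # assert_type() is verified by the type checker; check() verifies at runtime.
    check(assert_type(dump_obj({"a": 1}, path), None), type(None))
    loaded = check(assert_type(load_obj(path), Dict[str, int]), dict)
```

The static assertion catches a wrong annotation in the stub, while the runtime check catches a stub that disagrees with what the function actually returns.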

bashtage · 2022-08-18T11:25:49Z

Failure is likely a bug in pyright

pandas-stubs/core/frame.pyi

Dr-Irv · 2022-08-18T12:16:13Z

pandas-stubs/io/pickle.pyi

+    ReadPickleBuffer,
+    StorageOptions,
+    WriteBuffer,
+)

 def to_pickle(


Since pandas.io.pickle.to_pickle() is not public, we should delete this here.

still need to handle removal of to_pickle() from this file (and then the associated tests)

Dr-Irv · 2022-08-18T12:17:36Z

tests/test_io.py

+        os.unlink(file.name)
+
+    with tempfile.NamedTemporaryFile(delete=False) as file:
+        check(assert_type(to_pickle(DF, file), None), type(None))


Since to_pickle() is not public, no need to test it.

It is listed in pandas/io/api.py, so I assume this makes it public even if it is not in the docs.

> It is listed in pandas/io/api.py, so I assume this makes it public even if it is not in the docs.

There are a few schools of thought here:

1. We trim down pandas-stubs so that it only type checks what is in the documented public API.

2. If a function or class is "public" in the sense that it does not begin with an underscore but is not documented, we create a stub for it and test it.

3. If a function or class is "public" in the sense that it does not begin with an underscore and is not documented, we are agnostic on creating a stub or testing that stub.

So far, @twoertwein and I have been leaning towards (1). With pandas.io.api.to_pickle(), you are proposing (2).

@twoertwein What are your thoughts?

I think it is more a question of what the "API" is. To me it is something like:

Documented as part of the docs

Imported into an api.py file or the top-level `__init__.py` file.

I think (2) is too broad because there hasn't been enough effort in pandas to prefix private modules, classes, and methods with an underscore (_).

In short, if it seems to be part of an API, then it is reasonable to include it.

A related point is documenting public methods of classes that appear only partially in the docs, something like Klass.method(arg1, arg2). Should all public methods of Klass be documented, or just Klass.__init__ and Klass.method?

My goal is that everything that is meant to be public (which is often unclear) is documented and in pandas-stubs. Personally, I think the best way is to remove any symbol from the stubs that is not meant to be public.

If it seems reasonable that it is meant to be public, it might be a better user experience if we first open an issue at pandas before potentially removing it.

If we remove too much, we get user feedback and can then create an issue at pandas.

I believe typeshed uses # undocumented to indicate which parts of their stubs are technically not documented. If we want to keep more in here, we could follow that approach. Luckily, we have rather good connections to pandas :) so we might just open an issue there :)

There are some grey zones: a private super class but the inherited methods are public in a public child class: I would keep the parent class (in the long-term, I would like if pandas-stubs aligns with pandas), define __all__ but exclude the class from it.

> I think it is more a question of what the "API" is. To me it is something like:
>
> Documented as part of the docs
>
> Imported into an api.py file or the top-level `__init__.py` file.

I think that's a fair definition. Using it, to_pickle() is public, but it appears to be undocumented, so someone should create an issue in the pandas repo to indicate that it should be documented. In that case, we'll get a reaction of either "wait - that shouldn't be public" or "Yes, let's document".

Looking at the source, pandas.to_pickle() is what is really public, so I think you should change the tests to use pandas.to_pickle() rather than pandas.io.api.to_pickle(). The same goes for pandas.read_pickle() versus pandas.io.api.read_pickle().

Dr-Irv · 2022-08-18T12:19:20Z

> Failure is likely a bug in pyright

If you can create a small test case for pyright, and submit it to them, they are usually very fast at doing fixes.

twoertwein · 2022-08-18T16:05:11Z

I don't think this is a bug in pyright.

Typeshed says that readline takes an argument https://github.com/python/typeshed/blob/1e1a5868936145392d421e3e44258d9d0863ce4c/stdlib/tempfile.pyi#L211 but ReadPickleBuffer expects no argument for readline. I think pandas has the same issue.

edit: on the other side, the argument is optional, so maybe it is a bug in pyright.

Dr-Irv · 2022-08-18T16:12:37Z

> I don't think this is a bug in pyright.
>
> Typeshed says that readline takes an argument https://github.com/python/typeshed/blob/1e1a5868936145392d421e3e44258d9d0863ce4c/stdlib/tempfile.pyi#L211 but ReadPickleBuffer expects no argument for readline. I think pandas has the same issue.
>
> edit: on the other side, the argument is optional, so maybe it is a bug in pyright.

I think the arguments have to be consistent, even if the argument is optional. So if we just change readline() in the definition of ReadPickleBuffer to match the arguments in typeshed, then I think we will be fine.

It's not getting caught in pandas testing because the CI for pyright doesn't type check the testing code.

twoertwein · 2022-08-19T00:17:16Z

I opened pandas-dev/pandas#48144 to fix this issue. You can simply replace the return type of readline with bytes
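
To make the discussion concrete, here is a hedged sketch of a structural buffer type whose `readline` takes no argument and returns `bytes`. `ReadPickleBufferSketch` and `load_from` are hypothetical names for illustration; this is not the definition used in pandas or pandas-stubs, which may differ in its base classes and signatures.

```python
import io
import pickle
from typing import Protocol, runtime_checkable


@runtime_checkable
class ReadPickleBufferSketch(Protocol):
    """Hypothetical protocol: the two methods pickle.load() needs."""

    def read(self, n: int = -1) -> bytes: ...

    def readline(self) -> bytes: ...


def load_from(buffer: ReadPickleBufferSketch) -> object:
    # pickle.load() only requires read() and readline() on the file object.
    return pickle.load(buffer)


# io.BytesIO.readline accepts an optional size argument, yet it can still be
# called with no arguments, which is all the protocol asks for.
buf = io.BytesIO(pickle.dumps({"a": 1}))
loaded = load_from(buf)
```

Note that `isinstance()` against a `runtime_checkable` protocol only checks that the methods exist, not their signatures, so signature mismatches like the one discussed here surface only under a static type checker.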

Dr-Irv

We shouldn't test or have a stub for pd.io.api.to_pickle(). I think I brought that up somewhere in this PR.

Dr-Irv · 2022-08-22T16:22:00Z

Also need to resolve conflicts

bashtage · 2022-08-22T16:23:27Z

> We shouldn't test or have a stub for pd.io.api.to_pickle(). I think I brought that up somewhere in this PR.

It has an open issue on pandas. I was leaning towards anything in API being part of the API irrespective of whether it is in the docs. Path forward would be to deprecate from API in pandas, then drop here.

Dr-Irv · 2022-08-22T16:34:30Z

> > We shouldn't test or have a stub for pd.io.api.to_pickle(). I think I brought that up somewhere in this PR.
>
> It has an open issue on pandas. I was leaning towards anything in API being part of the API irrespective of whether it is in the docs. Path forward would be to deprecate from API in pandas, then drop here.

Can you link to that issue here? I know @twoertwein created an issue to ask how we want to handle the API in general (pandas-dev/pandas#48186), but I think creating a specific issue for pandas.io.api.to_pickle() over there would be good as well.

Dr-Irv · 2022-08-22T16:34:39Z

have to resolve conflicts

Dr-Irv · 2022-09-03T14:38:16Z

pandas-stubs/io/pickle.pyi

+    ReadPickleBuffer,
+    StorageOptions,
+    WriteBuffer,
+)

 def to_pickle(


still need to handle removal of to_pickle() from this file (and then the associated tests)

Dr-Irv

Have to remove to_pickle() from io/pickle.pyi

bashtage · 2022-09-05T14:47:17Z

I just noticed that to_pickle is a top-level function (i.e., pd.to_pickle), so I think it needs to be deprecated in pandas before it should be removed here. I think this is ready.

bashtage · 2022-09-05T14:53:41Z

green.

Dr-Irv · 2022-09-05T15:23:07Z

> I just noticed that to_pickle is a top-level function (i.e., pd.to_pickle), so I think it needs to be deprecated in pandas before it should be removed here. I think this is ready.

It may be a top-level function, but it is not documented, so I think that's an error in the implementation.

Can you create an issue in pandas for this?

Dr-Irv · 2022-09-05T15:26:35Z

> > I just noticed that to_pickle is a top-level function (i.e., pd.to_pickle), so I think it needs to be deprecated in pandas before it should be removed here. I think this is ready.
>
> It may be a top-level function, but it is not documented, so I think that's an error in the implementation.
>
> Can you create an issue in pandas for this?

I'll let @twoertwein provide his opinion on this to resolve this. Summary:

pd.to_pickle() is in the API, but not documented.
DataFrame.to_pickle() and Series.to_pickle() are documented and in the API.
In this PR, @bashtage has maintained a stub for pd.to_pickle().
In my opinion, as it is not documented, we shouldn't provide a stub.

What do you think @twoertwein ?

twoertwein · 2022-09-05T15:32:43Z

Creating an issue/PR at pandas is probably best.

In the meantime, I wouldn't mind merging this PR. Can address this small inconsistency in a follow-up.

Dr-Irv

thanks @bashtage

ENH: Synchronize pickle with upstream

2c38ee9

Align APIs Add tests

Dr-Irv requested changes Aug 18, 2022

bashtage added 6 commits August 19, 2022 09:26

Merge remote-tracking branch 'upstream/main' into io-pickle

55e9136

BUG: Correct ReadPickleBuffer

ad181d8

Correct definition Use ensure_clean

MAINT: Clean up to_pickle testing

c94ce6c

Ensure to_pickle is tested since it appears in pandas/io/api

CLN: Import from main namespace

d5d59c7

Import read_pickle from main Fix merge conflicts

BUG: Restore read_pickle

3b9073b

CLN: Remove to_pickle from frame

a0c42c9

Remove extra def Test on series

Dr-Irv reviewed Aug 22, 2022

MAINT: Merge main

09edcbc

bashtage added 2 commits August 30, 2022 09:49

MAINT: Merge main

8c75f7a

Merge remote-tracking branch 'upstream/main' into io-pickle

3166a55

Dr-Irv requested changes Sep 3, 2022

Kevin Sheppard added 3 commits September 4, 2022 01:09

MAINT: Merge master

7a1b4ee

MAINT: Remove extra def

bb42347

MAINT: Merge master

99e4203

Dr-Irv requested changes Sep 5, 2022

MAINT: Merge main

eda7945

bashtage force-pushed the io-pickle branch from d37d55f to eda7945 Compare September 5, 2022 14:45

Dr-Irv approved these changes Sep 5, 2022

Dr-Irv merged commit 2f2289c into pandas-dev:main Sep 5, 2022

twoertwein mentioned this pull request Sep 5, 2022

DOC/DEPR: pandas.to_pickle pandas-dev/pandas#48402

Open

bashtage deleted the io-pickle branch September 15, 2022 18:33
