-
-
Notifications
You must be signed in to change notification settings - Fork 142
ENH: Synchronize pickle with upstream #206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Align APIs Add tests
Failure is likely a bug in pyright |
ReadPickleBuffer, | ||
StorageOptions, | ||
WriteBuffer, | ||
) | ||
|
||
def to_pickle( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since pandas.io.pickle.to_pickle()
is not public, we should delete this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still need to handle removal of to_pickle()
from this file (and then the associated tests)
tests/test_io.py
Outdated
os.unlink(file.name) | ||
|
||
with tempfile.NamedTemporaryFile(delete=False) as file: | ||
check(assert_type(to_pickle(DF, file), None), type(None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since to_pickle()
is not public, no need to test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is listed in pandas/io/api.py so I assume this makes it public even it not on the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is listed in pandas/io/api.py so I assume this makes it public even it not on the docs.
There's a few schools of thought here:
- We trim down
pandas-stubs
so that it only type checks what is in the documented public API - If a function or class is "public" in the sense that it does not begin with an underscore but is not documented, we create a stub for it and test it.
- If a function or class is "public" in the sense that it does not begin with an underscore and is not documented, we are agnostic on creating a stub or testing that stub.
So far, @twoertwein and I have been leaning towards (1). With pandas.io.api.to_pickle()
, you are proposing (2).
@twoertwein What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more of what is the "API". To me it is something like
- Documented as part of the docs
- Imported into an
api.py
file or the top level `init.py file.
I think 2 is too broad because there hasn't been enough effort in pandas to _
modules, classes and methods.
In short, if it seems to be part of an API, then it is reasonable to include it.
A related point is documenting public methods of classes that appear parrtially in the docs, something like Klass.method(arg1, arg2)
. Should public of Klass
be documented, or just Klass.__init__
and Klass.method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal is that everything that is meant to be public (which is often unclear) is documented and in pandas-stubs. Personally, I think the best way is to remove any symbol from the stubs that is not meant to be public.
- If it seems reasonable that it is meant to be public, it might be a better user experience, if we first open an issue at pandas before potentially removing it.
- If we remove too much, we get user feedback and can then create an issue at pandas.
I believe typeshed uses # undocumented
to indicate which parts of their stubs are technically not documented. if we want to keep more in here, we could follow that approach. Luckily we have rather good connections to pandas :) so we might just open an issue there :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some grey zones: a private super class but the inherited methods are public in a public child class: I would keep the parent class (in the long-term, I would like if pandas-stubs aligns with pandas), define __all__
but exclude the class from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more of what is the "API". To me it is something like
- Documented as part of the docs
- Imported into an
api.py
file or the top level `init.py file.
I think that's a fair definition. So using that, then to_pickle()
is public, but it appears to be undocumented, so someone should create an issue over in pandas repo to indicate that it should be documented. In that case, we'll get a reaction of "wait - that shouldn't be public", or "Yes, let's document".
Looking at the source pandas.to_pickle()
is what is really public, so I think you should change the tests to use pandas.to_pickle()
rather than pandas.io.api.to_pickle()
. Same goes for pandas.read_pickle()
versus pandas.io.api.read_pickle()
If you can create a small test case for |
I don't think this is a bug in pyright. Typeshed says that edit: on the other side, the argument is optional, so maybe it is a bug in pyright. |
I think the arguments have to be consistent, even if it is optional. So if we just change It's not getting caught in pandas testing because the CI for pyright doesn't type check the testing code. |
I opened pandas-dev/pandas#48144 to fix this issue. You can simply replace the return type of |
Correct definition Use ensure_clean
Ensure to_pickle is tested since it appears in pandas/io/api
Import read_pickle from main Fix merge conflicts
Remove extra def Test on series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't test or have a stub for pd.io.api.to_pickle()
I think I brought that up somewhere in this PR
Also need to resolve conflicts |
It has an open issue on pandas. I was leaning towards anything in API being part of the API irrespective of whether it is in the docs. Path forward would be to deprecate from API in pandas, then drop here. |
can you link to that issue here? I know @twoertwein created an issue to ask how we want to handle the API in general pandas-dev/pandas#48186 , but I think creating a specific issue for |
have to resolve conflicts |
ReadPickleBuffer, | ||
StorageOptions, | ||
WriteBuffer, | ||
) | ||
|
||
def to_pickle( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still need to handle removal of to_pickle()
from this file (and then the associated tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have to remove to_pickle()
from io/pickle.pyi
I just noticed that |
green. |
It may be a top-level function, but it is not documented, so I think that's an error in the implementation. Can you create an issue in pandas for this? |
I'll let @twoertwein provide his opinion on this to resolve this. Summary:
What do you think @twoertwein ? |
Creating an issue/PR at pandas is probably best. In the meantime, I wouldn't mind merging this PR. Can address this small inconsistency in a follow-up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @bashtage
Align APIs
Add tests
assert_type()
to assert the type of any return value