DOC: 'replace' docstring lacking / too complex #17673

Closed
jorisvandenbossche opened this issue Sep 25, 2017 · 2 comments
@jorisvandenbossche
Member

I think the replace docstring is lacking in many ways (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html):

  • The explanation of to_replace keyword is both way too complex and lacking an explanation of the simple cases:
    • the simplest case of a scalar value is not clearly mentioned (like df.replace(to_replace=0, value=1); it is mentioned in the 'str' explanation, but it is not specific to strings)
    • the simplest dict case of df.replace({to_replace: replacement}) is not mentioned (the dict explanation starts with explanation of nested dicts)
    • I would personally rewrite this whole explanation of this keyword, start with basic cases, and only after that (or in the notes) explain the complex cases.
  • There is a reference to the examples section for "examples of each of those", but there is no examples section. We should add one.
  • In the 'see also' section it references reindex, asfreq and fillna. fillna is fine, but I fail to see the link with the first two. I would rather add a reference to the where method, for replacing values based on a boolean condition (and the 'see also' should not just refer to the other methods, but also include a sentence on why they are related and how they differ)
  • The docstring also uses NDFrame, and this should never be in a public docstring (failing substitution of the docstring in generic)
  • I would personally also write separate docstrings for the series and dataframe case. This will give some duplication, but I think this gives room to simplify the docstring (or certainly for the simpler Series.replace case). (xref Series.replace and DataFrame.replace have same docstring? #13852)
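As a rough illustration of the "simple cases" mentioned in the list above, here is a minimal sketch. The frame `df` and its values are made up for this example; it only assumes the public `replace(to_replace, value)` and `where(cond, other)` signatures.

```python
import pandas as pd

# Hypothetical example data, not taken from the pandas docs.
df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 5, 0]})

# Scalar case: replace every 0 with 1 (works for any scalar, not only strings).
scalar_result = df.replace(to_replace=0, value=1)

# Simplest dict case: {to_replace: replacement}.
dict_result = df.replace({0: 1})

# Replacement based on a boolean condition, the use case `where` covers:
# values are kept where the condition is True and substituted elsewhere.
where_result = df.where(df != 0, 1)

print(scalar_result)
```

For this particular frame all three calls give the same values, which is part of why a 'see also' entry explaining the difference would help.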

See the tutorial docs (https://pandas.pydata.org/pandas-docs/stable/missing_data.html#replacing-generic-values) with some actual examples.

Underlying reason is that this function of course can do way too many things at the same time (or the same things in too many different ways) ... (orthogonal to this, we could maybe also think if certain functionality could be moved into its own function).
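To make "the same things in too many different ways" concrete, the sketch below spells one replacement four ways on a made-up Series. It is only an illustration under those assumptions, not an exhaustive list of the accepted input forms.

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series(["foo", "bar", "baz"])

# Four spellings of "replace 'foo' with 'new'".
by_scalar = s.replace("foo", "new")                # scalar to scalar
by_dict = s.replace({"foo": "new"})                # {to_replace: replacement}
by_list = s.replace(["foo"], ["new"])              # parallel lists
by_regex = s.replace(r"^foo$", "new", regex=True)  # regular expression

print(by_scalar.tolist())
```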

@jreback
Contributor

jreback commented Feb 4, 2018

should re-evaluate in light of #18100

@simonjayhawkins
Member

> should re-evaluate in light of #18100

@jorisvandenbossche The latest docs look like they cover all the documentation issues raised in the OP.

> orthogonal to this, we could maybe also think if certain functionality could be moved into its own function

opened #33302 for that discussion
