
DOC: expanding comparison with R section #12472


Closed
wants to merge 4 commits into from


leifwalsh
Contributor

This is the beginning of a quick reference section. It's incomplete;
so far I've only done a rough translation of
http://nbviewer.jupyter.org/urls/gist.githubusercontent.com/TomAugspurger/6e052140eaa5fdb6e8c0/raw/811585624e843f3f80b9b6fe89e18119d7d2d73c/dplyr_pandas.ipynb
into tables. We should try to get some R experts to comment, and it would
be nice to have the pandas versions link to the docs for the functions being
used, but I'm terrible at reStructuredText and gave up for the moment.

@TomAugspurger
Contributor

Thanks for this. There's a slightly updated version here: https://gist.github.com/TomAugspurger/6e052140eaa5fdb6e8c0, but I can't really remember what changed.

reST can be a bit of a pain. For linking to methods, you can use e.g.

:meth:`~pandas.DataFrame.sample`.

TomAugspurger added this to the 0.18.1 milestone Feb 26, 2016
@leifwalsh
Contributor Author

Yeah, I tried :meth: once and didn't like it.

But seriously, I couldn't get it to format well, or deal with the example
arguments. I'll play with it some more eventually.

Cheers,
Leif

``filter(df, col1 == 1, col2 == 1)``        ``df.query('col1 == 1 & col2 == 1')``
``df[df$col1 == 1 & df$col2 == 1,]``        ``df[(df.col1 == 1) & (df.col2 == 1)]``
``select(df, col1, col2)``                  ``df[['col1', 'col2']]``
``select(df, col1:col3)``                   No one-line equivalent, but see [#select_range]_
Member

Isn't this equivalent to df.loc[:, 'col1':'col3'] ?

Contributor Author

@jorisvandenbossche based on my understanding of Python, 'col1':'col3' would have to parse correctly as a range, and I don't think it does. But I'd be happy to be wrong.

Contributor

It does work in this case, I've updated that notebook here. I can never remember the rules on slicing unsorted indexes, so I prefer to be explicit. For the comparison though I think it's fine to use 'col1':'col3'

Member

If the labels are actual column names, this works exactly as expected (it slices from the one label to the other, regardless of the column order). Only when you use labels that are not in the index does the index need to be sorted.
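
A minimal sketch of the slicing behaviour described in this comment. The frame, its column names, and its values are made up for illustration and are not taken from the PR or the notebook:

```python
import pandas as pd

# Hypothetical frame; the columns are deliberately not in sorted order.
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["z", "col1", "b", "col3", "a"])

# Both labels exist, so the slice runs from 'col1' to 'col3' by position,
# inclusive on both ends, regardless of how the columns are ordered.
between = list(df.loc[:, "col1":"col3"].columns)

# A label that is not a column cannot be located in an unsorted index.
try:
    df.loc[:, "col1":"c"]
    missing_label_error = None
except KeyError as exc:
    missing_label_error = exc

# With sorted columns, a missing label is fine: the bounds are found by ordering.
sorted_df = pd.DataFrame([[1, 2, 3]], columns=["a", "c", "e"])
within = list(sorted_df.loc[:, "b":"d"].columns)
```

Here `between` holds the three columns from `col1` through `col3`, and `within` holds only `c`. Note that, unlike positional slices, label slices include the end label.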

Member

Can you update this as well?

Contributor Author

there you go
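
For the two row-filtering rows in the hunk under review (`filter(df, ...)` and `df[df$col1 == 1 & ...,]`), the following sketch checks that the two pandas spellings select the same rows. The data is invented for the example:

```python
import pandas as pd

# Hypothetical data, only for demonstrating the two spellings.
df = pd.DataFrame({"col1": [1, 1, 2, 1], "col2": [1, 2, 1, 1]})

# Expression string evaluated against the frame's columns.
via_query = df.query("col1 == 1 & col2 == 1")

# Explicit boolean mask; the parentheses are required here because
# Python's `&` binds more tightly than `==`.
via_mask = df[(df.col1 == 1) & (df.col2 == 1)]
```

Inside `query`, `&` is given the precedence of `and`, which is why the string form needs no parentheses.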

=========================================== ===========================================
R                                           pandas
=========================================== ===========================================
``arrange(df, col1, col2)``                 ``df.sort(['col1', 'col2'])``
Member

`sort` is deprecated; please change it to `sort_values`.
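
A short sketch of the suggested replacement, on a made-up frame. The second call, with mixed sort directions, is an extra illustration that the table row does not cover; it roughly corresponds to using `desc()` inside dplyr's `arrange`:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": [1, 2, 0, 1]})

# Equivalent of arrange(df, col1, col2): sort by col1, then col2.
ordered = df.sort_values(["col1", "col2"])

# Per-column direction: col1 ascending, col2 descending.
mixed = df.sort_values(["col1", "col2"], ascending=[True, False])
```

`sort_values` returns a new frame and keeps the original index labels, so a `reset_index(drop=True)` may be wanted afterwards.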

Contributor Author

done

@jreback
Contributor

jreback commented Apr 17, 2016

can you update

 - sort -> sort_values
 - unique -> drop_duplicates
@leifwalsh
Contributor Author

@jreback thanks for the poke

@@ -55,7 +55,7 @@ R pandas
 ``select(df, col1, col2)``                  ``df[['col1', 'col2']]``
 ``select(df, col1:col3)``                   No one-line equivalent, but see [#select_range]_
 ``select(df, -(col1:col3))``                ``df.drop(cols_to_drop, axis=1)`` but see [#select_range]_
-``distinct(select(df, col1))``              ``df.col1.unique()``
+``distinct(select(df, col1))``              ``df[['col1']].drop_duplicates()``
Contributor

In R, does this return a different shape (e.g. the Series/DataFrame distinction) if you provide 1 vs. multiple columns?

Contributor Author

I don't know, let me see if I can reproduce.

Contributor Author

> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
> distinct(select(mtcars, gear))
  gear
1    4
2    3
3    5
> distinct(select(mtcars, gear, carb))
   gear carb
1     4    4
2     4    1
3     3    1
4     3    2
5     3    4
6     4    2
7     3    3
8     5    2
9     5    4
10    5    6
11    5    8

Contributor Author

@jreback I think it's the same type either way

Contributor

Hmm, that's interesting. OK, then it's best to show the frame result (which I think is what you did), even for 1 column.

Contributor Author

This is the simplest way I could find to select a single series in R:

> distinct(select(mtcars, gear))$gear
[1] 4 3 5
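
On the pandas side, the shape question from this thread can be sketched as follows. The frame is a small made-up stand-in, not the real `mtcars` data, and the variable names are only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a frame with 'gear' and 'carb' columns.
df = pd.DataFrame({"gear": [4, 4, 4, 3, 3, 5], "carb": [4, 4, 1, 1, 2, 2]})

# Series.unique() returns a NumPy array, not a pandas object.
as_array = df["gear"].unique()

# Selecting with a list keeps a DataFrame, for one column or for several,
# which matches distinct(select(...)) always returning a data frame in R.
one_col = df[["gear"]].drop_duplicates()
two_cols = df[["gear", "carb"]].drop_duplicates()

# Closest analogue of distinct(select(mtcars, gear))$gear: a Series.
as_series = df["gear"].drop_duplicates()
```

All three `drop_duplicates` results keep the index labels of the first occurrence of each distinct row.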

jreback closed this in 1e0b228 Apr 27, 2016
@jreback
Contributor

jreback commented Apr 27, 2016

@leifwalsh thanks.

I merged this in. Pls have a look at the built docs (prob take a few hours). http://pandas-docs.github.io/pandas-docs-travis/comparison_with_r.html

not really sure if there is a way to make the 3 tables (sorting, transforming, grouping) be the same width .....
