Commit 76d2836

Merge remote-tracking branch 'upstream/master' into datetime-mergeasof-tolerance
2 parents b07be3c + 4fb853f commit 76d2836

307 files changed: +6997 -12106 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ dist
 # wheel files
 *.whl
 **/wheelhouse/*
+pip-wheel-metadata
 # coverage
 .coverage
 coverage.xml

.pre-commit-config.yaml

Lines changed: 1 addition & 4 deletions
@@ -15,7 +15,4 @@ repos:
   hooks:
   - id: isort
     language: python_venv
-- repo: https://github.com/asottile/seed-isort-config
-  rev: v1.9.2
-  hooks:
-  - id: seed-isort-config
+    exclude: ^pandas/__init__\.py$|^pandas/core/api\.py$
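
The `exclude` value added to the isort hook is a Python regular expression. A minimal sketch of which paths it would skip, assuming (as the pre-commit documentation describes) that the pattern is applied with `re.search` to repository-relative paths. The helper name `is_excluded` and the candidate paths are hypothetical, chosen only to exercise the pattern:

```python
import re

# Pattern copied verbatim from the diff above.
EXCLUDE = re.compile(r"^pandas/__init__\.py$|^pandas/core/api\.py$")


def is_excluded(path: str) -> bool:
    """Return True if the path matches the exclude pattern (re.search semantics assumed)."""
    return EXCLUDE.search(path) is not None


# Hypothetical paths, not taken from the commit.
candidates = [
    "pandas/__init__.py",
    "pandas/core/api.py",
    "pandas/core/__init__.py",
    "pandas/core/apixpy",  # the escaped dot must match a literal "."
]
skipped = [p for p in candidates if is_excluded(p)]
print(skipped)
```

Because both alternatives are anchored with `^` and `$`, only those two files are skipped; other `__init__.py` files are still checked.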

MANIFEST.in

Lines changed: 5 additions & 0 deletions
@@ -3,6 +3,7 @@ include LICENSE
 include RELEASE.md
 include README.md
 include setup.py
+include pyproject.toml
 
 graft doc
 prune doc/build
@@ -14,6 +15,7 @@ graft pandas
 global-exclude *.bz2
 global-exclude *.csv
 global-exclude *.dta
+global-exclude *.feather
 global-exclude *.gz
 global-exclude *.h5
 global-exclude *.html
@@ -23,7 +25,10 @@ global-exclude *.pickle
 global-exclude *.png
 global-exclude *.pyc
 global-exclude *.pyd
+global-exclude *.ods
+global-exclude *.odt
 global-exclude *.sas7bdat
+global-exclude *.sav
 global-exclude *.so
 global-exclude *.xls
 global-exclude *.xlsm

Makefile

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ black:
 	black . --exclude '(asv_bench/env|\.egg|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|_build|buck-out|build|dist|setup.py)'
 
 develop: build
-	python setup.py develop
+	python -m pip install --no-build-isolation -e .
 
 doc:
 	-rm -rf doc/build doc/source/generated

README.md

Lines changed: 7 additions & 6 deletions
@@ -188,16 +188,17 @@ python setup.py install
 
 or for installing in [development mode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs):
 
+
 ```sh
-python setup.py develop
+python -m pip install --no-build-isolation -e .
 ```
 
-Alternatively, you can use `pip` if you want all the dependencies pulled
-in automatically (the `-e` option is for installing it in [development
-mode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs)):
+If you have `make`, you can also use `make develop` to run the same command.
+
+or alternatively
 
 ```sh
-pip install -e .
+python setup.py develop
 ```
 
 See the full instructions for [installing from source](https://pandas.pydata.org/pandas-docs/stable/install.html#installing-from-source).
@@ -224,7 +225,7 @@ Most development discussion is taking place on github in this repo. Further, the
 
 All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.
 
-A detailed overview on how to contribute can be found in the **[contributing guide](https://dev.pandas.io/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub.
+A detailed overview on how to contribute can be found in the **[contributing guide](https://dev.pandas.io/docs/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub.
 
 If you are simply looking to start working with the pandas codebase, navigate to the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [good first issue](https://github.com/pandas-dev/pandas/issues?labels=good+first+issue&sort=updated&state=open) where you could start out.
 

asv_bench/asv.conf.json

Lines changed: 2 additions & 1 deletion
@@ -50,12 +50,13 @@
         "xlsxwriter": [],
         "xlrd": [],
         "xlwt": [],
+        "odfpy": [],
         "pytest": [],
         // If using Windows with python 2.7 and want to build using the
         // mingw toolchain (rather than MSVC), uncomment the following line.
         // "libpython": [],
     },
-
+    "conda_channels": ["defaults", "conda-forge"],
     // Combinations of libraries/python versions can be excluded/included
     // from the set to test. Each entry is a dictionary containing additional
     // key-value pairs to include/exclude.

asv_bench/benchmarks/frame_methods.py

Lines changed: 11 additions & 0 deletions
@@ -609,4 +609,15 @@ def time_dataframe_describe(self):
         self.df.describe()
 
 
+class SelectDtypes:
+    params = [100, 1000]
+    param_names = ["n"]
+
+    def setup(self, n):
+        self.df = DataFrame(np.random.randn(10, n))
+
+    def time_select_dtypes(self, n):
+        self.df.select_dtypes(include="int")
+
+
 from .pandas_vb_common import setup  # noqa: F401 isort:skip

asv_bench/benchmarks/io/excel.py

Lines changed: 54 additions & 22 deletions
@@ -1,40 +1,72 @@
 from io import BytesIO
 
 import numpy as np
+from odf.opendocument import OpenDocumentSpreadsheet
+from odf.table import Table, TableCell, TableRow
+from odf.text import P
 
 from pandas import DataFrame, ExcelWriter, date_range, read_excel
 import pandas.util.testing as tm
 
 
-class Excel:
+def _generate_dataframe():
+    N = 2000
+    C = 5
+    df = DataFrame(
+        np.random.randn(N, C),
+        columns=["float{}".format(i) for i in range(C)],
+        index=date_range("20000101", periods=N, freq="H"),
+    )
+    df["object"] = tm.makeStringIndex(N)
+    return df
+
+
+class WriteExcel:
 
     params = ["openpyxl", "xlsxwriter", "xlwt"]
     param_names = ["engine"]
 
     def setup(self, engine):
-        N = 2000
-        C = 5
-        self.df = DataFrame(
-            np.random.randn(N, C),
-            columns=["float{}".format(i) for i in range(C)],
-            index=date_range("20000101", periods=N, freq="H"),
-        )
-        self.df["object"] = tm.makeStringIndex(N)
-        self.bio_read = BytesIO()
-        self.writer_read = ExcelWriter(self.bio_read, engine=engine)
-        self.df.to_excel(self.writer_read, sheet_name="Sheet1")
-        self.writer_read.save()
-        self.bio_read.seek(0)
-
-    def time_read_excel(self, engine):
-        read_excel(self.bio_read)
+        self.df = _generate_dataframe()
 
     def time_write_excel(self, engine):
-        bio_write = BytesIO()
-        bio_write.seek(0)
-        writer_write = ExcelWriter(bio_write, engine=engine)
-        self.df.to_excel(writer_write, sheet_name="Sheet1")
-        writer_write.save()
+        bio = BytesIO()
+        bio.seek(0)
+        writer = ExcelWriter(bio, engine=engine)
+        self.df.to_excel(writer, sheet_name="Sheet1")
+        writer.save()
+
+
+class ReadExcel:
+
+    params = ["xlrd", "openpyxl", "odf"]
+    param_names = ["engine"]
+    fname_excel = "spreadsheet.xlsx"
+    fname_odf = "spreadsheet.ods"
+
+    def _create_odf(self):
+        doc = OpenDocumentSpreadsheet()
+        table = Table(name="Table1")
+        for row in self.df.values:
+            tr = TableRow()
+            for val in row:
+                tc = TableCell(valuetype="string")
+                tc.addElement(P(text=val))
+                tr.addElement(tc)
+            table.addElement(tr)
+
+        doc.spreadsheet.addElement(table)
+        doc.save(self.fname_odf)
+
+    def setup_cache(self):
+        self.df = _generate_dataframe()
+
+        self.df.to_excel(self.fname_excel, sheet_name="Sheet1")
+        self._create_odf()
+
+    def time_read_excel(self, engine):
+        fname = self.fname_odf if engine == "odf" else self.fname_excel
+        read_excel(fname, engine=engine)
 
 
 from ..pandas_vb_common import setup  # noqa: F401 isort:skip

asv_bench/benchmarks/io/json.py

Lines changed: 2 additions & 2 deletions
@@ -118,15 +118,15 @@ def setup(self, orient, frame):
     def time_to_json(self, orient, frame):
         getattr(self, frame).to_json(self.fname, orient=orient)
 
-    def mem_to_json(self, orient, frame):
+    def peakmem_to_json(self, orient, frame):
         getattr(self, frame).to_json(self.fname, orient=orient)
 
     def time_to_json_wide(self, orient, frame):
         base_df = getattr(self, frame).copy()
         df = concat([base_df.iloc[:100]] * 1000, ignore_index=True, axis=1)
         df.to_json(self.fname, orient=orient)
 
-    def mem_to_json_wide(self, orient, frame):
+    def peakmem_to_json_wide(self, orient, frame):
         base_df = getattr(self, frame).copy()
         df = concat([base_df.iloc[:100]] * 1000, ignore_index=True, axis=1)
         df.to_json(self.fname, orient=orient)

asv_bench/benchmarks/package.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+"""
+Benchmarks for pandas at the package-level.
+"""
+import subprocess
+import sys
+
+from pandas.compat import PY37
+
+
+class TimeImport:
+    def time_import(self):
+        if PY37:
+            # on py37+ we the "-X importtime" usage gives us a more precise
+            # measurement of the import time we actually care about,
+            # without the subprocess or interpreter overhead
+            cmd = [sys.executable, "-X", "importtime", "-c", "import pandas as pd"]
+            p = subprocess.run(cmd, stderr=subprocess.PIPE)
+
+            line = p.stderr.splitlines()[-1]
+            field = line.split(b"|")[-2].strip()
+            total = int(field)  # microseconds
+            return total
+
+        cmd = [sys.executable, "-c", "import pandas as pd"]
+        subprocess.run(cmd, stderr=subprocess.PIPE)
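
The parsing step in `time_import` can be tried in isolation. The sketch below reuses the same split-and-strip logic on a hand-written sample in the `-X importtime` layout. The function names `parse_cumulative_us` and `measure_import_us` are hypothetical, and the timings in `sample` are invented for illustration:

```python
import subprocess
import sys


def parse_cumulative_us(stderr: bytes) -> int:
    """Return the cumulative time in microseconds from the last importtime line."""
    line = stderr.splitlines()[-1]
    # Columns are "self [us] | cumulative | imported package"; take the middle one.
    field = line.split(b"|")[-2].strip()
    return int(field)


def measure_import_us(module: str) -> int:
    """Run `import <module>` in a child interpreter (Python 3.7+) and parse stderr."""
    cmd = [sys.executable, "-X", "importtime", "-c", "import {}".format(module)]
    proc = subprocess.run(cmd, stderr=subprocess.PIPE)
    return parse_cumulative_us(proc.stderr)


# Invented sample output; real runs list many more modules.
sample = (
    b"import time: self [us] | cumulative | imported package\n"
    b"import time:       120 |        120 |   numpy\n"
    b"import time:       300 |        420 | pandas\n"
)
print(parse_cumulative_us(sample))  # prints 420
```

The last line of the report belongs to the top-level import, so its cumulative column covers everything imported beneath it. Calling `measure_import_us("json")` would exercise the same path against a standard-library module.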

asv_bench/benchmarks/rolling.py

Lines changed: 22 additions & 0 deletions
@@ -21,6 +21,28 @@ def setup(self, constructor, window, dtype, method):
     def time_rolling(self, constructor, window, dtype, method):
         getattr(self.roll, method)()
 
+    def peakmem_rolling(self, constructor, window, dtype, method):
+        getattr(self.roll, method)()
+
+
+class Apply:
+    params = (
+        ["DataFrame", "Series"],
+        [10, 1000],
+        ["int", "float"],
+        [sum, np.sum, lambda x: np.sum(x) + 5],
+        [True, False],
+    )
+    param_names = ["contructor", "window", "dtype", "function", "raw"]
+
+    def setup(self, constructor, window, dtype, function, raw):
+        N = 10 ** 5
+        arr = (100 * np.random.random(N)).astype(dtype)
+        self.roll = getattr(pd, constructor)(arr).rolling(window)
+
+    def time_rolling(self, constructor, window, dtype, function, raw):
+        self.roll.apply(function, raw=raw)
+
 
 class ExpandingMethods:
 
asv_bench/benchmarks/stat_ops.py

Lines changed: 11 additions & 0 deletions
@@ -113,12 +113,23 @@ def setup(self, method, use_bottleneck):
             nanops._USE_BOTTLENECK = use_bottleneck
         self.df = pd.DataFrame(np.random.randn(1000, 30))
         self.df2 = pd.DataFrame(np.random.randn(1000, 30))
+        self.df_wide = pd.DataFrame(np.random.randn(1000, 200))
+        self.df_wide_nans = self.df_wide.where(np.random.random((1000, 200)) < 0.9)
         self.s = pd.Series(np.random.randn(1000))
         self.s2 = pd.Series(np.random.randn(1000))
 
     def time_corr(self, method, use_bottleneck):
         self.df.corr(method=method)
 
+    def time_corr_wide(self, method, use_bottleneck):
+        self.df_wide.corr(method=method)
+
+    def time_corr_wide_nans(self, method, use_bottleneck):
+        self.df_wide_nans.corr(method=method)
+
+    def peakmem_corr_wide(self, method, use_bottleneck):
+        self.df_wide.corr(method=method)
+
     def time_corr_series(self, method, use_bottleneck):
         self.s.corr(self.s2, method=method)
 

azure-pipelines.yml

Lines changed: 17 additions & 6 deletions
@@ -104,7 +104,7 @@ jobs:
     displayName: 'Running benchmarks'
     condition: true
 
-- job: 'Docs'
+- job: 'Web_and_Docs'
   pool:
     vmImage: ubuntu-16.04
   timeoutInMinutes: 90
@@ -119,6 +119,11 @@ jobs:
       ci/setup_env.sh
     displayName: 'Setup environment and build pandas'
 
+  - script: |
+      source activate pandas-dev
+      python web/pandas_web.py web/pandas --target-path=web/build
+    displayName: 'Build website'
+
   - script: |
       source activate pandas-dev
       # Next we should simply have `doc/make.py --warnings-are-errors`, everything else is required because the ipython directive doesn't fail the build on errors (https://github.com/ipython/ipython/issues/11547)
@@ -128,15 +133,21 @@ jobs:
     displayName: 'Build documentation'
 
   - script: |
-      cd doc/build/html
+      mkdir -p to_deploy/docs
+      cp -r web/build/* to_deploy/
+      cp -r doc/build/html/* to_deploy/docs/
+    displayName: 'Merge website and docs'
+
+  - script: |
+      cd to_deploy
       git init
       touch .nojekyll
       echo "dev.pandas.io" > CNAME
       printf "User-agent: *\nDisallow: /" > robots.txt
       git add --all .
       git config user.email "[email protected]"
-      git config user.name "pandas-docs-bot"
-      git commit -m "pandas documentation in master"
+      git config user.name "pandas-bot"
+      git commit -m "pandas web and documentation in master"
     displayName: 'Create git repo for docs build'
     condition : |
       and(not(eq(variables['Build.Reason'], 'PullRequest')),
@@ -160,10 +171,10 @@ jobs:
           eq(variables['Build.SourceBranch'], 'refs/heads/master'))
 
   - script: |
-      cd doc/build/html
+      cd to_deploy
       git remote add origin git@github.com:pandas-dev/pandas-dev.github.io.git
       git push -f origin master
-    displayName: 'Publish docs to GitHub pages'
+    displayName: 'Publish web and docs to GitHub pages'
     condition : |
       and(not(eq(variables['Build.Reason'], 'PullRequest')),
           eq(variables['Build.SourceBranch'], 'refs/heads/master'))
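
The new 'Merge website and docs' step places the website at the root of `to_deploy` and the Sphinx HTML under `to_deploy/docs`. A rough Python sketch of that layout, run on throwaway directories: the helper name `merge_site` and the two sample files are hypothetical, `dirs_exist_ok` needs Python 3.8 or later, and `shutil.copytree` also copies top-level dotfiles, which the shell glob `*` would skip:

```python
import shutil
import tempfile
from pathlib import Path


def merge_site(web_build: Path, docs_html: Path, target: Path) -> None:
    """Copy the website into target/ and the docs into target/docs/."""
    # Roughly `cp -r web/build/* to_deploy/`
    shutil.copytree(web_build, target, dirs_exist_ok=True)
    # Roughly `mkdir -p to_deploy/docs` then `cp -r doc/build/html/* to_deploy/docs/`
    shutil.copytree(docs_html, target / "docs", dirs_exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    web = root / "web" / "build"
    docs = root / "doc" / "build" / "html"
    web.mkdir(parents=True)
    docs.mkdir(parents=True)
    # Placeholder files standing in for the real build output.
    (web / "index.html").write_text("website home")
    (docs / "index.html").write_text("docs home")

    out = root / "to_deploy"
    merge_site(web, docs, out)
    merged = sorted(
        p.relative_to(out).as_posix() for p in out.rglob("*") if p.is_file()
    )
    print(merged)
```

This layout matches the README link change in the same commit, which moves the contributing guide from `/contributing.html` to `/docs/contributing.html`.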

ci/azure/posix.yml

Lines changed: 7 additions & 0 deletions
@@ -60,15 +60,21 @@ jobs:
         echo "Creating Environment"
         ci/setup_env.sh
       displayName: 'Setup environment and build pandas'
+
     - script: |
         source activate pandas-dev
         ci/run_tests.sh
       displayName: 'Test'
+
     - script: source activate pandas-dev && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd
+      displayName: 'Build versions'
+
     - task: PublishTestResults@2
       inputs:
         testResultsFiles: 'test-data-*.xml'
         testRunTitle: ${{ format('{0}-$(CONDA_PY)', parameters.name) }}
+      displayName: 'Publish test results'
+
     - powershell: |
         $junitXml = "test-data-single.xml"
         $(Get-Content $junitXml | Out-String) -match 'failures="(.*?)"'
@@ -94,6 +100,7 @@ jobs:
           Write-Error "$($matches[1]) tests failed"
         }
       displayName: 'Check for test failures'
+
     - script: |
         source activate pandas-dev
         python ci/print_skipped.py
