Commit 96da473

Author: Matt Roeschke (committed)
Merge remote-tracking branch 'upstream/master' into timestamp_tz_constructor_depr
2 parents: 656beff + e98032d

32 files changed: +1227 / -727 lines

doc/source/contributing.rst (+6 -13)

@@ -591,21 +591,14 @@ run this slightly modified command::

     git diff master --name-only -- "*.py" | grep "pandas/" | xargs flake8

-Note that on Windows, these commands are unfortunately not possible because
-commands like ``grep`` and ``xargs`` are not available natively. To imitate the
-behavior with the commands above, you should run::
+Windows does not support the ``grep`` and ``xargs`` commands (unless installed
+for example via the `MinGW <http://www.mingw.org/>`__ toolchain), but one can
+imitate the behaviour as follows::

-    git diff master --name-only -- "*.py"
+    for /f %i in ('git diff upstream/master --name-only ^| findstr pandas/') do flake8 %i

-This will list all of the Python files that have been modified. The only ones
-that matter during linting are any whose directory filepath begins with "pandas."
-For each filepath, copy and paste it after the ``flake8`` command as shown below:
-
-    flake8 <python-filepath>
-
-Alternatively, you can install the ``grep`` and ``xargs`` commands via the
-`MinGW <http://www.mingw.org/>`__ toolchain, and it will allow you to run the
-commands above.
+This will also get all the files being changed by the PR (and within the
+``pandas/`` folder), and run ``flake8`` on them one after the other.

 .. _contributing.import-formatting:

doc/source/whatsnew/v0.24.0.rst (+49 -2)
@@ -24,7 +24,8 @@ New features
   the user to override the engine's default behavior to include or omit the
   dataframe's indexes from the resulting Parquet file. (:issue:`20768`)
 - :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`)
+- :func:`DataFrame.to_string` now accepts ``decimal`` as an argument, allowing
+  the user to specify which decimal separator should be used in the output. (:issue:`23614`)

 .. _whatsnew_0240.enhancements.extension_array_operators:
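The new ``decimal`` keyword of ``to_string`` can be exercised like this (a minimal sketch, assuming pandas 0.24 or later is installed; the data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [6.0, 3.1, 2.2]})

# decimal=',' renders floats with a comma separator, as common in Europe
print(df.to_string(decimal=','))
```

Passing ``decimal='.'`` (the default) keeps the previous behaviour unchanged.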

@@ -183,6 +184,47 @@ array, but rather an ``ExtensionArray``:
 This is the same behavior as ``Series.values`` for categorical data. See
 :ref:`whatsnew_0240.api_breaking.interval_values` for more.

+.. _whatsnew_0240.enhancements.join_with_two_multiindexes:
+
+Joining with two multi-indexes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`DataFrame.merge` and :func:`DataFrame.join` can now be used to join multi-indexed ``DataFrame`` instances on the overlapping index levels (:issue:`6360`)
+
+See the :ref:`Merge, join, and concatenate
+<merging.Join_with_two_multi_indexes>` documentation section.
+
+.. ipython:: python
+
+    index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
+                                            ('K1', 'X2')],
+                                           names=['key', 'X'])
+
+    left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
+                         'B': ['B0', 'B1', 'B2']},
+                        index=index_left)
+
+    index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
+                                             ('K2', 'Y2'), ('K2', 'Y3')],
+                                            names=['key', 'Y'])
+
+    right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
+                          'D': ['D0', 'D1', 'D2', 'D3']},
+                         index=index_right)
+
+    left.join(right)
+
+For earlier versions this can be done using the following.
+
+.. ipython:: python
+
+    pd.merge(left.reset_index(), right.reset_index(),
+             on=['key'], how='inner').set_index(['key', 'X', 'Y'])
+
 .. _whatsnew_0240.enhancements.rename_axis:

 Renaming names in a MultiIndex
@@ -961,6 +1003,7 @@ Other API Changes
 - :class:`DateOffset` attribute `_cacheable` and method `_should_cache` have been removed (:issue:`23118`)
 - Comparing :class:`Timedelta` to be less or greater than unknown types now raises a ``TypeError`` instead of returning ``False`` (:issue:`20829`)
 - :meth:`Index.hasnans` and :meth:`Series.hasnans` now always return a python boolean. Previously, a python or a numpy boolean could be returned, depending on circumstances (:issue:`23294`).
+- The order of the arguments of :func:`DataFrame.to_html` and :func:`DataFrame.to_string` is rearranged to be consistent with each other. (:issue:`23614`)

 .. _whatsnew_0240.deprecations:
@@ -981,6 +1024,7 @@ Deprecations
 - The ``fastpath`` keyword of the different Index constructors is deprecated (:issue:`23110`).
 - :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have deprecated the ``errors`` argument in favor of the ``nonexistent`` argument (:issue:`8917`)
 - The class ``FrozenNDArray`` has been deprecated. When unpickling, ``FrozenNDArray`` will be unpickled to ``np.ndarray`` once this class is removed (:issue:`9031`)
+- The methods :meth:`DataFrame.update` and :meth:`Panel.update` have deprecated the ``raise_conflict=False|True`` keyword in favor of ``errors='ignore'|'raise'`` (:issue:`23585`)
 - Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of
   `use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
 - :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
@@ -1320,7 +1364,9 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
 - :func:`read_sas()` will correctly parse sas7bdat files with many columns (:issue:`22628`)
 - :func:`read_sas()` will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (:issue:`16615`)
 - Bug in :meth:`detect_client_encoding` where potential ``IOError`` goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
-- Bug in :func:`to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
+- Bug in :func:`to_html()` with ``index=False`` misses truncation indicators (...) on truncated DataFrame (:issue:`15019`, :issue:`22783`)
+- Bug in :func:`DataFrame.to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
+- Bug in :func:`DataFrame.to_string()` that caused representations of :class:`DataFrame` to not take up the whole window (:issue:`22984`)
 - Bug in :func:`DataFrame.to_csv` where a single level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (:issue:`19589`).
 - Bug in :meth:`HDFStore.append` when appending a :class:`DataFrame` with an empty string column and ``min_itemsize`` < 8 (:issue:`12242`)
 - Bug in :meth:`read_csv()` in which :class:`MultiIndex` index names were being improperly handled in the cases when they were not provided (:issue:`23484`)
@@ -1373,6 +1419,7 @@ Reshaping
 - Bug in :func:`pandas.concat` when concatenating a multicolumn DataFrame with tz-aware data against a DataFrame with a different number of columns (:issue:`22796`)
 - Bug in :func:`merge_asof` where confusing error message raised when attempting to merge with missing values (:issue:`23189`)
 - Bug in :meth:`DataFrame.nsmallest` and :meth:`DataFrame.nlargest` for dataframes that have a :class:`MultiIndex` for columns (:issue:`23033`).
+- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)

 .. _whatsnew_0240.bug_fixes.sparse:

doc/sphinxext/contributors.py (+20 -11)

@@ -10,6 +10,7 @@
 """
 from docutils import nodes
 from docutils.parsers.rst import Directive
+import git

 from announce import build_components

@@ -19,17 +20,25 @@ class ContributorsDirective(Directive):
     name = 'contributors'

     def run(self):
-        components = build_components(self.arguments[0])
-
-        message = nodes.paragraph()
-        message += nodes.Text(components['author_message'])
-
-        listnode = nodes.bullet_list()
-
-        for author in components['authors']:
-            para = nodes.paragraph()
-            para += nodes.Text(author)
-            listnode += nodes.list_item('', para)
+        range_ = self.arguments[0]
+        try:
+            components = build_components(range_)
+        except git.GitCommandError:
+            return [
+                self.state.document.reporter.warning(
+                    "Cannot find contributors for range '{}'".format(range_),
+                    line=self.lineno)
+            ]
+        else:
+            message = nodes.paragraph()
+            message += nodes.Text(components['author_message'])
+
+            listnode = nodes.bullet_list()
+
+            for author in components['authors']:
+                para = nodes.paragraph()
+                para += nodes.Text(author)
+                listnode += nodes.list_item('', para)

         return [message, listnode]
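The control-flow change above wraps the git lookup in try/except/else so the success path only runs when ``build_components`` succeeds. The pattern can be sketched without docutils or GitPython; everything below (``GitCommandError``, ``build_components``, the returned lists) is a hypothetical stand-in for the real objects:

```python
class GitCommandError(Exception):
    """Stand-in for git.GitCommandError."""

def build_components(range_):
    # Stand-in for announce.build_components: fail on an unknown range
    if range_ == 'bad..range':
        raise GitCommandError(range_)
    return {'author_message': '2 contributors', 'authors': ['a', 'b']}

def run(range_):
    try:
        components = build_components(range_)
    except GitCommandError:
        # On failure, return only a warning (mirrors reporter.warning above)
        return ["Cannot find contributors for range '{}'".format(range_)]
    else:
        # Success path: build the message and the author list
        message = components['author_message']
        items = list(components['authors'])
        return [message, items]

print(run('v0.23.0..v0.24.0'))
print(run('bad..range'))
```

The ``else`` clause keeps the happy path out of the ``try`` block, so only the git call itself is guarded.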

pandas/_libs/lib.pyx (+12 -11)

@@ -48,8 +48,7 @@ cdef extern from "src/parse_helper.h":
     int floatify(object, float64_t *result, int *maybe_int) except -1

 cimport util
-from util cimport (is_nan,
-                   UINT8_MAX, UINT64_MAX, INT64_MAX, INT64_MIN)
+from util cimport is_nan, UINT64_MAX, INT64_MAX, INT64_MIN

 from tslib import array_to_datetime
 from tslibs.nattype cimport NPY_NAT

@@ -1642,20 +1641,22 @@ def is_datetime_with_singletz_array(values: ndarray) -> bool:

     if n == 0:
         return False
-
+    # Get a reference timezone to compare with the rest of the tzs in the array
     for i in range(n):
         base_val = values[i]
         if base_val is not NaT:
             base_tz = get_timezone(getattr(base_val, 'tzinfo', None))
-
-            for j in range(i, n):
-                val = values[j]
-                if val is not NaT:
-                    tz = getattr(val, 'tzinfo', None)
-                    if not tz_compare(base_tz, tz):
-                        return False
             break

+    for j in range(i, n):
+        # Compare val's timezone with the reference timezone
+        # NaT can coexist with tz-aware datetimes, so skip if encountered
+        val = values[j]
+        if val is not NaT:
+            tz = getattr(val, 'tzinfo', None)
+            if not tz_compare(base_tz, tz):
+                return False
+
     return True

@@ -2045,7 +2046,7 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,

     # we try to coerce datetime w/tz but must all have the same tz
     if seen.datetimetz_:
-        if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:
+        if is_datetime_with_singletz_array(objects):
             from pandas import DatetimeIndex
             return DatetimeIndex(objects)
     seen.object_ = 1
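The two-pass logic of the rewritten check — find a reference timezone, then compare every other value against it — can be sketched in plain Python. This is a simplification of the Cython original: ``None`` stands in for ``NaT``, and plain equality stands in for ``tz_compare``:

```python
import datetime

def is_single_tz(values):
    """True if all non-missing datetimes share one timezone (sketch)."""
    n = len(values)
    if n == 0:
        return False
    # First pass: take the timezone of the first non-missing value as reference
    for i, val in enumerate(values):
        if val is not None:
            base_tz = getattr(val, 'tzinfo', None)
            break
    else:
        # All values missing: no timezone to speak of
        return False
    # Second pass: every other non-missing value must match the reference
    for val in values[i:]:
        if val is not None:
            if getattr(val, 'tzinfo', None) != base_tz:
                return False
    return True

utc = datetime.timezone.utc
a = datetime.datetime(2018, 1, 1, tzinfo=utc)
b = datetime.datetime(2018, 1, 2, tzinfo=utc)
c = datetime.datetime(2018, 1, 3)  # naive, no tzinfo
print(is_single_tz([a, b]))   # True
print(is_single_tz([a, c]))   # False
```

Note how this also explains the ``maybe_convert_objects`` change: the old ``len({...}) == 1`` set test treated distinct-but-equivalent tzinfo objects as different, whereas a pairwise comparison against one reference timezone does not.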

pandas/core/arrays/categorical.py (-1)

@@ -2435,7 +2435,6 @@ class CategoricalAccessor(PandasDelegate, PandasObject, NoNewAttributesMixin):
     >>> s.cat.set_categories(list('abcde'))
     >>> s.cat.as_ordered()
     >>> s.cat.as_unordered()
-
     """

     def __init__(self, data):

pandas/core/arrays/datetimes.py (-1)

@@ -764,7 +764,6 @@ def tz_localize(self, tz, ambiguous='raise', nonexistent='raise',
     1   2018-10-28 02:36:00+02:00
     2   2018-10-28 03:46:00+01:00
     dtype: datetime64[ns, CET]
-
     """
     if errors is not None:
         warnings.warn("The errors argument is deprecated and will be "

pandas/core/frame.py (+37 -31)
@@ -2035,24 +2035,21 @@ def to_parquet(self, fname, engine='auto', compression='snappy',
     def to_string(self, buf=None, columns=None, col_space=None, header=True,
                   index=True, na_rep='NaN', formatters=None, float_format=None,
                   sparsify=None, index_names=True, justify=None,
-                  line_width=None, max_rows=None, max_cols=None,
-                  show_dimensions=False):
+                  max_rows=None, max_cols=None, show_dimensions=False,
+                  decimal='.', line_width=None):
         """
         Render a DataFrame to a console-friendly tabular output.
-
         %(shared_params)s
         line_width : int, optional
             Width to wrap a line in characters.
-
         %(returns)s
-
         See Also
         --------
         to_html : Convert DataFrame to HTML.

         Examples
         --------
-        >>> d = {'col1' : [1, 2, 3], 'col2' : [4, 5, 6]}
+        >>> d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
         >>> df = pd.DataFrame(d)
         >>> print(df.to_string())
            col1  col2
@@ -2068,42 +2065,37 @@ def to_string(self, buf=None, columns=None, col_space=None, header=True,
                                            sparsify=sparsify, justify=justify,
                                            index_names=index_names,
                                            header=header, index=index,
-                                           line_width=line_width,
                                            max_rows=max_rows,
                                            max_cols=max_cols,
-                                           show_dimensions=show_dimensions)
+                                           show_dimensions=show_dimensions,
+                                           decimal=decimal,
+                                           line_width=line_width)
         formatter.to_string()

         if buf is None:
             result = formatter.buf.getvalue()
             return result

-    @Substitution(header='whether to print column labels, default True')
+    @Substitution(header='Whether to print column labels, default True')
     @Substitution(shared_params=fmt.common_docstring,
                   returns=fmt.return_docstring)
     def to_html(self, buf=None, columns=None, col_space=None, header=True,
                 index=True, na_rep='NaN', formatters=None, float_format=None,
-                sparsify=None, index_names=True, justify=None, bold_rows=True,
-                classes=None, escape=True, max_rows=None, max_cols=None,
-                show_dimensions=False, notebook=False, decimal='.',
-                border=None, table_id=None):
+                sparsify=None, index_names=True, justify=None, max_rows=None,
+                max_cols=None, show_dimensions=False, decimal='.',
+                bold_rows=True, classes=None, escape=True,
+                notebook=False, border=None, table_id=None):
         """
         Render a DataFrame as an HTML table.
-
         %(shared_params)s
-        bold_rows : boolean, default True
-            Make the row labels bold in the output
+        bold_rows : bool, default True
+            Make the row labels bold in the output.
         classes : str or list or tuple, default None
-            CSS class(es) to apply to the resulting html table
-        escape : boolean, default True
+            CSS class(es) to apply to the resulting html table.
+        escape : bool, default True
             Convert the characters <, >, and & to HTML-safe sequences.
         notebook : {True, False}, default False
             Whether the generated HTML is for IPython Notebook.
-        decimal : string, default '.'
-            Character recognized as decimal separator, e.g. ',' in Europe
-
-            .. versionadded:: 0.18.0
-
         border : int
             A ``border=border`` attribute is included in the opening
             `<table>` tag. Default ``pd.options.html.border``.
@@ -2114,9 +2106,7 @@ def to_html(self, buf=None, columns=None, col_space=None, header=True,
             A css id is included in the opening `<table>` tag if specified.

             .. versionadded:: 0.23.0
-
         %(returns)s
-
         See Also
         --------
         to_string : Convert DataFrame to a string.
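Because this commit reorders the positional parameters of ``to_string`` and ``to_html``, callers that passed display options positionally could silently break. Passing them by keyword is robust to either ordering; a minimal sketch (the data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'num': [1.5, 2.5]})

# Pass display options by keyword so the 0.24.0 reordering is harmless
text = df.to_string(show_dimensions=True, decimal=',')
html = df.to_html(bold_rows=False, escape=True, decimal=',')
print(text)
```

The same call works before and after the reordering, which is the point of keeping the two signatures merely consistent rather than frozen.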
@@ -5213,8 +5203,10 @@ def combiner(x, y):

         return self.combine(other, combiner, overwrite=False)

+    @deprecate_kwarg(old_arg_name='raise_conflict', new_arg_name='errors',
+                     mapping={False: 'ignore', True: 'raise'})
     def update(self, other, join='left', overwrite=True, filter_func=None,
-               raise_conflict=False):
+               errors='ignore'):
         """
         Modify in place using non-NA values from another DataFrame.
@@ -5238,17 +5230,28 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
             * False: only update values that are NA in
               the original DataFrame.

-        filter_func : callable(1d-array) -> boolean 1d-array, optional
+        filter_func : callable(1d-array) -> bool 1d-array, optional
             Can choose to replace values other than NA. Return True for values
             that should be updated.
-        raise_conflict : bool, default False
-            If True, will raise a ValueError if the DataFrame and `other`
+        errors : {'raise', 'ignore'}, default 'ignore'
+            If 'raise', will raise a ValueError if the DataFrame and `other`
             both contain non-NA data in the same place.

+            .. versionchanged :: 0.24.0
+               Changed from `raise_conflict=False|True`
+               to `errors='ignore'|'raise'`.
+
+        Returns
+        -------
+        None : method directly changes calling object
+
         Raises
         ------
         ValueError
-            When `raise_conflict` is True and there's overlapping non-NA data.
+            * When `errors='raise'` and there's overlapping non-NA data.
+            * When `errors` is not either `'ignore'` or `'raise'`
+        NotImplementedError
+            * If `join != 'left'`

         See Also
         --------
@@ -5319,6 +5322,9 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
         # TODO: Support other joins
         if join != 'left':  # pragma: no cover
             raise NotImplementedError("Only left join is supported")
+        if errors not in ['ignore', 'raise']:
+            raise ValueError("The parameter errors must be either "
+                             "'ignore' or 'raise'")

         if not isinstance(other, DataFrame):
             other = DataFrame(other)
@@ -5332,7 +5338,7 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
                 with np.errstate(all='ignore'):
                     mask = ~filter_func(this) | isna(that)
             else:
-                if raise_conflict:
+                if errors == 'raise':
                     mask_this = notna(that)
                     mask_that = notna(this)
                     if any(mask_this & mask_that):
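The renamed keyword in action: a minimal sketch of ``DataFrame.update`` with the new ``errors`` argument (assumes pandas 0.24 or later; the data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, None], 'B': [3.0, 4.0]})
other = pd.DataFrame({'A': [10.0, 20.0]})

# Default errors='ignore': overlapping non-NA values are overwritten silently
df.update(other)
print(df)

# errors='raise' turns overlapping non-NA data into a ValueError
try:
    df.update(pd.DataFrame({'B': [30.0, 40.0]}), errors='raise')
except ValueError as exc:
    print('conflict:', exc)
```

With ``errors='ignore'`` the behaviour matches the old ``raise_conflict=False`` default, so existing callers are unaffected.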
