Commit 9840a95

Merge changes from master
2 parents 1152d78 + 9d09493 commit 9840a95

43 files changed, with 1620 additions and 355 deletions

RELEASE.rst

Lines changed: 65 additions & 0 deletions
@@ -22,6 +22,71 @@ Where to get it
2222
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
2323
* Documentation: http://pandas.pydata.org
2424

25+
pandas 0.7.3
26+
============
27+
28+
**Release date:** April 12, 2012
29+
30+
**New features / modules**
31+
32+
- Added fixed-width file reader, read_fwf (PR #952)
33+
- Add group_keys argument to groupby to not add group names to MultiIndex in
34+
result of apply (GH #938)
35+
- DataFrame can now accept non-integer label slicing (GH #946). Previously
36+
only DataFrame.ix was able to do so.
37+
- DataFrame.apply now retains name attributes on Series objects (GH #983)
38+
- Numeric DataFrame comparisons with non-numeric values now raise a proper
39+
TypeError (GH #943). Previously they raised "PandasError: DataFrame constructor
40+
not properly called!"
41+
- Add ``kurt`` methods to Series and DataFrame (PR #964)
42+
- Can pass dict of column -> list/set NA values for text parsers (GH #754)
43+
- Allow user-specified NA values in text parsers (GH #754)
44+
- Parsers check for the openpyxl dependency and raise ImportError if not found
45+
(PR #1007)
46+
- New factory function to create HDFStore objects that can be used in a with
47+
statement so users do not have to explicitly call HDFStore.close (PR #1005)
48+
- pivot_table is now more flexible with same parameters as groupby (GH #941)
49+
- Added stacked bar plots (GH #987)
50+
- scatter_matrix method in pandas/tools/plotting.py (PR #935)
51+
- DataFrame.boxplot returns plot results for ex-post styling (GH #985)
52+
- Short version number accessible as pandas.version.short_version (GH #930)
53+
- Additional documentation in panel.to_frame (GH #942)
54+
- More informative Series.apply docstring regarding element-wise apply
55+
(GH #977)
56+
- Notes on rpy2 installation (GH #1006)
57+
- Add rotation and font size options to hist method (#1012)
58+
- Use exogenous / X variable index in result of OLS.y_predict. Add
59+
OLS.predict method (PR #1027, #1008)
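The two NA-value bullets above (GH #754) describe sentinel strings that can differ per column. As a rough plain-Python sketch of that idea, a dict can map each column name to the strings that should be read as missing. This is not the pandas parser; ``parse_with_na`` and the sample data are hypothetical names chosen for illustration.

```python
import csv
import io


def parse_with_na(text, na_values):
    """Parse CSV text, replacing per-column sentinel strings with None.

    na_values maps a column name to a list or set of strings treated as NA.
    Hypothetical helper for illustration only, not a pandas function.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        cleaned = {}
        for column, value in record.items():
            # Columns without an entry in na_values have no sentinels.
            sentinels = na_values.get(column, ())
            cleaned[column] = None if value in sentinels else value
        rows.append(cleaned)
    return rows


sample = "a,b\n1,-999\nmissing,2\n-999,n/a\n"
na_spec = {"a": {"missing"}, "b": ["-999", "n/a"]}
parsed = parse_with_na(sample, na_spec)
```

Note that ``-999`` in column ``a`` is left alone in this sketch, because it is only listed as a sentinel for column ``b``.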
60+
61+
**API Changes**
62+
63+
- Calling apply on grouped Series, e.g. describe(), will no longer yield a
64+
DataFrame by default. Call unstack() on the result to get the prior behavior
65+
- NA handling in non-numeric comparisons has been tightened up (#933, #953)
66+
67+
**Bug fixes**
68+
69+
- Fix logic error when selecting part of a row in a DataFrame with a
70+
MultiIndex index (GH #1013)
71+
- Series comparison with Series of differing length causes crash (GH #1016).
72+
- Fix bug in indexing when selecting section of hierarchically-indexed row
73+
(GH #1013)
74+
- DataFrame.plot(logy=True) has no effect (GH #1011).
75+
- Broken arithmetic operations between SparsePanel-Panel (GH #1015)
76+
- Unicode repr issues in MultiIndex with non-ascii characters (GH #1010)
77+
- DataFrame.lookup() returns inconsistent results if exact match not present
78+
(GH #1001)
79+
- DataFrame arithmetic operations not treating None as NA (GH #992)
80+
- DataFrameGroupBy.apply returns incorrect result (GH #991)
81+
- Series.reshape returns incorrect result for multiple dimensions (GH #989)
82+
- Series.std and Series.var ignore ddof parameter (GH #934)
83+
- DataFrame.append loses index names (GH #980)
84+
- DataFrame.plot(kind='bar') ignores color argument (GH #958)
85+
- Inconsistent Index comparison results (GH #948)
86+
- Improper int dtype DataFrame construction from data with NaN (GH #846)
87+
- Removes default 'result' name in groupby results (GH #995)
88+
- DataFrame.from_records no longer mutates input columns (PR #975)
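For the ``ddof`` bullet above (GH #934), it may help to see what the parameter controls: the divisor of the variance is ``n - ddof``. The sketch below is plain Python rather than pandas code; the helper names and the default of ``ddof=1`` are assumptions made for illustration.

```python
import math


def variance(values, ddof=1):
    """Variance with divisor n - ddof (ddof=1 sample, ddof=0 population)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)


def std(values, ddof=1):
    """Standard deviation as the square root of the variance above."""
    return math.sqrt(variance(values, ddof=ddof))


data = [1.0, 2.0, 3.0, 4.0]
sample_var = variance(data)               # divisor n - 1 = 3
population_var = variance(data, ddof=0)   # divisor n = 4
```

A method that ignored ``ddof`` would return the same number for both calls, which is the symptom the bug fix addresses.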
89+
2590
pandas 0.7.2
2691
============
2792

doc/source/dsintro.rst

Lines changed: 29 additions & 1 deletion
@@ -687,7 +687,20 @@ For example, compare to the construction above:
687687
688688
Panel.from_dict(data, orient='minor')
689689
690-
Orient is especially useful for mixed-type DataFrames.
690+
Orient is especially useful for mixed-type DataFrames. If you pass a dict of
691+
DataFrame objects with mixed-type columns, all of the data will be upcast to
692+
``dtype=object`` unless you pass ``orient='minor'``:
693+
694+
.. ipython:: python
695+
696+
df = DataFrame({'a': ['foo', 'bar', 'baz'],
697+
'b': np.random.randn(3)})
698+
df
699+
data = {'item1': df, 'item2': df}
700+
panel = Panel.from_dict(data, orient='minor')
701+
panel['a']
702+
panel['b']
703+
panel['b'].dtypes
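One way to picture why ``orient='minor'`` preserves dtypes: regrouping by column puts all values of one column, which share a type, into the same slice. The following is a plain-Python sketch with nested dicts standing in for DataFrames; ``regroup_by_column`` is a hypothetical helper, not part of pandas.

```python
def regroup_by_column(data):
    """Turn {item: {column: values}} into {column: {item: values}}."""
    regrouped = {}
    for item, frame in data.items():
        for column, values in frame.items():
            regrouped.setdefault(column, {})[item] = values
    return regrouped


frame = {"a": ["foo", "bar", "baz"], "b": [0.1, 0.2, 0.3]}
data = {"item1": frame, "item2": frame}
by_column = regroup_by_column(data)

# Each regrouped slice holds values of a single type.
types_in_a = {type(v) for values in by_column["a"].values() for v in values}
types_in_b = {type(v) for values in by_column["b"].values() for v in values}
```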
691704
692705
.. note::
693706

@@ -747,3 +760,18 @@ For example, using the earlier example data, we could do:
747760
wp.major_xs(wp.major_axis[2])
748761
wp.minor_axis
749762
wp.minor_xs('C')
763+
764+
Conversion to DataFrame
765+
~~~~~~~~~~~~~~~~~~~~~~~
766+
767+
A Panel can be represented in 2D form as a hierarchically indexed
768+
DataFrame. See the section :ref:`hierarchical indexing <indexing.hierarchical>`
769+
for more on this. To convert a Panel to a DataFrame, use the ``to_frame``
770+
method:
771+
772+
.. ipython:: python
773+
774+
panel = Panel(np.random.randn(3, 5, 4), items=['one', 'two', 'three'],
775+
major_axis=DateRange('1/1/2000', periods=5),
776+
minor_axis=['a', 'b', 'c', 'd'])
777+
panel.to_frame()
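The 2D form can be thought of as one row per (major, minor) pair and one column per item. The sketch below shows that reshaping with plain lists and dicts; ``to_long_form`` and the sample labels are hypothetical and only illustrate the layout, not how ``to_frame`` is implemented.

```python
from itertools import product


def to_long_form(values, items, major_axis, minor_axis):
    """Flatten values[item][major][minor] into rows keyed by (major, minor)."""
    rows = {}
    for i, j in product(range(len(major_axis)), range(len(minor_axis))):
        key = (major_axis[i], minor_axis[j])
        # One entry per item becomes one column of the 2D representation.
        rows[key] = {item: values[k][i][j] for k, item in enumerate(items)}
    return rows


items = ["one", "two"]
major = ["2000-01-03", "2000-01-04"]
minor = ["a", "b"]
values = [
    [[1, 2], [3, 4]],      # item "one"
    [[10, 20], [30, 40]],  # item "two"
]
long_form = to_long_form(values, items, major, minor)
```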

doc/source/io.rst

Lines changed: 63 additions & 2 deletions
@@ -94,7 +94,7 @@ data into a DataFrame object. They can take a number of arguments:
9494
- ``converters``: a dictionary of functions for converting values in certain
9595
columns, where keys are either integers or column labels
9696
- ``encoding``: a string representing the encoding to use if the contents are
97-
non-ascii, for python versions prior to 3
97+
non-ascii
9898
- ``verbose`` : show number of NA values inserted in non-numeric columns
9999

100100
.. ipython:: python
@@ -139,6 +139,67 @@ fragile. Type inference is a pretty big deal. So if a column can be coerced to
139139
integer dtype without altering the contents, it will do so. Any non-numeric
140140
columns will come through as object dtype as with the rest of pandas objects.
141141

142+
.. _io.fwf:
143+
144+
Files with Fixed Width Columns
145+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
146+
While `read_csv` reads delimited data, the :func:`~pandas.io.parsers.read_fwf`
147+
function works with data files that have known and fixed column widths.
148+
The function parameters to `read_fwf` are largely the same as `read_csv` with
149+
two extra parameters:
150+
151+
- ``colspecs``: a list of pairs (tuples), giving the extents of the
152+
fixed-width fields of each line as half-open intervals [from, to)
153+
- ``widths``: a list of field widths, which can be used instead of
154+
``colspecs`` if the intervals are contiguous
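To make the half-open interval convention concrete, here is a small plain-Python sketch of slicing one line with ``colspecs``. It does not use ``read_fwf``; ``split_fixed_width`` is a hypothetical helper, and the sample line is built with explicit padding so that the offsets are unambiguous.

```python
def split_fixed_width(line, colspecs):
    """Slice a line at half-open [start, stop) intervals and strip padding."""
    return [line[start:stop].strip() for start, stop in colspecs]


# Build a sample line whose fields start at offsets 0, 8, 21 and 34.
line = "{:<8}{:<13}{:<13}{}".format(
    "id8141", "360.242940", "149.910199", "11950.7")

colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]
fields = split_fixed_width(line, colspecs)
```

Because the intervals are half-open, they map directly onto Python slices: ``(0, 6)`` covers characters 0 through 5.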
155+
156+
.. ipython:: python
157+
:suppress:
158+
159+
f = open('bar.csv', 'w')
160+
data1 = ("id8141 360.242940 149.910199 11950.7\n"
161+
"id1594 444.953632 166.985655 11788.4\n"
162+
"id1849 364.136849 183.628767 11806.2\n"
163+
"id1230 413.836124 184.375703 11916.8\n"
164+
"id1948 502.953953 173.237159 12468.3")
165+
f.write(data1)
166+
f.close()
167+
168+
Consider a typical fixed-width data file:
169+
170+
.. ipython:: python
171+
172+
print(open('bar.csv').read())
173+
174+
In order to parse this file into a DataFrame, we simply need to supply the
175+
column specifications to the `read_fwf` function along with the file name:
176+
177+
.. ipython:: python
178+
179+
# Column specifications are a list of half-open intervals
180+
colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]
181+
df = read_fwf('bar.csv', colspecs=colspecs, header=None, index_col=0)
182+
df
183+
184+
Note how the parser automatically picks column names X.<column number> when the
185+
``header=None`` argument is specified. Alternatively, you can supply just the
186+
column widths for contiguous columns:
187+
188+
.. ipython:: python
189+
190+
# Widths are a list of integers
191+
widths = [6, 14, 13, 10]
192+
df = read_fwf('bar.csv', widths=widths, header=None)
193+
df
194+
195+
The parser strips extra whitespace around the columns,
196+
so it is fine to have extra separation between the columns in the file.
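The relationship between ``widths`` and ``colspecs`` can be sketched in plain Python: contiguous widths are just running totals turned into half-open intervals, and stripping each slice absorbs the padding. ``widths_to_colspecs`` is a hypothetical helper for illustration, not the code ``read_fwf`` uses.

```python
from itertools import accumulate


def widths_to_colspecs(widths):
    """Convert contiguous field widths into half-open (start, stop) pairs."""
    stops = list(accumulate(widths))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


colspecs = widths_to_colspecs([6, 14, 13, 10])

# Sample line with padded fields; the padding falls inside the wider slices.
line = "{:<8}{:<13}{:<13}{}".format(
    "id8141", "360.242940", "149.910199", "11950.7")
fields = [line[start:stop].strip() for start, stop in colspecs]
```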
197+
198+
.. ipython:: python
199+
:suppress:
200+
201+
os.remove('bar.csv')
202+
142203
Files with an "implicit" index column
143204
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
144205

@@ -281,7 +342,7 @@ function takes a number of arguments. Only the first is required.
281342
- ``mode`` : Python write mode, default 'w'
282343
- ``sep`` : Field delimiter for the output file (default ",")
283344
- ``encoding``: a string representing the encoding to use if the contents are
284-
non-ascii, for python versions prior to 3
345+
non-ascii, for python versions prior to 3
285346

286347
Writing a formatted string
287348
~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/r_interface.rst

Lines changed: 5 additions & 3 deletions
@@ -15,8 +15,10 @@ rpy2 / R interface
1515
If your computer has R and rpy2 (> 2.2) installed (which will be left to the
1616
reader), you will be able to leverage the below functionality. On Windows,
1717
doing this is quite an ordeal at the moment, but users on Unix-like systems
18-
should find it quite easy. As a general rule, I would recommend using the
19-
latest revision of rpy2 from bitbucket:
18+
should find it quite easy. rpy2 evolves over time, and the current interface is
19+
designed for the 2.2.x series. We recommend using that series over others
20+
unless you are prepared to fix parts of the code. Released packages are available
21+
on PyPI, but if you want the latest code in the 2.2.x series, you can obtain it with:
2022

2123
::
2224

@@ -25,7 +27,7 @@ latest revision of rpy2 from bitbucket:
2527

2628
cd rpy2
2729
hg pull
28-
hg update
30+
hg update version_2.2.x
2931
sudo python setup.py install
3032

3133
.. note::

doc/source/visualization.rst

Lines changed: 62 additions & 7 deletions
@@ -112,8 +112,8 @@ Other plotting features
112112

113113
.. _visualization.barplot:
114114

115-
Plotting non-time series data
116-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
115+
Bar plots
116+
~~~~~~~~~
117117

118118
For labeled, non-time series data, you may wish to produce a bar plot:
119119

@@ -124,8 +124,47 @@ For labeled, non-time series data, you may wish to produce a bar plot:
124124
@savefig bar_plot_ex.png width=4.5in
125125
df.ix[5].plot(kind='bar'); plt.axhline(0, color='k')
126126
127-
Histogramming
128-
~~~~~~~~~~~~~
127+
Calling a DataFrame's ``plot`` method with ``kind='bar'`` produces a multiple
128+
bar plot:
129+
130+
.. ipython:: python
131+
:suppress:
132+
133+
plt.figure();
134+
135+
.. ipython:: python
136+
137+
df2 = DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
138+
139+
@savefig bar_plot_multi_ex.png width=5in
140+
df2.plot(kind='bar');
141+
142+
To produce a stacked bar plot, pass ``stacked=True``:
143+
144+
.. ipython:: python
145+
:suppress:
146+
147+
plt.figure();
148+
149+
.. ipython:: python
150+
151+
@savefig bar_plot_stacked_ex.png width=5in
152+
df2.plot(kind='bar', stacked=True);
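As a rough illustration of what stacking means geometrically, each segment of a bar starts where the previous segments of that bar end, so the starting heights are running sums across columns. The sketch below is plain Python and shows the general technique under that assumption; ``stacked_bottoms`` is a hypothetical helper, and this is not necessarily how the pandas plotting code is written.

```python
def stacked_bottoms(series_list):
    """Return the starting height of every segment in a stacked bar plot.

    series_list holds one list of non-negative heights per column; segment k
    of a bar starts at the sum of segments 0..k-1 of the same bar.
    """
    bottoms = []
    running = [0.0] * len(series_list[0])
    for heights in series_list:
        bottoms.append(list(running))
        running = [r + h for r, h in zip(running, heights)]
    return bottoms, running


columns = [[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]]  # three columns, two bars
bottoms, totals = stacked_bottoms(columns)
```

Here ``totals`` gives the full height of each stacked bar, and ``bottoms[k]`` gives the offsets at which column ``k`` would be drawn.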
153+
154+
To get horizontal bar plots, pass ``kind='barh'``:
155+
156+
.. ipython:: python
157+
:suppress:
158+
159+
plt.figure();
160+
161+
.. ipython:: python
162+
163+
@savefig barh_plot_stacked_ex.png width=5in
164+
df2.plot(kind='barh', stacked=True);
165+
166+
Histograms
167+
~~~~~~~~~~
129168
.. ipython:: python
130169
131170
plt.figure();
@@ -160,7 +199,7 @@ a uniform random variable on [0,1).
160199
plt.figure();
161200
162201
@savefig box_plot_ex.png width=4.5in
163-
df.boxplot()
202+
bp = df.boxplot()
164203
165204
You can create a stratified boxplot using the ``by`` keyword argument to create
166205
groupings. For instance,
@@ -173,7 +212,7 @@ groupings. For instance,
173212
plt.figure();
174213
175214
@savefig box_plot_ex2.png width=4.5in
176-
df.boxplot(by='X')
215+
bp = df.boxplot(by='X')
177216
178217
You can also pass a subset of columns to plot, as well as group by multiple
179218
columns:
@@ -187,4 +226,20 @@ columns:
187226
plt.figure();
188227
189228
@savefig box_plot_ex3.png width=4.5in
190-
df.boxplot(column=['Col1','Col2'], by=['X','Y'])
229+
bp = df.boxplot(column=['Col1','Col2'], by=['X','Y'])
230+
231+
.. _visualization.scatter_matrix:
232+
233+
Scatter plot matrix
234+
~~~~~~~~~~~~~~~~~~~
235+
236+
*New in 0.7.3.* You can create a scatter plot matrix using the
237+
``scatter_matrix`` function in ``pandas.tools.plotting``:
238+
239+
.. ipython:: python
240+
241+
from pandas.tools.plotting import scatter_matrix
242+
df = DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd'])
243+
244+
@savefig scatter_matrix_ex.png width=6in
245+
scatter_matrix(df, alpha=0.2, figsize=(8, 8))

doc/source/whatsnew.rst

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@ What's New
1616

1717
These are new features and improvements of note in each release.
1818

19+
.. include:: whatsnew/v0.7.3.txt
20+
1921
.. include:: whatsnew/v0.7.2.txt
2022

2123
.. include:: whatsnew/v0.7.1.txt
