@@ -94,7 +94,7 @@ data into a DataFrame object. They can take a number of arguments:
   - ``converters``: a dictionary of functions for converting values in certain
     columns, where keys are either integers or column labels
   - ``encoding``: a string representing the encoding to use if the contents are
-    non-ascii, for python versions prior to 3
+    non-ascii
   - ``verbose`` : show number of NA values inserted in non-numeric columns
 
 .. ipython:: python
@@ -139,6 +139,67 @@ fragile. Type inference is a pretty big deal. So if a column can be coerced to
 integer dtype without altering the contents, it will do so. Any non-numeric
 columns will come through as object dtype as with the rest of pandas objects.
 
+.. _io.fwf:
+
+Files with Fixed Width Columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+While `read_csv` reads delimited data, the :func:`~pandas.io.parsers.read_fwf`
+function works with data files that have known and fixed column widths.
+The function parameters to `read_fwf` are largely the same as `read_csv` with
+two extra parameters:
+
+  - ``colspecs``: a list of pairs (tuples) giving the extents of the
+    fixed-width fields of each line as half-open intervals ``[from, to)``
+  - ``widths``: a list of field widths, which can be used instead of
+    ``colspecs`` if the intervals are contiguous
+
+.. ipython:: python
+   :suppress:
+
+   f = open('bar.csv', 'w')
+   data1 = ("id8141    360.242940   149.910199   11950.7\n"
+            "id1594    444.953632   166.985655   11788.4\n"
+            "id1849    364.136849   183.628767   11806.2\n"
+            "id1230    413.836124   184.375703   11916.8\n"
+            "id1948    502.953953   173.237159   12468.3")
+   f.write(data1)
+   f.close()
+
+Consider a typical fixed-width data file:
+
+.. ipython:: python
+
+   print(open('bar.csv').read())
+
+In order to parse this file into a DataFrame, we simply need to supply the
+column specifications to the `read_fwf` function along with the file name:
+
+.. ipython:: python
+
+   # Column specifications are a list of half-open intervals
+   colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]
+   df = read_fwf('bar.csv', colspecs=colspecs, header=None, index_col=0)
+   df
+
+Note how the parser automatically picks column names X.<column number> when the
+``header=None`` argument is specified. Alternatively, you can supply just the
+column widths for contiguous columns:
+
+.. ipython:: python
+
+   # Widths are a list of integers
+   widths = [6, 14, 13, 10]
+   df = read_fwf('bar.csv', widths=widths, header=None)
+   df
+
+The parser strips extra whitespace around the column values, so it is fine
+to have extra separation between the columns in the file.
+
+.. ipython:: python
+   :suppress:
+
+   os.remove('bar.csv')
+
 Files with an "implicit" index column
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
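A standalone sketch of the fixed-width example above, runnable outside the IPython directive. It keeps the sample rows in memory with `io.StringIO` instead of writing `bar.csv`, and uses a small helper, `widths_to_colspecs` (hypothetical, not part of pandas), to show that a list of contiguous widths corresponds to a list of half-open `(start, stop)` intervals. It assumes a recent pandas in which `pd.read_fwf` accepts a file-like object; in such versions `header=None` labels the columns 0, 1, 2, ... rather than X.<column number>.

```python
from io import StringIO

import pandas as pd

# Same sample rows as the documentation example; the spacing is chosen so that
# both colspecs [(0, 6), (8, 20), (21, 33), (34, 43)] and widths [6, 14, 13, 10] fit.
data = ("id8141    360.242940   149.910199   11950.7\n"
        "id1594    444.953632   166.985655   11788.4\n"
        "id1849    364.136849   183.628767   11806.2\n"
        "id1230    413.836124   184.375703   11916.8\n"
        "id1948    502.953953   173.237159   12468.3")


def widths_to_colspecs(widths):
    """Hypothetical helper: turn contiguous field widths into half-open (start, stop) pairs."""
    colspecs = []
    start = 0
    for width in widths:
        colspecs.append((start, start + width))
        start += width
    return colspecs


widths = [6, 14, 13, 10]
print(widths_to_colspecs(widths))  # [(0, 6), (6, 20), (20, 33), (33, 43)]

# Parse the same text once with explicit colspecs and once with widths.
by_colspecs = pd.read_fwf(StringIO(data),
                          colspecs=[(0, 6), (8, 20), (21, 33), (34, 43)],
                          header=None)
by_widths = pd.read_fwf(StringIO(data), widths=widths, header=None)

# Surrounding spaces are stripped from each field, so both frames hold the same values.
print(by_colspecs.equals(by_widths))  # True
```

The widths-based intervals are wider than the hand-written colspecs, yet both parse identically because the extra characters they capture are only spaces.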
@@ -281,7 +342,7 @@ function takes a number of arguments. Only the first is required.
   - ``mode`` : Python write mode, default 'w'
   - ``sep`` : Field delimiter for the output file (default ",")
   - ``encoding``: a string representing the encoding to use if the contents are
-    non-ascii, for python versions prior to 3
+    non-ascii, for python versions prior to 3
 
 Writing a formatted string
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
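One way to check which field delimiter `DataFrame.to_csv` uses when ``sep`` is omitted is to write the same small frame to two in-memory buffers, once with the default and once with an explicit comma, and compare the results. This is a minimal sketch assuming a recent pandas; the frame contents are arbitrary.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Write once without specifying sep, and once with an explicit comma.
default_buf, comma_buf = StringIO(), StringIO()
df.to_csv(default_buf)
df.to_csv(comma_buf, sep=",")

default_csv = default_buf.getvalue()
comma_csv = comma_buf.getvalue()

# Identical output means the default delimiter is a comma.
print(default_csv == comma_csv)  # True
# The header row starts with an empty field for the unnamed index.
print(default_csv.splitlines()[0])  # ,a,b
```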