@@ -94,7 +94,7 @@ data into a DataFrame object. They can take a number of arguments:
   - ``converters``: a dictionary of functions for converting values in certain
     columns, where keys are either integers or column labels
   - ``encoding``: a string representing the encoding to use if the contents are
-    non-ascii, for python versions prior to 3
+    non-ascii
   - ``verbose`` : show number of NA values inserted in non-numeric columns

 .. ipython:: python
@@ -139,6 +139,67 @@ fragile. Type inference is a pretty big deal. So if a column can be coerced to
 integer dtype without altering the contents, it will do so. Any non-numeric
 columns will come through as object dtype as with the rest of pandas objects.

+.. _io.fwf:
+
+Files with Fixed Width Columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+While `read_csv` reads delimited data, the :func:`~pandas.io.parsers.read_fwf`
+function works with data files that have known and fixed column widths.
+The function parameters to `read_fwf` are largely the same as `read_csv` with
+two extra parameters:
+
+  - ``colspecs``: a list of pairs (tuples), giving the extents of the
+    fixed-width fields of each line as half-open intervals [from, to[
+  - ``widths``: a list of field widths, which can be used instead of
+    ``colspecs`` if the intervals are contiguous
+
+.. ipython:: python
+   :suppress:
+
+   f = open('bar.csv', 'w')
+   data1 = ("id8141    360.242940   149.910199   11950.7\n"
+            "id1594    444.953632   166.985655   11788.4\n"
+            "id1849    364.136849   183.628767   11806.2\n"
+            "id1230    413.836124   184.375703   11916.8\n"
+            "id1948    502.953953   173.237159   12468.3")
+   f.write(data1)
+   f.close()
+
+Consider a typical fixed-width data file:
+
+.. ipython:: python
+
+   print(open('bar.csv').read())
+
+In order to parse this file into a DataFrame, we simply need to supply the
+column specifications to the `read_fwf` function along with the file name:
+
+.. ipython:: python
+
+   # Column specifications are a list of half-open intervals
+   colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]
+   df = read_fwf('bar.csv', colspecs=colspecs, header=None, index_col=0)
+   df
+
+Note how the parser automatically picks column names X.<column number> when
+the ``header=None`` argument is specified. Alternatively, you can supply just
+the column widths for contiguous columns:
+
+.. ipython:: python
+
+   # Widths are a list of integers
+   widths = [6, 14, 13, 10]
+   df = read_fwf('bar.csv', widths=widths, header=None)
+   df
+
+The parser strips extra whitespace around each field, so it is fine to have
+extra separation between the columns in the file.
+
+.. ipython:: python
+   :suppress:
+
+   os.remove('bar.csv')
+
 Files with an "implicit" index column
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -281,7 +342,7 @@ function takes a number of arguments. Only the first is required.
   - ``mode`` : Python write mode, default 'w'
   - ``sep`` : Field delimiter for the output file (default ",")
   - ``encoding``: a string representing the encoding to use if the contents are
-    non-ascii, for python versions prior to 3
+    non-ascii, for python versions prior to 3

 Writing a formatted string
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
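
The half-open ``colspecs`` intervals and the ``widths`` shortcut described in the fixed-width section can be illustrated without pandas. The following is a minimal pure-Python sketch, not the ``read_fwf`` implementation: the helper names ``widths_to_colspecs`` and ``split_fixed_width`` are hypothetical, and the padding in the sample line is an assumption chosen so that it lines up with the column specifications used in the diff.

```python
from itertools import accumulate


def widths_to_colspecs(widths):
    """Turn contiguous field widths into half-open (start, stop) pairs."""
    stops = list(accumulate(widths))   # running totals give each stop offset
    starts = [0] + stops[:-1]          # each field starts where the last ended
    return list(zip(starts, stops))


def split_fixed_width(line, colspecs):
    """Slice one line by half-open intervals and strip the padding."""
    return [line[start:stop].strip() for start, stop in colspecs]


# Sample line; the exact padding is assumed, not taken from the original file.
line = "id8141    360.242940   149.910199   11950.7"

colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]
widths = [6, 14, 13, 10]

fields_from_colspecs = split_fixed_width(line, colspecs)
fields_from_widths = split_fixed_width(line, widths_to_colspecs(widths))

print(fields_from_colspecs)
print(fields_from_widths)
```

Under these assumptions both calls yield the same four fields: the contiguous intervals derived from ``widths`` only absorb extra padding, which is stripped afterwards.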