@@ -1029,6 +1029,10 @@ Connection objects
                 f.write('%s\n' % line)
         con.close()

+      .. seealso::
+
+         :ref:`sqlite3-howto-encoding`
+

    .. method:: backup(target, *, pages=-1, progress=None, name="main", sleep=0.250)

@@ -1095,6 +1099,10 @@ Connection objects

       .. versionadded:: 3.7

+      .. seealso::
+
+         :ref:`sqlite3-howto-encoding`
+
    .. method:: getlimit(category, /)

       Get a connection runtime limit.
@@ -1253,39 +1261,8 @@ Connection objects
       and returns a text representation of it.
       The callable is invoked for SQLite values with the ``TEXT`` data type.
       By default, this attribute is set to :class:`str`.
-      If you want to return ``bytes`` instead, set *text_factory* to ``bytes``.

-      Example:
-
-      .. testcode::
-
-         con = sqlite3.connect(":memory:")
-         cur = con.cursor()
-
-         AUSTRIA = "Österreich"
-
-         # by default, rows are returned as str
-         cur.execute("SELECT ?", (AUSTRIA,))
-         row = cur.fetchone()
-         assert row[0] == AUSTRIA
-
-         # but we can make sqlite3 always return bytestrings ...
-         con.text_factory = bytes
-         cur.execute("SELECT ?", (AUSTRIA,))
-         row = cur.fetchone()
-         assert type(row[0]) is bytes
-         # the bytestrings will be encoded in UTF-8, unless you stored garbage in the
-         # database ...
-         assert row[0] == AUSTRIA.encode("utf-8")
-
-         # we can also implement a custom text_factory ...
-         # here we implement one that appends "foo" to all strings
-         con.text_factory = lambda x: x.decode("utf-8") + "foo"
-         cur.execute("SELECT ?", ("bar",))
-         row = cur.fetchone()
-         assert row[0] == "barfoo"
-
-         con.close()
+      See :ref:`sqlite3-howto-encoding` for more details.

    .. attribute:: total_changes

@@ -1423,7 +1400,6 @@ Cursor objects
             COMMIT;
         """)

-
    .. method:: fetchone()

       If :attr:`~Cursor.row_factory` is ``None``,
@@ -2369,6 +2345,47 @@ With some adjustments, the above recipe can be adapted to use a
 instead of a :class:`~collections.namedtuple`.


+.. _sqlite3-howto-encoding:
+
+How to handle non-UTF-8 text encodings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, :mod:`!sqlite3` uses :class:`str` to adapt SQLite values
+with the ``TEXT`` data type.
+This works well for UTF-8 encoded text, but it might fail for other encodings
+and invalid UTF-8.
+You can use a custom :attr:`~Connection.text_factory` to handle such cases.
+
+Because of SQLite's `flexible typing`_, it is not uncommon to encounter table
+columns with the ``TEXT`` data type containing non-UTF-8 encodings,
+or even arbitrary data.
+To demonstrate, let's assume we have a database with ISO-8859-2 (Latin-2)
+encoded text, for example a table of Czech-English dictionary entries.
+Assuming we now have a :class:`Connection` instance :py:data:`!con`
+connected to this database,
+we can decode the Latin-2 encoded text using this :attr:`~Connection.text_factory`:
+
+.. testcode::
+
+   con.text_factory = lambda data: str(data, encoding="latin2")
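
A self-contained sketch of the Latin-2 case may make this easier to try out. It is not part of the patch: the table name `dictionary`, its columns, and the sample word are hypothetical, and `CAST(? AS TEXT)` is used here only as one way to get non-UTF-8 bytes into a ``TEXT`` value for demonstration purposes.

```python
import sqlite3

# Hypothetical sample data: a Czech word encoded as ISO-8859-2 (Latin-2) bytes.
czech = "žluťoučký"
raw = czech.encode("latin2")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dictionary(czech TEXT, english TEXT)")
# CAST(? AS TEXT) keeps the bytes unchanged but gives the value the TEXT type,
# imitating a database that was filled without converting text to UTF-8.
con.execute(
    "INSERT INTO dictionary VALUES (CAST(? AS TEXT), ?)",
    (raw, "yellowish"),
)

# With the default text_factory (str), reading the value fails because the
# stored bytes are not valid UTF-8.
try:
    con.execute("SELECT czech FROM dictionary").fetchone()
    default_failed = False
except sqlite3.OperationalError:
    default_failed = True

# The custom text_factory receives the raw bytes and decodes them as Latin-2.
con.text_factory = lambda data: str(data, encoding="latin2")
row = con.execute("SELECT czech, english FROM dictionary").fetchone()
con.close()
```

Note that the factory is applied to every ``TEXT`` value on the connection, so it only suits databases where all text shares the same encoding; the ASCII-only `english` column decodes correctly here because ASCII is a subset of Latin-2.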
+
+For invalid UTF-8 or arbitrary data stored in ``TEXT`` table columns,
+you can use the following technique, borrowed from the :ref:`unicode-howto`:
+
+.. testcode::
+
+   con.text_factory = lambda data: str(data, errors="surrogateescape")
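
As a rough illustration of what this factory returns, the following sketch reads a ``TEXT`` value holding bytes that are not valid UTF-8. The sample bytes and the use of `CAST(? AS TEXT)` to produce such a value are assumptions made for the demonstration, not something the patch prescribes.

```python
import sqlite3

# Hypothetical sample data: the trailing byte makes this invalid UTF-8.
raw = b"caf\xe9"

con = sqlite3.connect(":memory:")
con.text_factory = lambda data: str(data, errors="surrogateescape")
# CAST(? AS TEXT) yields a TEXT value that carries the bytes unchanged.
row = con.execute("SELECT CAST(? AS TEXT)", (raw,)).fetchone()
con.close()

text = row[0]
# Each undecodable byte is mapped to a lone surrogate code point; encoding
# with the same error handler restores the original bytes.
restored = text.encode("utf-8", errors="surrogateescape")
```

The resulting string is only safe to process on the Python side; to write it back to the database, it would first have to be converted to `bytes` as shown in the last line.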
+
+.. note::
+
+   The :mod:`!sqlite3` module API does not support strings
+   containing surrogates.
+
+.. seealso::
+
+   :ref:`unicode-howto`
+
+
 .. _sqlite3-explanation:

 Explanation