Commit 88b32eb

bpo-28749: Fixed the documentation of the mapping codec APIs. (#487) (#715)
Added the documentation for PyUnicode_Translate(). (cherry picked from commit c85a266)
1 parent 3091636 commit 88b32eb

2 files changed: +66 -74 lines changed

Doc/c-api/unicode.rst

+48 -47
@@ -1388,77 +1388,78 @@ Character Map Codecs
 This codec is special in that it can be used to implement many different codecs
 (and this is in fact what was done to obtain most of the standard codecs
 included in the :mod:`encodings` package). The codec uses mapping to encode and
-decode characters.
-
-Decoding mappings must map single string characters to single Unicode
-characters, integers (which are then interpreted as Unicode ordinals) or ``None``
-(meaning "undefined mapping" and causing an error).
-
-Encoding mappings must map single Unicode characters to single string
-characters, integers (which are then interpreted as Latin-1 ordinals) or ``None``
-(meaning "undefined mapping" and causing an error).
-
-The mapping objects provided must only support the __getitem__ mapping
-interface.
-
-If a character lookup fails with a LookupError, the character is copied as-is
-meaning that its ordinal value will be interpreted as Unicode or Latin-1 ordinal
-resp. Because of this, mappings only need to contain those mappings which map
-characters to different code points.
+decode characters. The mapping objects provided must support the
+:meth:`__getitem__` mapping interface; dictionaries and sequences work well.

 These are the mapping codec APIs:

-.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
+.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, \
 PyObject *mapping, const char *errors)

-Create a Unicode object by decoding *size* bytes of the encoded string *s* using
-the given *mapping* object. Return *NULL* if an exception was raised by the
-codec. If *mapping* is *NULL* latin-1 decoding will be done. Else it can be a
-dictionary mapping byte or a unicode string, which is treated as a lookup table.
-Byte values greater that the length of the string and U+FFFE "characters" are
-treated as "undefined mapping".
+Create a Unicode object by decoding *size* bytes of the encoded string *s*
+using the given *mapping* object. Return *NULL* if an exception was raised
+by the codec.
+
+If *mapping* is *NULL*, Latin-1 decoding will be applied. Else
+*mapping* must map bytes ordinals (integers in the range from 0 to 255)
+to Unicode strings, integers (which are then interpreted as Unicode
+ordinals) or ``None``. Unmapped data bytes -- ones which cause a
+:exc:`LookupError`, as well as ones which get mapped to ``None``,
+``0xFFFE`` or ``'\ufffe'``, are treated as undefined mappings and cause
+an error.


 .. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

-Encode a Unicode object using the given *mapping* object and return the result
-as Python string object. Error handling is "strict". Return *NULL* if an
+Encode a Unicode object using the given *mapping* object and return the
+result as a bytes object. Error handling is "strict". Return *NULL* if an
 exception was raised by the codec.

-The following codec API is special in that maps Unicode to Unicode.
-
+The *mapping* object must map Unicode ordinal integers to bytes objects,
+integers in the range from 0 to 255 or ``None``. Unmapped character
+ordinals (ones which cause a :exc:`LookupError`) as well as mapped to
+``None`` are treated as "undefined mapping" and cause an error.

-.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
-PyObject *table, const char *errors)
-
-Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
-character mapping *table* to it and return the resulting Unicode object. Return
-*NULL* when an exception was raised by the codec.

-The *mapping* table must map Unicode ordinal integers to Unicode ordinal
-integers or ``None`` (causing deletion of the character).
+.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+PyObject *mapping, const char *errors)

-Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
-and sequences work well. Unmapped character ordinals (ones which cause a
-:exc:`LookupError`) are left untouched and are copied as-is.
+Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
+*mapping* object and return the result as a bytes object. Return *NULL* if
+an exception was raised by the codec.

 .. deprecated-removed:: 3.3 4.0
 Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
-:c:func:`PyUnicode_Translate`. or :ref:`generic codec based API
-<codec-registry>`
+:c:func:`PyUnicode_AsCharmapString` or
+:c:func:`PyUnicode_AsEncodedString`.


-.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+The following codec API is special in that maps Unicode to Unicode.
+
+.. c:function:: PyObject* PyUnicode_Translate(PyObject *unicode, \
 PyObject *mapping, const char *errors)

-Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
-*mapping* object and return a Python string object. Return *NULL* if an
-exception was raised by the codec.
+Translate a Unicode object using the given *mapping* object and return the
+resulting Unicode object. Return *NULL* if an exception was raised by the
+codec.
+
+The *mapping* object must map Unicode ordinal integers to Unicode strings,
+integers (which are then interpreted as Unicode ordinals) or ``None``
+(causing deletion of the character). Unmapped character ordinals (ones
+which cause a :exc:`LookupError`) are left untouched and are copied as-is.
+
+
+.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+PyObject *mapping, const char *errors)
+
+Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
+character *mapping* table to it and return the resulting Unicode object.
+Return *NULL* when an exception was raised by the codec.

 .. deprecated-removed:: 3.3 4.0
 Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
-:c:func:`PyUnicode_AsCharmapString` or
-:c:func:`PyUnicode_AsEncodedString`.
+:c:func:`PyUnicode_Translate`. or :ref:`generic codec based API
+<codec-registry>`


 MBCS codecs for Windows
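
Note on the hunk above: the decoding and encoding mapping rules it documents can be checked from Python without writing C. The sketch below is only an illustration and is not part of the commit. It assumes that CPython's `codecs.charmap_decode` and `codecs.charmap_encode` go through the same charmap codec as `PyUnicode_DecodeCharmap` and `PyUnicode_AsCharmapString`; the mapping contents and the helper names `decode_fails` and `encode_fails` are hypothetical.

```python
import codecs

# Decoding mapping: byte ordinals (0-255) -> str, int (Unicode ordinal) or None.
decoding_map = {
    0x61: "A",      # byte 'a' -> Unicode string
    0x62: 0x42,     # byte 'b' -> Unicode ordinal of 'B'
    0x63: None,     # byte 'c' -> explicitly undefined
    0x64: 0xFFFE,   # byte 'd' -> 0xFFFE, also treated as undefined
}

decoded, consumed = codecs.charmap_decode(b"ab", "strict", decoding_map)

# Passing None as the mapping corresponds to the NULL case: Latin-1 decoding.
latin1_decoded, _ = codecs.charmap_decode(b"\xe9", "strict", None)


def decode_fails(data):
    """Return True if strict charmap decoding of data raises an error."""
    try:
        codecs.charmap_decode(data, "strict", decoding_map)
    except UnicodeDecodeError:
        return True
    return False


# Encoding mapping: Unicode ordinals -> bytes, int in range 0-255, or None.
encoding_map = {ord("A"): b"a", ord("B"): 0x62, ord("C"): None}

encoded, length = codecs.charmap_encode("AB", "strict", encoding_map)


def encode_fails(text):
    """Return True if strict charmap encoding of text raises an error."""
    try:
        codecs.charmap_encode(text, "strict", encoding_map)
    except UnicodeEncodeError:
        return True
    return False


print(decoded, consumed, latin1_decoded, encoded, length)
print(decode_fails(b"c"), decode_fails(b"d"), decode_fails(b"e"))
print(encode_fails("C"), encode_fails("D"))
```

Bytes `c` and `d` fail because they map to ``None`` and ``0xFFFE``; byte `e` and character `D` fail because the lookup raises a `LookupError`, which matches the "undefined mapping" wording in the new text.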

Include/unicodeobject.h

+18 -27
@@ -1570,50 +1570,41 @@ PyAPI_FUNC(PyObject*) PyUnicode_EncodeASCII(

 This codec uses mappings to encode and decode characters.

-Decoding mappings must map single string characters to single
-Unicode characters, integers (which are then interpreted as Unicode
-ordinals) or None (meaning "undefined mapping" and causing an
-error).
-
-Encoding mappings must map single Unicode characters to single
-string characters, integers (which are then interpreted as Latin-1
-ordinals) or None (meaning "undefined mapping" and causing an
-error).
-
-If a character lookup fails with a LookupError, the character is
-copied as-is meaning that its ordinal value will be interpreted as
-Unicode or Latin-1 ordinal resp. Because of this mappings only need
-to contain those mappings which map characters to different code
-points.
+Decoding mappings must map byte ordinals (integers in the range from 0 to
+255) to Unicode strings, integers (which are then interpreted as Unicode
+ordinals) or None. Unmapped data bytes (ones which cause a LookupError)
+as well as mapped to None, 0xFFFE or '\ufffe' are treated as "undefined
+mapping" and cause an error.
+
+Encoding mappings must map Unicode ordinal integers to bytes objects,
+integers in the range from 0 to 255 or None. Unmapped character
+ordinals (ones which cause a LookupError) as well as mapped to
+None are treated as "undefined mapping" and cause an error.

 */

 PyAPI_FUNC(PyObject*) PyUnicode_DecodeCharmap(
 const char *string, /* Encoded string */
 Py_ssize_t length, /* size of string */
-PyObject *mapping, /* character mapping
-(char ordinal -> unicode ordinal) */
+PyObject *mapping, /* decoding mapping */
 const char *errors /* error handling */
 );

 PyAPI_FUNC(PyObject*) PyUnicode_AsCharmapString(
 PyObject *unicode, /* Unicode object */
-PyObject *mapping /* character mapping
-(unicode ordinal -> char ordinal) */
+PyObject *mapping /* encoding mapping */
 );

 #ifndef Py_LIMITED_API
 PyAPI_FUNC(PyObject*) PyUnicode_EncodeCharmap(
 const Py_UNICODE *data, /* Unicode char buffer */
 Py_ssize_t length, /* Number of Py_UNICODE chars to encode */
-PyObject *mapping, /* character mapping
-(unicode ordinal -> char ordinal) */
+PyObject *mapping, /* encoding mapping */
 const char *errors /* error handling */
 );
 PyAPI_FUNC(PyObject*) _PyUnicode_EncodeCharmap(
 PyObject *unicode, /* Unicode object */
-PyObject *mapping, /* character mapping
-(unicode ordinal -> char ordinal) */
+PyObject *mapping, /* encoding mapping */
 const char *errors /* error handling */
 );
 #endif
@@ -1622,8 +1613,8 @@ PyAPI_FUNC(PyObject*) _PyUnicode_EncodeCharmap(
 character mapping table to it and return the resulting Unicode
 object.

-The mapping table must map Unicode ordinal integers to Unicode
-ordinal integers or None (causing deletion of the character).
+The mapping table must map Unicode ordinal integers to Unicode strings,
+Unicode ordinal integers or None (causing deletion of the character).

 Mapping tables may be dictionaries or sequences. Unmapped character
 ordinals (ones which cause a LookupError) are left untouched and
@@ -1915,8 +1906,8 @@ PyAPI_FUNC(PyObject*) PyUnicode_RSplit(
 /* Translate a string by applying a character mapping table to it and
 return the resulting Unicode object.

-The mapping table must map Unicode ordinal integers to Unicode
-ordinal integers or None (causing deletion of the character).
+The mapping table must map Unicode ordinal integers to Unicode strings,
+Unicode ordinal integers or None (causing deletion of the character).

 Mapping tables may be dictionaries or sequences. Unmapped character
 ordinals (ones which cause a LookupError) are left untouched and
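
Note on the translate comments above: the updated wording (a table entry may be a Unicode string, a Unicode ordinal, or None, and unmapped ordinals are copied as-is) can be illustrated with Python's `str.translate`. This is a minimal sketch, not part of the commit, and it assumes that `str.translate` in CPython follows the same table rules as `PyUnicode_Translate`. The names `dict_table` and `seq_table` are hypothetical.

```python
# Dictionary table: Unicode ordinal -> str, int (Unicode ordinal) or None.
dict_table = {
    ord("a"): "XY",      # replaced by a Unicode string (may be longer than one char)
    ord("b"): ord("B"),  # replaced by the character with this ordinal
    ord("c"): None,      # None deletes the character
}

# 'd' has no entry: the lookup raises LookupError and the character is kept.
dict_result = "abcd".translate(dict_table)

# Sequence table: the index is the ordinal. Every ordinal below 0x61 maps to
# None (deleted), 0x61 ('a') maps to "A", and ordinals past the end of the
# list raise IndexError (a LookupError), so those characters are kept.
seq_table = [None] * 0x61 + ["A"]
seq_result = "a!z".translate(seq_table)

print(dict_result, seq_result)
```

Both a dictionary and a list work because only `__getitem__` is needed; which one is more convenient depends on how dense the table is.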
