Commit 69eab31

bpo-28749: Fixed the documentation of the mapping codec APIs. (#487) (#714)
Added the documentation for PyUnicode_Translate().
(cherry picked from commit c85a266)
1 parent b044120 commit 69eab31

2 files changed, +66 -74 lines changed

Doc/c-api/unicode.rst (+48 -47)

@@ -1393,77 +1393,78 @@ Character Map Codecs
 This codec is special in that it can be used to implement many different codecs
 (and this is in fact what was done to obtain most of the standard codecs
 included in the :mod:`encodings` package). The codec uses mapping to encode and
-decode characters.
-
-Decoding mappings must map single string characters to single Unicode
-characters, integers (which are then interpreted as Unicode ordinals) or ``None``
-(meaning "undefined mapping" and causing an error).
-
-Encoding mappings must map single Unicode characters to single string
-characters, integers (which are then interpreted as Latin-1 ordinals) or ``None``
-(meaning "undefined mapping" and causing an error).
-
-The mapping objects provided must only support the __getitem__ mapping
-interface.
-
-If a character lookup fails with a LookupError, the character is copied as-is
-meaning that its ordinal value will be interpreted as Unicode or Latin-1 ordinal
-resp. Because of this, mappings only need to contain those mappings which map
-characters to different code points.
+decode characters. The mapping objects provided must support the
+:meth:`__getitem__` mapping interface; dictionaries and sequences work well.
 
 These are the mapping codec APIs:
 
-.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
+.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, \
 PyObject *mapping, const char *errors)
 
-Create a Unicode object by decoding *size* bytes of the encoded string *s* using
-the given *mapping* object.  Return *NULL* if an exception was raised by the
-codec. If *mapping* is *NULL* latin-1 decoding will be done. Else it can be a
-dictionary mapping byte or a unicode string, which is treated as a lookup table.
-Byte values greater that the length of the string and U+FFFE "characters" are
-treated as "undefined mapping".
+Create a Unicode object by decoding *size* bytes of the encoded string *s*
+using the given *mapping* object. Return *NULL* if an exception was raised
+by the codec.
+
+If *mapping* is *NULL*, Latin-1 decoding will be applied. Else
+*mapping* must map bytes ordinals (integers in the range from 0 to 255)
+to Unicode strings, integers (which are then interpreted as Unicode
+ordinals) or ``None``. Unmapped data bytes -- ones which cause a
+:exc:`LookupError`, as well as ones which get mapped to ``None``,
+``0xFFFE`` or ``'\ufffe'``, are treated as undefined mappings and cause
+an error.
 
 
 .. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
 
-Encode a Unicode object using the given *mapping* object and return the result
-as Python string object. Error handling is "strict". Return *NULL* if an
+Encode a Unicode object using the given *mapping* object and return the
+result as a bytes object. Error handling is "strict". Return *NULL* if an
 exception was raised by the codec.
 
-The following codec API is special in that maps Unicode to Unicode.
-
+The *mapping* object must map Unicode ordinal integers to bytes objects,
+integers in the range from 0 to 255 or ``None``. Unmapped character
+ordinals (ones which cause a :exc:`LookupError`) as well as mapped to
+``None`` are treated as "undefined mapping" and cause an error.
 
-.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
-PyObject *table, const char *errors)
-
-Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
-character mapping *table* to it and return the resulting Unicode object. Return
-*NULL* when an exception was raised by the codec.
 
-The *mapping* table must map Unicode ordinal integers to Unicode ordinal
-integers or ``None`` (causing deletion of the character).
+.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+PyObject *mapping, const char *errors)
 
-Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
-and sequences work well. Unmapped character ordinals (ones which cause a
-:exc:`LookupError`) are left untouched and are copied as-is.
+Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
+*mapping* object and return the result as a bytes object. Return *NULL* if
+an exception was raised by the codec.
 
 .. deprecated-removed:: 3.3 4.0
 Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
-:c:func:`PyUnicode_Translate`. or :ref:`generic codec based API
-<codec-registry>`
+:c:func:`PyUnicode_AsCharmapString` or
+:c:func:`PyUnicode_AsEncodedString`.
 
 
-.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+The following codec API is special in that maps Unicode to Unicode.
+
+.. c:function:: PyObject* PyUnicode_Translate(PyObject *unicode, \
 PyObject *mapping, const char *errors)
 
-Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
-*mapping* object and return a Python string object. Return *NULL* if an
-exception was raised by the codec.
+Translate a Unicode object using the given *mapping* object and return the
+resulting Unicode object. Return *NULL* if an exception was raised by the
+codec.
+
+The *mapping* object must map Unicode ordinal integers to Unicode strings,
+integers (which are then interpreted as Unicode ordinals) or ``None``
+(causing deletion of the character). Unmapped character ordinals (ones
+which cause a :exc:`LookupError`) are left untouched and are copied as-is.
+
+
+.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+PyObject *mapping, const char *errors)
+
+Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
+character *mapping* table to it and return the resulting Unicode object.
+Return *NULL* when an exception was raised by the codec.
 
 .. deprecated-removed:: 3.3 4.0
 Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
-:c:func:`PyUnicode_AsCharmapString` or
-:c:func:`PyUnicode_AsEncodedString`.
+:c:func:`PyUnicode_Translate`. or :ref:`generic codec based API
+<codec-registry>`
 
 
 MBCS codecs for Windows
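The decoding rules described for `PyUnicode_DecodeCharmap()` can be tried without writing C: in CPython 3, `codecs.charmap_decode(data, errors, mapping)` passes its arguments to that C function and returns a `(text, bytes_consumed)` tuple. The following is a minimal sketch assuming a Python 3 interpreter; the table contents and the helper `is_undefined` are hypothetical illustrations, not part of the commit.

```python
import codecs

# Illustrative decoding table: keys are byte ordinals (0-255).
decoding_table = {
    0x41: "x",     # str value: copied into the result
    0x42: 0x263A,  # int value: interpreted as a Unicode ordinal (U+263A)
    0x43: "ab",    # a str value may contain more than one character
    0x44: None,    # None: undefined mapping
    0x45: 0xFFFE,  # 0xFFFE: also an undefined mapping
}

text, consumed = codecs.charmap_decode(b"ABC", "strict", decoding_table)
# text == "x\u263aab", consumed == 3


def is_undefined(data):
    """Hypothetical helper: True if strict decoding of *data* fails."""
    try:
        codecs.charmap_decode(data, "strict", decoding_table)
    except UnicodeDecodeError:
        return True
    return False


# None, 0xFFFE and a missing key (KeyError is a LookupError) all count as
# undefined mappings, so strict decoding raises UnicodeDecodeError.
undefined = [is_undefined(b) for b in (b"D", b"E", b"Z")]
# undefined == [True, True, True]

# A non-strict error handler substitutes U+FFFD instead of raising.
replaced = codecs.charmap_decode(b"AZ", "replace", decoding_table)
# replaced == ("x\ufffd", 2)
```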

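For the encoding direction (`PyUnicode_AsCharmapString()` and the `*EncodeCharmap()` functions), `codecs.charmap_encode(text, errors, mapping)` reaches the same C-level encoder in CPython 3 and returns a `(bytes, characters_consumed)` tuple. A sketch under the same assumptions; with `"strict"` it matches the fixed error handling of `PyUnicode_AsCharmapString()`, and the table and the helper `encode_or_error` are hypothetical.

```python
import codecs

# Illustrative encoding table: keys are Unicode ordinals.
encoding_table = {
    ord("a"): b"A",    # bytes value: copied into the output
    ord("b"): 0x42,    # int value in range(256): emitted as one byte
    ord("c"): b"<c>",  # a bytes value may be longer than one byte
    ord("d"): None,    # None: undefined mapping
}

data, consumed = codecs.charmap_encode("abc", "strict", encoding_table)
# data == b"AB<c>", consumed == 3


def encode_or_error(text, table=encoding_table):
    """Hypothetical helper: encoded bytes, or the name of the exception raised."""
    try:
        return codecs.charmap_encode(text, "strict", table)[0]
    except (UnicodeEncodeError, TypeError) as exc:
        return type(exc).__name__


# An explicit None and a missing key are both undefined mappings.
undefined_results = [encode_or_error("d"), encode_or_error("z")]
# undefined_results == ["UnicodeEncodeError", "UnicodeEncodeError"]

# Integers outside range(256) are rejected rather than truncated.
out_of_range = encode_or_error("x", {ord("x"): 300})
# out_of_range == "TypeError"
```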
Include/unicodeobject.h (+18 -27)

@@ -1609,50 +1609,41 @@ PyAPI_FUNC(PyObject*) PyUnicode_EncodeASCII(
 
 This codec uses mappings to encode and decode characters.
 
-Decoding mappings must map single string characters to single
-Unicode characters, integers (which are then interpreted as Unicode
-ordinals) or None (meaning "undefined mapping" and causing an
-error).
-
-Encoding mappings must map single Unicode characters to single
-string characters, integers (which are then interpreted as Latin-1
-ordinals) or None (meaning "undefined mapping" and causing an
-error).
-
-If a character lookup fails with a LookupError, the character is
-copied as-is meaning that its ordinal value will be interpreted as
-Unicode or Latin-1 ordinal resp. Because of this mappings only need
-to contain those mappings which map characters to different code
-points.
+Decoding mappings must map byte ordinals (integers in the range from 0 to
+255) to Unicode strings, integers (which are then interpreted as Unicode
+ordinals) or None. Unmapped data bytes (ones which cause a LookupError)
+as well as mapped to None, 0xFFFE or '\ufffe' are treated as "undefined
+mapping" and cause an error.
+
+Encoding mappings must map Unicode ordinal integers to bytes objects,
+integers in the range from 0 to 255 or None. Unmapped character
+ordinals (ones which cause a LookupError) as well as mapped to
+None are treated as "undefined mapping" and cause an error.
 
 */
 
 PyAPI_FUNC(PyObject*) PyUnicode_DecodeCharmap(
 const char *string, /* Encoded string */
 Py_ssize_t length, /* size of string */
-PyObject *mapping, /* character mapping
-(char ordinal -> unicode ordinal) */
+PyObject *mapping, /* decoding mapping */
 const char *errors /* error handling */
 );
 
 PyAPI_FUNC(PyObject*) PyUnicode_AsCharmapString(
 PyObject *unicode, /* Unicode object */
-PyObject *mapping /* character mapping
-(unicode ordinal -> char ordinal) */
+PyObject *mapping /* encoding mapping */
 );
 
 #ifndef Py_LIMITED_API
 PyAPI_FUNC(PyObject*) PyUnicode_EncodeCharmap(
 const Py_UNICODE *data, /* Unicode char buffer */
 Py_ssize_t length, /* Number of Py_UNICODE chars to encode */
-PyObject *mapping, /* character mapping
-(unicode ordinal -> char ordinal) */
+PyObject *mapping, /* encoding mapping */
 const char *errors /* error handling */
 );
 PyAPI_FUNC(PyObject*) _PyUnicode_EncodeCharmap(
 PyObject *unicode, /* Unicode object */
-PyObject *mapping, /* character mapping
-(unicode ordinal -> char ordinal) */
+PyObject *mapping, /* encoding mapping */
 const char *errors /* error handling */
 );
 #endif
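Several of the generated codecs in the `encodings` package (for example `encodings.cp1252`) use a 256-character string, rather than a dictionary, as the decoding table, with `'\ufffe'` marking undefined bytes, and derive the encoding map from it with `codecs.charmap_build()`. A minimal round-trip sketch in that style, assuming CPython 3; the particular table (ASCII plus a euro sign at byte 0x80) is made up for illustration.

```python
import codecs

# Illustrative 256-entry decoding table: bytes 0x00-0x7F decode as ASCII,
# byte 0x80 decodes to U+20AC (euro sign), every other byte is undefined.
decoding_table = (
    "".join(chr(i) for i in range(0x80)) + "\u20ac" + "\ufffe" * 0x7F
)
assert len(decoding_table) == 256

# Build the reverse (encoding) map from the same table.
encoding_table = codecs.charmap_build(decoding_table)

text, _ = codecs.charmap_decode(b"A\x80", "strict", decoding_table)
data, _ = codecs.charmap_encode(text, "strict", encoding_table)
# text == "A\u20ac", data == b"A\x80"

# Byte 0x81 maps to '\ufffe', so strict decoding treats it as undefined.
try:
    codecs.charmap_decode(b"\x81", "strict", decoding_table)
except UnicodeDecodeError:
    undefined_byte_rejected = True
else:
    undefined_byte_rejected = False
```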
@@ -1661,8 +1652,8 @@ PyAPI_FUNC(PyObject*) _PyUnicode_EncodeCharmap(
 character mapping table to it and return the resulting Unicode
 object.
 
-The mapping table must map Unicode ordinal integers to Unicode
-ordinal integers or None (causing deletion of the character).
+The mapping table must map Unicode ordinal integers to Unicode strings,
+Unicode ordinal integers or None (causing deletion of the character).
 
 Mapping tables may be dictionaries or sequences. Unmapped character
 ordinals (ones which cause a LookupError) are left untouched and
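The translation rules in this comment, including the newly documented option of mapping to strings, can be observed through `str.translate()`, which applies the same table semantics at the Python level. A minimal sketch assuming Python 3; the table is an illustration only.

```python
# Illustrative translation table: keys (and int values) are Unicode ordinals.
table = {
    ord("a"): "A",        # str value
    ord("\u00df"): "ss",  # str values may expand one character into several
    ord("b"): 0x42,       # int value: interpreted as a Unicode ordinal ("B")
    ord("-"): None,       # None: the character is deleted
}

# "c" has no entry; the resulting KeyError (a LookupError) leaves it untouched.
result = "ab-\u00df-c".translate(table)
# result == "ABssc"
```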
@@ -1960,8 +1951,8 @@ PyAPI_FUNC(PyObject*) PyUnicode_RSplit(
 /* Translate a string by applying a character mapping table to it and
 return the resulting Unicode object.
 
-The mapping table must map Unicode ordinal integers to Unicode
-ordinal integers or None (causing deletion of the character).
+The mapping table must map Unicode ordinal integers to Unicode strings,
+Unicode ordinal integers or None (causing deletion of the character).
 
 Mapping tables may be dictionaries or sequences. Unmapped character
 ordinals (ones which cause a LookupError) are left untouched and
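Because lookups only need `__getitem__`, a plain sequence indexed by code point also works as a mapping table: an `IndexError` for code points past the end is a `LookupError`, so those characters are copied unchanged. A sketch, again via `str.translate()`, with a hypothetical ASCII-only table:

```python
# Illustrative table: a list indexed by code point, covering ASCII only.
ascii_upper = [chr(i).upper() for i in range(128)]

# U+00E9 is past the end of the list; the IndexError is a LookupError,
# so that character is left as-is.
result = "abc-\u00e9".translate(ascii_upper)
# result == "ABC-\u00e9"

assert issubclass(IndexError, LookupError)
```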
