gh-125063: marshal: Add version 5, improve documentation #126829

Merged
merged 5 commits into from
Nov 15, 2024
9 changes: 5 additions & 4 deletions Doc/c-api/marshal.rst
@@ -13,11 +13,12 @@ binary mode.

Numeric values are stored with the least significant byte first.

The module supports two versions of the data format: version 0 is the
historical version, version 1 shares interned strings in the file, and upon
unmarshalling. Version 2 uses a binary format for floating-point numbers.
``Py_MARSHAL_VERSION`` indicates the current file format (currently 2).
The module supports several versions of the data format; see
the :py:mod:`Python module documentation <marshal>` for details.

.. c:macro:: Py_MARSHAL_VERSION

The current format version. See :py:data:`marshal.version`.

.. c:function:: void PyMarshal_WriteLongToFile(long value, FILE *file, int version)

64 changes: 46 additions & 18 deletions Doc/library/marshal.rst
@@ -38,23 +38,39 @@ supports a substantially wider range of objects than marshal.
maliciously constructed data. Never unmarshal data received from an
untrusted or unauthenticated source.

There are functions that read/write files as well as functions operating on
bytes-like objects.

.. index:: object; code, code object

Not all Python object types are supported; in general, only objects whose value
is independent from a particular invocation of Python can be written and read by
this module. The following types are supported: booleans, integers, floating-point
numbers, complex numbers, strings, bytes, bytearrays, tuples, lists, sets,
frozensets, dictionaries, and code objects (if *allow_code* is true),
where it should be understood that
tuples, lists, sets, frozensets and dictionaries are only supported as long as
the values contained therein are themselves supported. The
singletons :const:`None`, :const:`Ellipsis` and :exc:`StopIteration` can also be
marshalled and unmarshalled.
For format *version* lower than 3, recursive lists, sets and dictionaries cannot
be written (see below).
this module. The following types are supported:

* Numeric types: :class:`int`, :class:`bool`, :class:`float`, :class:`complex`.
* Strings (:class:`str`) and :class:`bytes`.
:term:`Bytes-like objects <bytes-like object>` like :class:`bytearray` are
marshalled as :class:`!bytes`.
* Containers: :class:`tuple`, :class:`list`, :class:`set`, :class:`frozenset`,
:class:`dict`, and (since :data:`version` 5) :class:`slice`.
It should be understood that these are supported only if the values contained
therein are themselves supported.
Recursive containers are supported since :data:`version` 3.
* The singletons :const:`None`, :const:`Ellipsis` and :exc:`StopIteration`.
* :class:`code` objects, if *allow_code* is true. See note above about
version dependence.

.. versionchanged:: 3.4

* Added format version 3, which supports marshalling recursive lists, sets
and dictionaries.
* Added format version 4, which supports efficient representations
of short strings.

.. versionchanged:: next

Added format version 5, which allows marshalling slices.

There are functions that read/write files as well as functions operating on
bytes-like objects.

The module defines these functions:
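
To illustrate the supported-types list above, here is a minimal round-trip sketch. The `samples` list and the `round_trip` helper are names made up for this example and are not part of the PR. It assumes the interpreter's default format version is 3 or later, which the recursive case needs.

```python
import marshal

# Hypothetical sample values, one per supported category from the list above.
samples = [
    42, True, 3.14, 1 + 2j,                  # numeric types
    "text", b"raw bytes",                    # str and bytes
    (1, 2), [3, 4], {5, 6}, frozenset({7}),  # containers
    {"key": "value"},                        # dictionaries
    None, Ellipsis, StopIteration,           # singletons
]


def round_trip(obj):
    """Serialize obj with the default format version and load it back."""
    return marshal.loads(marshal.dumps(obj))


results = [round_trip(obj) for obj in samples]

# A bytearray is written as bytes, so it comes back as bytes.
restored = round_trip(bytearray(b"abc"))

# Recursive containers need format version 3 or later (assumed default here).
recursive = [1]
recursive.append(recursive)
recursive_copy = round_trip(recursive)
```

Slices are left out of `samples` on purpose, so the sketch also runs on interpreters whose default format version is below 5.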

@@ -140,11 +156,24 @@ In addition, the following constants are defined:

.. data:: version

Indicates the format that the module uses. Version 0 is the historical
format, version 1 shares interned strings and version 2 uses a binary format
for floating-point numbers.
Version 3 adds support for object instancing and recursion.
The current version is 4.
Indicates the format that the module uses.
Version 0 is the historical first version; subsequent versions
add new features.
Generally, a new version becomes the default when it is introduced.

======= =============== ====================================================
Version Available since New features
======= =============== ====================================================
1 Python 2.4 Sharing interned strings
------- --------------- ----------------------------------------------------
2 Python 2.5 Binary representation of floats
------- --------------- ----------------------------------------------------
3 Python 3.4 Support for object instancing and recursion
------- --------------- ----------------------------------------------------
4 Python 3.4 Efficient representation of short strings
------- --------------- ----------------------------------------------------
5 Python 3.14 Support for :class:`slice` objects
======= =============== ====================================================
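
The version table can be exercised from Python roughly as follows. This is a sketch, not part of the PR: `try_dumps` is a hypothetical helper, and the slice round trip is guarded on `marshal.version` so the code also runs on interpreters older than 3.14.

```python
import marshal


def try_dumps(obj, version):
    """Return the marshalled bytes, or None if this version cannot encode obj."""
    try:
        return marshal.dumps(obj, version)
    except ValueError:
        return None


# Older format versions remain writable, and loads() accepts any of them.
legacy = marshal.dumps([1, 2.5, "x"], 2)
legacy_value = marshal.loads(legacy)

# Format versions below 5 have no encoding for slices.
slice_v4 = try_dumps(slice(1, 2, 3), 4)

# Only attempt a slice round trip when the default format is version 5 or later.
if marshal.version >= 5:
    slice_copy = marshal.loads(marshal.dumps(slice(1, 2, 3)))
else:
    slice_copy = None
```

Passing an explicit version to `marshal.dumps` is the way to produce data that an older reader can load. The price is that newer types such as `slice` are rejected with `ValueError`.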


.. rubric:: Footnotes
@@ -154,4 +183,3 @@ In addition, the following constants are defined:
around in a self-contained form. Strictly speaking, "to marshal" means to
convert some data from internal to external form (in an RPC buffer for instance)
and "unmarshalling" for the reverse process.

2 changes: 1 addition & 1 deletion Include/marshal.h
@@ -13,7 +13,7 @@ PyAPI_FUNC(PyObject *) PyMarshal_ReadObjectFromString(const char *,
Py_ssize_t);
PyAPI_FUNC(PyObject *) PyMarshal_WriteObjectToString(PyObject *, int);

#define Py_MARSHAL_VERSION 4
#define Py_MARSHAL_VERSION 5

PyAPI_FUNC(long) PyMarshal_ReadLongFromFile(FILE *);
PyAPI_FUNC(int) PyMarshal_ReadShortFromFile(FILE *);
27 changes: 24 additions & 3 deletions Lib/test/test_marshal.py
@@ -28,6 +28,13 @@ def helper(self, sample, *extra):
finally:
os_helper.unlink(os_helper.TESTFN)

def omit_last_byte(data):
"""return data[:-1]"""
# This file's code is used in CompatibilityTestCase,
# but slices need marshal version 5.
# Avoid the slice literal.
return data[slice(0, -1)]

class IntTestCase(unittest.TestCase, HelperMixin):
def test_ints(self):
# Test a range of Python ints larger than the machine word size.
@@ -241,7 +248,8 @@ def test_bug_5888452(self):
def test_patch_873224(self):
self.assertRaises(Exception, marshal.loads, b'0')
self.assertRaises(Exception, marshal.loads, b'f')
self.assertRaises(Exception, marshal.loads, marshal.dumps(2**65)[:-1])
self.assertRaises(Exception, marshal.loads,
omit_last_byte(marshal.dumps(2**65)))

def test_version_argument(self):
# Python 2.4.0 crashes for any call to marshal.dumps(x, y)
@@ -594,6 +602,19 @@ def testNoIntern(self):
s2 = sys.intern(s)
self.assertNotEqual(id(s2), id(s))

class SliceTestCase(unittest.TestCase, HelperMixin):
def test_slice(self):
for obj in (
slice(None), slice(1), slice(1, 2), slice(1, 2, 3),
slice({'set'}, ('tuple', {'with': 'dict'}, ), self.helper.__code__)
):
with self.subTest(obj=str(obj)):
self.helper(obj)

for version in range(4):
with self.assertRaises(ValueError):
marshal.dumps(obj, version)

@support.cpython_only
@unittest.skipUnless(_testcapi, 'requires _testcapi')
class CAPI_TestCase(unittest.TestCase, HelperMixin):
@@ -654,7 +675,7 @@ def test_read_last_object_from_file(self):
self.assertEqual(r, obj)

with open(os_helper.TESTFN, 'wb') as f:
f.write(data[:1])
f.write(omit_last_byte(data))
with self.assertRaises(EOFError):
_testcapi.pymarshal_read_last_object_from_file(os_helper.TESTFN)
os_helper.unlink(os_helper.TESTFN)
@@ -671,7 +692,7 @@ def test_read_object_from_file(self):
self.assertEqual(p, len(data))

with open(os_helper.TESTFN, 'wb') as f:
f.write(data[:1])
f.write(omit_last_byte(data))
with self.assertRaises(EOFError):
_testcapi.pymarshal_read_object_from_file(os_helper.TESTFN)
os_helper.unlink(os_helper.TESTFN)
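
The `omit_last_byte` helper used in the hunks above can be tried in isolation. The sketch below copies the helper from the diff and checks that it matches the `[:-1]` literal it replaces. The broad `except Exception` mirrors the `assertRaises(Exception, ...)` in the test, since the diff does not pin down the exact exception type for this case. The `error_name` variable is only for illustration.

```python
import marshal


def omit_last_byte(data):
    """Return data[:-1] without writing a slice literal in this function."""
    return data[slice(0, -1)]


data = marshal.dumps(2**65)
truncated = omit_last_byte(data)

# Loading truncated data should fail rather than return a partial object.
try:
    marshal.loads(truncated)
except Exception as exc:
    error_name = type(exc).__name__
else:
    error_name = None
```

Calling `slice(0, -1)` at run time keeps the slice object out of the function's compiled code. According to the comment in the diff, this matters because the file's code is marshalled in `CompatibilityTestCase`, and slices need marshal version 5.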
@@ -0,0 +1,2 @@
:mod:`marshal` now supports :class:`slice` objects. The marshal format
version was increased to 5.
1 change: 1 addition & 0 deletions Programs/_freeze_module.c
@@ -121,6 +121,7 @@ compile_and_marshal(const char *name, const char *text)
return NULL;
}

assert(Py_MARSHAL_VERSION >= 5);
PyObject *marshalled = PyMarshal_WriteObjectToString(code, Py_MARSHAL_VERSION);
Comment on lines +124 to 125
@encukou (Member, Author), Nov 14, 2024

In the Makefile, _freeze_module doesn't depend on marshal.h, but it needs to be rebuilt when Py_MARSHAL_VERSION switches to 5.
Rather than properly changing the Makefile, which would be another rabbit hole, I decided to touch _freeze_module.c instead.

Py_CLEAR(code);
if (marshalled == NULL) {
46 changes: 31 additions & 15 deletions Python/marshal.c
@@ -50,41 +50,52 @@ module marshal
# define MAX_MARSHAL_STACK_DEPTH 2000
#endif

/* Supported types */
#define TYPE_NULL '0'
#define TYPE_NONE 'N'
#define TYPE_FALSE 'F'
#define TYPE_TRUE 'T'
#define TYPE_STOPITER 'S'
#define TYPE_ELLIPSIS '.'
#define TYPE_INT 'i'
/* TYPE_INT64 is not generated anymore.
Supported for backward compatibility only. */
#define TYPE_INT64 'I'
#define TYPE_FLOAT 'f'
#define TYPE_BINARY_FLOAT 'g'
#define TYPE_COMPLEX 'x'
#define TYPE_BINARY_COMPLEX 'y'
#define TYPE_LONG 'l'
#define TYPE_STRING 's'
#define TYPE_INTERNED 't'
#define TYPE_REF 'r'
#define TYPE_TUPLE '('
#define TYPE_BINARY_FLOAT 'g' // Version 0 uses TYPE_FLOAT instead.
#define TYPE_BINARY_COMPLEX 'y' // Version 0 uses TYPE_COMPLEX instead.
#define TYPE_LONG 'l' // See also TYPE_INT.
#define TYPE_STRING 's' // Bytes. (Name comes from Python 2.)
#define TYPE_TUPLE '(' // See also TYPE_SMALL_TUPLE.
#define TYPE_LIST '['
#define TYPE_DICT '{'
#define TYPE_CODE 'c'
#define TYPE_UNICODE 'u'
#define TYPE_UNKNOWN '?'
// added in version 2:
#define TYPE_SET '<'
#define TYPE_FROZENSET '>'
// added in version 5:
#define TYPE_SLICE ':'
#define FLAG_REF '\x80' /* with a type, add obj to index */
// Remember to update the version and documentation when adding new types.

/* Special cases for unicode strings (added in version 4) */
#define TYPE_INTERNED 't' // Version 1+
#define TYPE_ASCII 'a'
#define TYPE_ASCII_INTERNED 'A'
#define TYPE_SMALL_TUPLE ')'
#define TYPE_SHORT_ASCII 'z'
#define TYPE_SHORT_ASCII_INTERNED 'Z'

/* Special cases for small objects */
#define TYPE_INT 'i' // All versions. 32-bit encoding.
#define TYPE_SMALL_TUPLE ')' // Version 4+

/* Supported for backwards compatibility */
#define TYPE_COMPLEX 'x' // Generated for version 0 only.
#define TYPE_FLOAT 'f' // Generated for version 0 only.
#define TYPE_INT64 'I' // Not generated any more.

/* References (added in version 3) */
#define TYPE_REF 'r'
#define FLAG_REF '\x80' /* with a type, add obj to index */


// Error codes:
#define WFERR_OK 0
#define WFERR_UNMARSHALLABLE 1
#define WFERR_NESTEDTOODEEP 2
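
The type codes defined above are visible from Python as the first byte of the marshalled data. The following sketch shows this. `type_code` is a hypothetical helper that masks off `FLAG_REF`, which may or may not be set on a given object in version 3 and later. The variable names are illustrative only.

```python
import marshal


def type_code(data):
    """Return the type character of the first object, with FLAG_REF masked off."""
    return chr(data[0] & 0x7F)


# Singletons are written as a single type byte.
singleton_bytes = {
    "None": marshal.dumps(None),
    "True": marshal.dumps(True),
    "False": marshal.dumps(False),
    "Ellipsis": marshal.dumps(Ellipsis),
    "StopIteration": marshal.dumps(StopIteration),
}

# TYPE_INT: the type byte, then a 32-bit value, least significant byte first.
int_data = marshal.dumps(1)

# TYPE_FLOAT is generated for version 0, TYPE_BINARY_FLOAT for version 2.
float_v0 = type_code(marshal.dumps(1.5, 0))
float_v2 = type_code(marshal.dumps(1.5, 2))

# TYPE_SMALL_TUPLE replaces TYPE_TUPLE for short tuples from version 4 on.
tuple_v3 = type_code(marshal.dumps((1, 2), 3))
tuple_v4 = type_code(marshal.dumps((1, 2), 4))

# TYPE_STRING holds bytes: the type byte, a 32-bit length, then the payload.
bytes_v2 = marshal.dumps(b"ab", 2)
```

The marshal format is an implementation detail, so byte-level checks like these are for exploration only and should not be relied on.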
@@ -615,6 +626,11 @@ w_complex_object(PyObject *v, char flag, WFILE *p)
PyBuffer_Release(&view);
}
else if (PySlice_Check(v)) {
if (p->version < 5) {
w_byte(TYPE_UNKNOWN, p);
p->error = WFERR_UNMARSHALLABLE;
return;
}
PySliceObject *slice = (PySliceObject *)v;
W_TYPE(TYPE_SLICE, p);
w_object(slice->start, p);