Commit d98b5bd

encukou and mdboom authored and committed
pythongh-125063: marshal: Add version 5, improve documentation (pythonGH-126829)
* Document that slices can be marshalled
* Deduplicate and organize the list of supported types in docs
* Organize the type code list in marshal.c, to make it more obvious that this is a versioned format
* Back-fill some historical info

Co-authored-by: Michael Droettboom <[email protected]>
1 parent c08e913 commit d98b5bd

7 files changed, +110 -41 lines changed

Doc/c-api/marshal.rst

Lines changed: 5 additions & 4 deletions
@@ -13,11 +13,12 @@ binary mode.
 
 Numeric values are stored with the least significant byte first.
 
-The module supports two versions of the data format: version 0 is the
-historical version, version 1 shares interned strings in the file, and upon
-unmarshalling. Version 2 uses a binary format for floating-point numbers.
-``Py_MARSHAL_VERSION`` indicates the current file format (currently 2).
+The module supports several versions of the data format; see
+the :py:mod:`Python module documentation <marshal>` for details.
 
+.. c:macro:: Py_MARSHAL_VERSION
+
+   The current format version. See :py:data:`marshal.version`.
 
 .. c:function:: void PyMarshal_WriteLongToFile(long value, FILE *file, int version)
 

Doc/library/marshal.rst

Lines changed: 46 additions & 18 deletions
@@ -38,23 +38,39 @@ supports a substantially wider range of objects than marshal.
    maliciously constructed data. Never unmarshal data received from an
    untrusted or unauthenticated source.
 
+There are functions that read/write files as well as functions operating on
+bytes-like objects.
+
 .. index:: object; code, code object
 
 Not all Python object types are supported; in general, only objects whose value
 is independent from a particular invocation of Python can be written and read by
-this module. The following types are supported: booleans, integers, floating-point
-numbers, complex numbers, strings, bytes, bytearrays, tuples, lists, sets,
-frozensets, dictionaries, and code objects (if *allow_code* is true),
-where it should be understood that
-tuples, lists, sets, frozensets and dictionaries are only supported as long as
-the values contained therein are themselves supported. The
-singletons :const:`None`, :const:`Ellipsis` and :exc:`StopIteration` can also be
-marshalled and unmarshalled.
-For format *version* lower than 3, recursive lists, sets and dictionaries cannot
-be written (see below).
+this module. The following types are supported:
+
+* Numeric types: :class:`int`, :class:`bool`, :class:`float`, :class:`complex`.
+* Strings (:class:`str`) and :class:`bytes`.
+  :term:`Bytes-like objects <bytes-like object>` like :class:`bytearray` are
+  marshalled as :class:`!bytes`.
+* Containers: :class:`tuple`, :class:`list`, :class:`set`, :class:`frozenset`,
+  and (since :data:`version` 5), :class:`slice`.
+  It should be understood that these are supported only if the values contained
+  therein are themselves supported.
+  Recursive containers are supported since :data:`version` 3.
+* The singletons :const:`None`, :const:`Ellipsis` and :exc:`StopIteration`.
+* :class:`code` objects, if *allow_code* is true. See note above about
+  version dependence.
+
+.. versionchanged:: 3.4
+
+   * Added format version 3, which supports marshalling recursive lists, sets
+     and dictionaries.
+   * Added format version 4, which supports efficient representations
+     of short strings.
+
+.. versionchanged:: next
+
+   Added format version 5, which allows marshalling slices.
 
-There are functions that read/write files as well as functions operating on
-bytes-like objects.
 
 The module defines these functions:
 

@@ -140,11 +156,24 @@ In addition, the following constants are defined:
 
 .. data:: version
 
-   Indicates the format that the module uses. Version 0 is the historical
-   format, version 1 shares interned strings and version 2 uses a binary format
-   for floating-point numbers.
-   Version 3 adds support for object instancing and recursion.
-   The current version is 4.
+   Indicates the format that the module uses.
+   Version 0 is the historical first version; subsequent versions
+   add new features.
+   Generally, a new version becomes the default when it is introduced.
+
+   ======= =============== ====================================================
+   Version Available since New features
+   ======= =============== ====================================================
+   1       Python 2.4      Sharing interned strings
+   ------- --------------- ----------------------------------------------------
+   2       Python 2.5      Binary representation of floats
+   ------- --------------- ----------------------------------------------------
+   3       Python 3.4      Support for object instancing and recursion
+   ------- --------------- ----------------------------------------------------
+   4       Python 3.4      Efficient representation of short strings
+   ------- --------------- ----------------------------------------------------
+   5       Python 3.14     Support for :class:`slice` objects
+   ======= =============== ====================================================
 
 
 .. rubric:: Footnotes
@@ -154,4 +183,3 @@ In addition, the following constants are defined:
    around in a self-contained form. Strictly speaking, "to marshal" means to
    convert some data from internal to external form (in an RPC buffer for instance)
    and "unmarshalling" for the reverse process.
-
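
Note on the version table in the Doc/library/marshal.rst change: a rough way to see the format versions in action is to dump the same float under every version the running interpreter supports and look at the leading type code. This is only a sketch. The helper name `float_type_codes` is made up for this note, and the only marshal API it relies on is `marshal.dumps(value, version)` and `marshal.version`.

```python
import marshal

def float_type_codes(value=1.5):
    """Map each supported format version to the type code written for a float.

    Hypothetical helper for illustration; not part of the commit.
    """
    codes = {}
    for version in range(marshal.version + 1):
        data = marshal.dumps(value, version)
        # Clear the high bit (FLAG_REF), which versions 3+ may set on the type byte.
        codes[version] = chr(data[0] & 0x7F)
    return codes

if __name__ == "__main__":
    for version, code in float_type_codes().items():
        print(version, code)
```

Version 0 should report the text encoding (`f`) and version 2 the binary encoding (`g`), matching the "Binary representation of floats" row. The highest key in the result depends on the interpreter the sketch runs on.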

Include/marshal.h

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ PyAPI_FUNC(PyObject *) PyMarshal_ReadObjectFromString(const char *,
                                                       Py_ssize_t);
 PyAPI_FUNC(PyObject *) PyMarshal_WriteObjectToString(PyObject *, int);
 
-#define Py_MARSHAL_VERSION 4
+#define Py_MARSHAL_VERSION 5
 
 PyAPI_FUNC(long) PyMarshal_ReadLongFromFile(FILE *);
 PyAPI_FUNC(int) PyMarshal_ReadShortFromFile(FILE *);

Lib/test/test_marshal.py

Lines changed: 24 additions & 3 deletions
@@ -28,6 +28,13 @@ def helper(self, sample, *extra):
         finally:
             os_helper.unlink(os_helper.TESTFN)
 
+def omit_last_byte(data):
+    """return data[:-1]"""
+    # This file's code is used in CompatibilityTestCase,
+    # but slices need marshal version 5.
+    # Avoid the slice literal.
+    return data[slice(0, -1)]
+
 class IntTestCase(unittest.TestCase, HelperMixin):
     def test_ints(self):
         # Test a range of Python ints larger than the machine word size.
@@ -241,7 +248,8 @@ def test_bug_5888452(self):
     def test_patch_873224(self):
         self.assertRaises(Exception, marshal.loads, b'0')
         self.assertRaises(Exception, marshal.loads, b'f')
-        self.assertRaises(Exception, marshal.loads, marshal.dumps(2**65)[:-1])
+        self.assertRaises(Exception, marshal.loads,
+                          omit_last_byte(marshal.dumps(2**65)))
 
     def test_version_argument(self):
         # Python 2.4.0 crashes for any call to marshal.dumps(x, y)
@@ -594,6 +602,19 @@ def testNoIntern(self):
         s2 = sys.intern(s)
         self.assertNotEqual(id(s2), id(s))
 
+class SliceTestCase(unittest.TestCase, HelperMixin):
+    def test_slice(self):
+        for obj in (
+            slice(None), slice(1), slice(1, 2), slice(1, 2, 3),
+            slice({'set'}, ('tuple', {'with': 'dict'}, ), self.helper.__code__)
+        ):
+            with self.subTest(obj=str(obj)):
+                self.helper(obj)
+
+                for version in range(4):
+                    with self.assertRaises(ValueError):
+                        marshal.dumps(obj, version)
+
 @support.cpython_only
 @unittest.skipUnless(_testcapi, 'requires _testcapi')
 class CAPI_TestCase(unittest.TestCase, HelperMixin):
@@ -654,7 +675,7 @@ def test_read_last_object_from_file(self):
             self.assertEqual(r, obj)
 
             with open(os_helper.TESTFN, 'wb') as f:
-                f.write(data[:1])
+                f.write(omit_last_byte(data))
             with self.assertRaises(EOFError):
                 _testcapi.pymarshal_read_last_object_from_file(os_helper.TESTFN)
             os_helper.unlink(os_helper.TESTFN)
@@ -671,7 +692,7 @@ def test_read_object_from_file(self):
             self.assertEqual(p, len(data))
 
             with open(os_helper.TESTFN, 'wb') as f:
-                f.write(data[:1])
+                f.write(omit_last_byte(data))
             with self.assertRaises(EOFError):
                 _testcapi.pymarshal_read_object_from_file(os_helper.TESTFN)
             os_helper.unlink(os_helper.TESTFN)
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+:mod:`marshal` now supports :class:`slice` objects. The marshal format
+version was increased to 5.
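
Note on the news entry: the behavior it describes can be tried from Python with something like the sketch below. Both helper names are hypothetical and exist only for this note. The code assumes nothing beyond `marshal.dumps`, `marshal.loads` and `marshal.version`, and it falls back gracefully on interpreters whose default format is older than version 5.

```python
import marshal

def roundtrip_slice(obj):
    """Round-trip a slice through marshal.

    Returns None when the running interpreter's default format predates
    version 5. Hypothetical helper for illustration only.
    """
    if marshal.version < 5:
        return None  # slices need format version 5
    return marshal.loads(marshal.dumps(obj))

def rejected_by_old_format(obj, version=4):
    """Return True if dumping obj with an older format version raises ValueError."""
    try:
        marshal.dumps(obj, version)
    except ValueError:
        return True
    return False

if __name__ == "__main__":
    s = slice(1, 2, 3)
    print(roundtrip_slice(s))
    print(rejected_by_old_format(s))
```

Asking for an explicit version below 5 is expected to fail for slices, which is the case the new check in `w_complex_object` covers.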

Programs/_freeze_module.c

Lines changed: 1 addition & 0 deletions
@@ -121,6 +121,7 @@ compile_and_marshal(const char *name, const char *text)
         return NULL;
     }
 
+    assert(Py_MARSHAL_VERSION >= 5);
     PyObject *marshalled = PyMarshal_WriteObjectToString(code, Py_MARSHAL_VERSION);
     Py_CLEAR(code);
     if (marshalled == NULL) {

Python/marshal.c

Lines changed: 31 additions & 15 deletions
@@ -50,41 +50,52 @@ module marshal
 # define MAX_MARSHAL_STACK_DEPTH 2000
 #endif
 
+/* Supported types */
 #define TYPE_NULL               '0'
 #define TYPE_NONE               'N'
 #define TYPE_FALSE              'F'
 #define TYPE_TRUE               'T'
 #define TYPE_STOPITER           'S'
 #define TYPE_ELLIPSIS           '.'
-#define TYPE_INT                'i'
-/* TYPE_INT64 is not generated anymore.
-   Supported for backward compatibility only. */
-#define TYPE_INT64              'I'
-#define TYPE_FLOAT              'f'
-#define TYPE_BINARY_FLOAT       'g'
-#define TYPE_COMPLEX            'x'
-#define TYPE_BINARY_COMPLEX     'y'
-#define TYPE_LONG               'l'
-#define TYPE_STRING             's'
-#define TYPE_INTERNED           't'
-#define TYPE_REF                'r'
-#define TYPE_TUPLE              '('
+#define TYPE_BINARY_FLOAT       'g'  // Version 0 uses TYPE_FLOAT instead.
+#define TYPE_BINARY_COMPLEX     'y'  // Version 0 uses TYPE_COMPLEX instead.
+#define TYPE_LONG               'l'  // See also TYPE_INT.
+#define TYPE_STRING             's'  // Bytes. (Name comes from Python 2.)
+#define TYPE_TUPLE              '('  // See also TYPE_SMALL_TUPLE.
 #define TYPE_LIST               '['
 #define TYPE_DICT               '{'
 #define TYPE_CODE               'c'
 #define TYPE_UNICODE            'u'
 #define TYPE_UNKNOWN            '?'
+// added in version 2:
 #define TYPE_SET                '<'
 #define TYPE_FROZENSET          '>'
+// added in version 5:
 #define TYPE_SLICE              ':'
-#define FLAG_REF                '\x80' /* with a type, add obj to index */
+// Remember to update the version and documentation when adding new types.
 
+/* Special cases for unicode strings (added in version 4) */
+#define TYPE_INTERNED           't' // Version 1+
 #define TYPE_ASCII              'a'
 #define TYPE_ASCII_INTERNED     'A'
-#define TYPE_SMALL_TUPLE        ')'
 #define TYPE_SHORT_ASCII        'z'
 #define TYPE_SHORT_ASCII_INTERNED 'Z'
 
+/* Special cases for small objects */
+#define TYPE_INT                'i'  // All versions. 32-bit encoding.
+#define TYPE_SMALL_TUPLE        ')'  // Version 4+
+
+/* Supported for backwards compatibility */
+#define TYPE_COMPLEX            'x'  // Generated for version 0 only.
+#define TYPE_FLOAT              'f'  // Generated for version 0 only.
+#define TYPE_INT64              'I'  // Not generated any more.
+
+/* References (added in version 3) */
+#define TYPE_REF                'r'
+#define FLAG_REF                '\x80' /* with a type, add obj to index */
+
+
+// Error codes:
 #define WFERR_OK 0
 #define WFERR_UNMARSHALLABLE 1
 #define WFERR_NESTEDTOODEEP 2
@@ -615,6 +626,11 @@ w_complex_object(PyObject *v, char flag, WFILE *p)
         PyBuffer_Release(&view);
     }
     else if (PySlice_Check(v)) {
+        if (p->version < 5) {
+            w_byte(TYPE_UNKNOWN, p);
+            p->error = WFERR_UNMARSHALLABLE;
+            return;
+        }
         PySliceObject *slice = (PySliceObject *)v;
         W_TYPE(TYPE_SLICE, p);
         w_object(slice->start, p);
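
Note on the type codes in Python/marshal.c: they are visible from Python as the first byte of a marshalled value, with `FLAG_REF` occupying the high bit of that byte. The sketch below splits the two apart. `split_type_byte` and the Python-side `FLAG_REF` constant are names invented for this note, mirroring the C macro rather than any public API.

```python
import marshal

FLAG_REF = 0x80  # mirrors FLAG_REF in Python/marshal.c; not exported by the module

def split_type_byte(data):
    """Split the first byte of marshalled data into (type code, ref flag).

    Hypothetical helper for illustration only.
    """
    first = data[0]
    return chr(first & ~FLAG_REF & 0xFF), bool(first & FLAG_REF)

if __name__ == "__main__":
    for value in (None, True, [], {}):
        print(repr(value), split_type_byte(marshal.dumps(value)))
```

Masking the flag before comparing matters because versions 3 and later may set it on objects that are candidates for back-references, so the raw first byte is not always a plain ASCII type code.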
