
ARROW-3903: [Python] Random array generator for Arrow conversion and Parquet testing #3301


Closed
kszucs wants to merge 4 commits into apache/arrow from kszucs/ARROW-3903

Conversation

kszucs
Member

@kszucs kszucs commented Jan 3, 2019

Generate random schemas, arrays, chunked_arrays, columns, record_batches and tables.
Slow, but it makes it quite easy to isolate corner cases (jira issues have already been created for some of them). In follow-up PRs we should use these strategies to increase test coverage, which should help us reduce the number of issues. We could even use them to periodically generate benchmark datasets (only if we persist them somewhere).

Example usage:

Run 10 samples (dev profile):
`pytest -sv pyarrow/tests/test_strategies.py::test_tables --enable-hypothesis --hypothesis-show-statistics --hypothesis-profile=dev`

Print the generated examples (debug):
`pytest -sv pyarrow/tests/test_strategies.py::test_schemas --enable-hypothesis --hypothesis-show-statistics --hypothesis-profile=debug`
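
For context, a minimal sketch of what such a Hypothesis strategy can look like (illustrative only; the names and the narrow type coverage below are simplified assumptions, while the real strategies live in `pyarrow.tests.strategies` and also cover chunked_arrays, columns, record_batches and tables):

```python
import hypothesis.strategies as st
import pyarrow as pa
from hypothesis import given


@st.composite
def arrays(draw, type=None):
    # Accept a concrete pyarrow type, or draw one from a small pool.
    if type is None:
        type = draw(st.sampled_from([pa.int64(), pa.float64(), pa.string()]))
    # Pick a value strategy matching the drawn type.
    if pa.types.is_integer(type):
        values = st.integers(min_value=-2**31, max_value=2**31 - 1)
    elif pa.types.is_floating(type):
        values = st.floats(allow_nan=False, allow_infinity=False)
    else:
        values = st.text()
    return pa.array(draw(st.lists(values)), type=type)


@given(arrays())
def test_array_roundtrip(arr):
    # Every generated example exercises a different type/length combination.
    assert arr.equals(arr)
```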

@kszucs kszucs added the WIP PR is work in progress label Jan 3, 2019
@kszucs kszucs removed the WIP PR is work in progress label Jan 30, 2019
if isinstance(type, st.SearchStrategy):
type = draw(type)

# TODO(kszucs): remove it, field metadata is not kept
Member Author


The type equality check fails at https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L297
We should probably use .equals(check_metadata=False) and find out why the two metadata are different.

I didn't file a jira issue because I couldn't create a reproducible example (the metadata is not displayed). However, commenting out the assume line reproduces the issue.
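
A rough sketch of the suggested workaround (assuming a field-level metadata mismatch is what trips the equality check; the schemas below are made up for illustration):

```python
import pyarrow as pa

# Two schemas that agree on names and types but differ in field metadata.
a = pa.schema([pa.field('x', pa.int64(), metadata={b'key': b'value'})])
b = pa.schema([pa.field('x', pa.int64())])

assert not a.equals(b, check_metadata=True)   # fails because of the metadata
assert a.equals(b, check_metadata=False)      # passes once metadata is ignored
```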

@kszucs kszucs requested review from wesm and xhochy January 30, 2019 20:21
Member

@xhochy xhochy left a comment


+1, LGTM.

@fjetter you will like this!

@@ -1155,9 +1155,9 @@ cdef class Table(_PandasConvertible):

Parameters
----------
arrays: list of pyarrow.Array or pyarrow.Column
arrays : list of pyarrow.Array or pyarrow.Column
Member


This is actually not needed in the latest numpydoc spec. But for docs improvements, we could probably build on pandas' work one day: pandas-dev/pandas#22408

@@ -32,6 +34,7 @@
pickle5 = None

import pyarrow as pa
import pyarrow.tests.strategies as past
Member


Nice word pun 😂

@kszucs kszucs removed the request for review from wesm February 7, 2019 12:35
@kszucs kszucs closed this in f957b5b Feb 7, 2019
xhochy pushed a commit that referenced this pull request Feb 8, 2019
…Parquet testing

Generate random schemas, arrays, chunked_arrays, columns, record_batches and tables.
Slow, but it makes it quite easy to isolate corner cases (jira issues have already been created for some of them). In follow-up PRs we should use these strategies to increase test coverage, which should help us reduce the number of issues. We could even use them to periodically generate benchmark datasets (only if we persist them somewhere).

Example usage:

Run 10 samples (dev profile):
`pytest -sv pyarrow/tests/test_strategies.py::test_tables --enable-hypothesis --hypothesis-show-statistics --hypothesis-profile=dev`

Print the generated examples (debug):
`pytest -sv pyarrow/tests/test_strategies.py::test_schemas --enable-hypothesis --hypothesis-show-statistics --hypothesis-profile=debug`

Author: Krisztián Szűcs <[email protected]>

Closes #3301 from kszucs/ARROW-3903 and squashes the following commits:

ff6654c <Krisztián Szűcs> finalize
8b5e7ea <Krisztián Szűcs> rat
61fe01d <Krisztián Szűcs> strategies for chunked_arrays, columns, record batches; test the strategies themselves
bdb63df <Krisztián Szűcs> hypothesis array strategy
trxcllnt pushed a commit to trxcllnt/arrow that referenced this pull request Feb 12, 2019