Conversation
50d9237 to
229f764
Compare
cmake/modules/FindPyArrow.cmake
Outdated
There was a problem hiding this comment.
Don't hardcode the version, just use "arrow_python.dylib" etc
There was a problem hiding this comment.
We are going to have to cope with the conditional naming for these libraries, as they are not disted in this version under this name (and don't seem to exhibit any consistent version convention at all).
There was a problem hiding this comment.
Let's link this into an apache arrow issue, for pyarrow < 1 they had symlinks and we'll need those for some platforms
There was a problem hiding this comment.
@wesm this makes linking against these libraries super ugly. Since the version is managed by python anyway (e.g. in the version of the python package), wouldn't it be better to ship just the unversioned binary (e.g. libarrow.dylib)?
There was a problem hiding this comment.
See http://arrow.apache.org/docs/python/extending.html#building-extensions-against-pypi-wheels. We had to make changes to prevent shared libraries from being duplicated in the wheels. It might be possible to change things to ship unversioned shared libraries instead, but it will require some surgery on the Cython modules to get them to link with the unversioned libs (because they look for the libraries with the ABI version when they link). Feel free to open a JIRA issue in Apache Arrow with a proposal -- I have definitely hit the limit for how much time I can personally spend on it
There was a problem hiding this comment.
Thanks @wesm
@timkpaine I am not particularly bothered by this, it is easy to fix and this file must be manually updated for new Arrow versions regardless.
cpp/perspective/CMakeLists.txt
Outdated
| target_link_libraries(binding tbb) | ||
|
|
||
| target_link_libraries(binding psp) | ||
| target_compile_options(psp PRIVATE -Wno-deprecated-register) |
There was a problem hiding this comment.
Silences python2 deprecation warnings which make our build logs un-readable.
There was a problem hiding this comment.
You're doing this on windows too though, are you sure this flag will work there?
There was a problem hiding this comment.
I don't know ... our windows azure builds do not work ...
| #include <perspective/emscripten.h> | ||
| #include <perspective/arrow_loader.h> | ||
| #include <perspective/arrow_writer.h> | ||
| #include "arrow/csv/api.h" |
python/perspective/setup.py
Outdated
| requires.append("backports.shutil-which") | ||
| requires.append("pyarrow<2") | ||
| else: | ||
| requires.append("pyarrow>=1.0.1,<2") |
There was a problem hiding this comment.
this should match pyproject.toml
There was a problem hiding this comment.
This version is conditional on python version - does this not make the looser bounds "pyarrow<2" for pyproject.toml correct? Perspective under this PR will compile with either pyarrow.
There was a problem hiding this comment.
pyproject.toml specifies it as pyarrow<2 regardless of python version. So it could satisfy with pyarrow<0.16 even though that won't work.
You have a couple of options, something like:
"pyarrow>0.16.0,<2; python_version < '3'", "pyarrow>1.0.1,<2; python_version >= '3'" would work ok.
Stylistically, every place where we specify requirements should match exactly the behavior of pyproject.toml, otherwise we can very easily start to break our builds and wheels like the state we were in before.
azure-pipelines.yml
Outdated
|
|
||
| # - script: | | ||
| # python -m pip install numpy "pyarrow>=0.16.0,<1" | ||
| # python -m pip install numpy "pyarrow>=1.0.1,<2" |
azure-pipelines.yml
Outdated
| - script: | | ||
| python -m pip install delocate wheel numpy "pyarrow>=0.16.0,<1" | ||
| displayName: "Python deps" | ||
| python -m pip install delocate wheel numpy "pyarrow>=1.0.1,<2" |
|
@texodus the only issue really is that |
|
@timkpaine I've moved the |
|
@texodus do you want me to make the python changes on that branch or do you want them in a separate PR? |
|
I already made most of them - just the |
Upgrades Apache Arrow to v1.0.1 for all targets except Python 2.7. For the WebAssembly/Emscripten build, CSV loading and writing (via
to_csv()) has been rewritten to use Arrow over C++, resulting in a ~10x load performance increase, depending on whether you provide a schema and what data types your columns are.For Python 2.7, Arrow requirements are
pyarrow<2only, which is 0.16.1. I've done some limited testing to ensure this is cross-compatible with Arrow Table data generated or read-from 1.0.1, but this is something we should probably keep an eye on; especially regarding the JupyterLab plugin@finos/perspective-jupyterlab, whose client/server distributed mode would use engines compiled to both versions simultaneously when Python 2.7 runs the kernel.