Expose Python interface for other rust applications #1325
Comments
Hi @jg2562, what would you like to do? That would give me a better understanding of what is possible. |
Hi @ritchie46, thanks for the reply. We are working on an application whose core is written in Rust. We use Python to call functions in Rust (as most of the legacy code is written in Python), and we also use Python for quick proofs of concept before finalizing them in Rust. For a more concrete example, we use serde on a struct containing a DataFrame, combined with zstd, to create a compressed version of our data (which is non-homogeneous in terms of data types). Since Rust is loading the data, we currently need to unpack the data from the DataFrame into structs that can be passed back to Python. I was wondering if there was a way to expose the Python interface as a Rust library, so that we could simply pass the DataFrame to Python directly. It seems like other libraries written in Rust for Python that want to build on polars will also run into this issue, so it could help them too! |
The easiest thing to do is to use arrow and pyarrow to communicate the memory. Those arrow arrays can then be used to create polars DataFrames/Series in Python polars as well as in Rust polars. This will mostly be zero copy. Here is the code polars uses to communicate between pyarrow and rust-arrow: https://github.com/pola-rs/polars/tree/master/py-polars/src/arrow_interop |
Thank you so much! I will definitely look into that. Just out of curiosity, is there something that makes exposing the interface difficult? |
Well.. TBH, I don't really know what exposing the interface means. Do you mean compiling Rust against Python polars? Or interacting with a precompiled Rust binary? Or using Rust polars and sending a DataFrame to a Python polars process? |
That's fair, it's pretty vague. When I said exposing the interface, I was imagining the last one: having Rust polars and sending a DataFrame to the Python polars process. |
In that case you should use pyo3 and some copy pasting of the code snippets I referenced. That should work! |
Hey @ritchie46! I ended up working on a different project for a bit, but I finally got around to making a small example. I was able to get the snippets to work, so at least I can better show an example of what I was thinking and why I was wondering if the PyDataFrame could be exposed. Here is the repo; the use case would be running example.py, but you can see that there was a lot of scripting just to emulate passing the DataFrame back and forth across the FFI boundary. Lemme know what you think, and thank you so much for the direction and help! |
Not sure if this is related. I am looking to reuse PyDataFrame in my own library built with pyo3. Is the arrow conversion as @jg2562 did the best way to do it or is there something easier/more direct? Thank you. I would like to do something like this:
    use pyo3::prelude::*;

    #[pyfunction]
    fn read_my_format() -> PyResult<PyDataFrame> {
        Ok(read_my_format_into_polars_df("my_file"))
    }

    #[pymodule]
    fn my_lib(_py: Python, m: &PyModule) -> PyResult<()> {
        m.add_function(wrap_pyfunction!(read_my_format, m)?)?;
        Ok(())
    } |
@MarcoLugo, After a recent update the repo that I posted breaks if you try to use some data types (DateTime64 for example). I think it would still be valuable to have access to the PyDataFrame if that's doable, since it will be properly tied to the library and isn't a hack on top of it. However, I really do not know how difficult this is, and so we should consult more with @ritchie46 since he would know much more. |
@ritchie46 I wrote code that converts a Rust DataFrame to a Python polars DataFrame, but it takes too much time. I guess there is a much easier and faster way to convert a Rust DataFrame to a Python DataFrame, because the Python DataFrame is just a wrapper around the Rust DataFrame, but I don't know how to implement it. Could you help me? If it were possible to import py-polars in Rust, the idea above would be easy to implement. |
Hello, I was casually looking into this and just wanted to share some insight with @gunjunlee py-polars uses I don't think there is any reasonable alternative to using arrow and pyarrow |
I've seen this issue pop up a few times in the last few days (#4264, #4212, kinda #1830). I wanted to reopen the discussion to talk about creating an API that is tied to polars development for people to link against. While the current example works and is very helpful, it has to be reimplemented in every code base, which makes it not very ergonomic to use. Because it is reimplemented, it also isn't tied to the development of polars, so it falls out of sync and breaks during updates in different people's projects. @ritchie46 mentioned in #4212 that he was considering making an API if he had time. If you would like help with creating it, please let us know! |
The way I've done this for my projects is to split up the python content into multiple crates. For example, I have a In this case, we could keep |
To me, that's exactly the right direction to go! Just separating them and allowing access to |
@ritchie46 Do you think this is the correct approach? |
I'm working on this here: https://github.com/jmrgibson/polars/tree/user/jgibson/split_out_py_polars_as_rust_crate It appears to work using the nightly compiler. It looks like newer polars relies on SIMD, which is nightly-only? I'll continue to investigate; I'd like to get this working on stable. For example, the following code works:
    use py_polars_core::PyDataFrame;

    let time: Series = time_ns.into_iter().collect();
    // `DataFrame::new` returns a Result; `?` assumes the enclosing
    // function returns a compatible Result.
    let df = DataFrame::new(vec![data.clone(), time])?;
    let df = PyDataFrame { df };
    let args = (df,);
    // Call the Python function with the wrapped frame and extract the
    // frame it returns.
    let res = Python::with_gil(|py| -> PyResult<DataFrame> {
        let res = pyfunc.call1(py, args)?;
        let pdf = res.extract::<PyDataFrame>(py)?;
        Ok(pdf.df)
    }); |
I don't think we should ship the Python interface for that. We could use Arrow's C interface instead. That is zero copy and much slimmer. |
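To make the "zero copy" point concrete, here is a minimal std-only sketch of the producer and consumer sides of the Arrow C Data Interface for a single non-null int64 column. The `ArrowArray` field layout follows the published specification; everything else (`Int64Holder`, `export_int64`, `sum_and_release`) is a hypothetical helper invented for illustration, not code from polars, arrow2, or pyarrow. The matching `ArrowSchema` struct, which describes the data type, is omitted for brevity.

```rust
use std::ffi::c_void;
use std::ptr;
use std::slice;

/// Mirror of the `ArrowArray` struct from the Arrow C Data Interface.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Hypothetical producer-side state, kept alive until `release` is called.
struct Int64Holder {
    values: Vec<i64>,
    buffers: [*const c_void; 2], // [validity bitmap, values]
}

/// Release callback: frees the holder (and its Vec) exactly once.
unsafe extern "C" fn release_int64(array: *mut ArrowArray) {
    unsafe {
        drop(Box::from_raw((*array).private_data as *mut Int64Holder));
        // The specification marks a released struct with a NULL callback.
        (*array).release = None;
    }
}

/// Hypothetical helper: export a Vec<i64> without copying its values.
pub fn export_int64(values: Vec<i64>) -> ArrowArray {
    let mut holder = Box::new(Int64Holder { values, buffers: [ptr::null(); 2] });
    // There are no nulls, so the validity bitmap pointer stays NULL.
    holder.buffers[1] = holder.values.as_ptr() as *const c_void;
    let length = holder.values.len() as i64;
    let raw = Box::into_raw(holder);
    let buffers = unsafe { ptr::addr_of_mut!((*raw).buffers) } as *mut *const c_void;
    ArrowArray {
        length,
        null_count: 0,
        offset: 0,
        n_buffers: 2,
        n_children: 0,
        buffers,
        children: ptr::null_mut(),
        dictionary: ptr::null_mut(),
        release: Some(release_int64),
        private_data: raw as *mut c_void,
    }
}

/// Hypothetical consumer: sums the values through the raw pointers, then
/// hands the memory back to the producer.
///
/// # Safety
/// `array` must be a live, unreleased int64 array laid out as above.
pub unsafe fn sum_and_release(array: &mut ArrowArray) -> i64 {
    unsafe {
        let values = *array.buffers.add(1) as *const i64;
        let data = slice::from_raw_parts(values.add(array.offset as usize), array.length as usize);
        let total: i64 = data.iter().sum();
        if let Some(release) = array.release {
            release(array);
        }
        total
    }
}
```

Only pointers and lengths cross the boundary here; the consumer reads the same allocation the producer created, and the release callback returns ownership to the producer's allocator.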
I don't think I understand enough about pyo3 to figure out where the copying is happening in this case. E.g., if I want to call a Python function with a DataFrame I create in Rust, and get a DataFrame back in Rust:
    # module.py
    import polars as pl

    def manipulate_df(df: pl.DataFrame) -> pl.DataFrame:
        ...  # user writes manipulation function here

    // main.rs (assumes a `PyDataFrame` wrapper with a public `df` field is available)
    fn main() -> PyResult<()> {
        let df = df!(
            "data" => [1.0, 2.0],
            "time" => [1.0, 2.0],
        )
        .unwrap(); // `df!` returns a Result
        let modified_df = Python::with_gil(|py| -> PyResult<DataFrame> {
            let module = PyModule::import(py, "module")?;
            let pydf: PyDataFrame = df.into();
            let args = (pydf,);
            // Look the function up on the imported module, call it, and extract the result.
            let result: PyDataFrame = module.getattr("manipulate_df")?.call1(args)?.extract()?;
            Ok(result.df)
        })?;
        Ok(())
    }
Based on the docs for Py::new, which is what the default |
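On the question of where copying could happen: moving a Rust value into a heap-allocated wrapper moves only the handle (pointer, length, capacity), not the buffer it points to. The std-only sketch below shows this, with `Box` standing in for the allocation that a wrapper object would live in on the Python heap. That substitution is an assumption made so the example runs without pyo3, and `Frame` and `Wrapper` are hypothetical stand-ins, not polars or pyo3 types.

```rust
/// Hypothetical stand-in for a DataFrame: it owns one heap-allocated column.
pub struct Frame {
    pub column: Vec<f64>,
}

/// Hypothetical stand-in for a wrapper type such as `PyDataFrame`.
pub struct Wrapper {
    pub df: Frame,
}

/// Moves `df` into a heap-allocated wrapper and returns the address of the
/// column buffer before and after the move.
pub fn buffer_address_across_move(df: Frame) -> (usize, usize) {
    let before = df.column.as_ptr() as usize;
    // `Box::new` stands in for allocating the wrapper object elsewhere.
    let boxed: Box<Wrapper> = Box::new(Wrapper { df });
    let after = boxed.df.column.as_ptr() as usize;
    (before, after)
}
```

If the two addresses match, the column data itself was never copied; only the small handle struct changed location.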
@ritchie46, do you think it's possible to convert a LazyFrame from Python to Rust and back, like you did here with the eager DataFrame? |
You'd need to serialize the query plan. This will copy data if you use |
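As a rough illustration of what "serialize the query plan" means: a lazy frame is a description of work rather than materialized columns, so what crosses the boundary is that description. The toy below is a std-only sketch with a hypothetical three-step plan and a made-up text format. It is not Polars' plan type or its serialization format, and it assumes paths and column names contain no tab, comma, or newline.

```rust
/// Hypothetical, heavily simplified query plan; not Polars' real plan type.
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    Scan(String),
    Select(Vec<String>),
    Limit(usize),
}

/// Serializes one step per line as `kind<TAB>argument`.
pub fn serialize(plan: &[Step]) -> String {
    plan.iter()
        .map(|step| match step {
            Step::Scan(path) => format!("scan\t{path}"),
            Step::Select(cols) => format!("select\t{}", cols.join(",")),
            Step::Limit(n) => format!("limit\t{n}"),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Parses the text form back into a plan; returns `None` on malformed input.
pub fn deserialize(text: &str) -> Option<Vec<Step>> {
    text.lines()
        .map(|line| {
            let (kind, arg) = line.split_once('\t')?;
            match kind {
                "scan" => Some(Step::Scan(arg.to_string())),
                "select" => Some(Step::Select(arg.split(',').map(str::to_string).collect())),
                "limit" => arg.parse().ok().map(Step::Limit),
                _ => None,
            }
        })
        .collect()
}
```

The receiving side rebuilds the plan from the text and runs it with its own engine, which is why a plan that only references a file path can cross the boundary without moving any column data.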
I think this is a good suggestion for something to make the python interface easier for third party bindings. The example code in the |
I have a setup of a crate that does this for you hidden behind pyo3 bindings. But haven't yet had the bandwidth/priority to finish this. |
@ritchie46 that would be really useful, especially for types beyond Series/DataFrame (like LazyFrame). I can try helping (although I am still a bit of a noob) |
I just want to echo that a succinct example of how to create a PyDataFrame in a new Rust project and pass it back into Python code would be very helpful to me and @andyjslee |
@ritchie46 mentioned on discord: https://github.com/pola-rs/pyo3-polars |
Yes, this is the way to go. |
Thanks, the pyo3-polars crate is exactly what I was looking for! |
Currently the Python-Rust interface lives within py-polars and is only published to PyPI. It would be helpful for other applications that need to pass DataFrames over that interface to have access to the pyo3 wrapper type.
Is there any way to get access to the wrapper type, so that a DataFrame can be returned to Python using pyo3?