-
-
Notifications
You must be signed in to change notification settings - Fork 159
Support for array-like and pandas-like data #470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #470 +/- ##
=======================================
Coverage 98.54% 98.54%
=======================================
Files 19 19
Lines 3360 3360
Branches 492 492
=======================================
Hits 3311 3311
Misses 26 26
Partials 23 23 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
I'm hesitant to support this for fear of getting into a pattern of accommodating compatibility with niche packages. @smathot, is it too much for DataMatrix users to simply use the to-pandas features of DataMatrix right before calling Pingouin? My primary concerns are:
It seems to me that DataMatrix could deal with the one line conversion prior to calling Pingouin, or alternatively the onus could be on the DataMatrix code base to add a method to DataMatrix objects that call Pingouin and do the conversion on the DataMatrix end. What do you think @smathot ? Am I missing a complexity here that makes this change to Pingouin more critical for your users? |
Yes, I assumed that you wouldn't want to make ad-hoc exceptions. For this reason, you'll see that I avoided the patch from specifically targeting datamatrix, and instead used the DataFrame interchange protocol. This is an attempt at improving interoperability between packages that somehow represent data (not just DataMatrix), and essentially comes down to libraries implementing a I think you can safely adopt these patches, because they don't have any disadvantage for existing users, are not specific to one package, and they have a small benefit of making it easier for other data packages to work with Pingouin |
Oh cool, thanks for the explanation and link @smathot, I wasn't aware of this interchange protocol. I see how this could be useful. It's hard for me to tell from that link how actively adopted this is. But some quick searching makes me skeptical that this has a lot of support these days. Again, please correct me! Sounds like a cool idea and I'd be happy to see any links to active projects that are developing this protocol, and especially some major projects that are actively utilizing it. Maybe the repo was transfered somewhere else or something and I'm looking at the wrong thing. But here's what I found:
If I'm wrong and this is an actively-supported thing and the community thinks this is a good idea for Pingouin, I would still recommend:
|
Reading this, I think your assessment is regarding the DataFrame protocol is probably correct, and that not many libraries except for DataFrame and DataMatrix are going to implement a When it comes to checking for It's up to you! I would of course prefer this patch to be merged, because DataMatrix benefits from it. If you agree, then I can also add the polish you suggest. But if not, then no hard feelings! |
Okay, that makes sense. Thanks for the update. I would still vote no on But this decision is most definitely not up to only me! I have a minor voice here and we will have to wait for others to chime in before moving forward. |
Hi there, Thanks for the PR and detailed explanation @smathot. I tend to agree with @remrama that I don't see a clear benefit of adding this to Pingouin. Would the conversion that @remrama mentioned be feasible on your side?
Thank you both, |
Sure, it's possible. But it means that I have to instruct users with something like: "You can pass a DataMatrix to any function that accepts a DataFrame, except for the following pingouin functions: …". And the same is true for array-like objects like tensors. I don't fully understand the hesitation, to be honest! After all, this is a safe PR with no disadvantages, no meaningful decrease in readability, and only minor advantages. But it's 100% your call! Feel free to close the PR if it doesn't work for you. |
This is a small patch that does not change any functionality or fixes issues, but rather makes Pingouin more flexible in the types of data that it accepts. Specifically, right now Pingouin does not (always) accept data that is compatible with
numpy.ndarray
orpandas.DataFrame
when it is not exactly of that type. This patch adds some flexibility by checking whether data can be converted to anndarray
orDataFrame
when it is not of those types. It seems that data-type checking happens at different places throughout the code, so I'm not sure I caught everything.For me, the main reason to submit this PR is to make DataMatrix 2.0, which is close to fully compatible with DataFrame, also fully compatible with Pingouin. With this patch it is. The DataMatrix unit tests for now still fail on GitHub, but that's because it's pulling the latest stable version of Pingouin.
All Pingouin unittests pass with this patch.