Why not CDAT? #112

Closed
DamienIrving opened this issue May 4, 2014 · 3 comments

@DamienIrving

In your main README file you ask the questions "Why not Pandas?" and "Why not Iris?" Another question you might want to ask is "Why not CDAT?" It was written quite a long time ago now but is still used extensively by UV-CDAT and thus by the ESGF. In particular, the cdms2, cdutil, genutil and MV2 libraries within CDAT do some of what xray does.

@shoyer
Member

shoyer commented May 9, 2014

Thanks for asking!

Looking at CDAT, it does indeed appear that it has objects which are very similar in spirit to the xray Dataset and DataArray. I think some of the major distinguishing features of xray would be:

  1. Design:
    • xray is targeted at a broader audience: anyone who needs a labeled, multi-dimensional array. I would like to avoid tight coupling to any particular domain, and keep xray as a more generic analysis tool for working with labeled N-dimensional arrays.
    • xray.DataArray is not a numpy.ndarray subclass, unlike cdms2.tvariable.TransientVariable. This makes the design cleaner and more flexible -- we can really do whatever we want with the array behind the scenes. In contrast, subclassing numpy arrays is not very reliable or predictable (in my experience).
  2. Performance:
    • xray represents missing values by NaN (like pandas) in a numpy.ndarray, instead of using a numpy.ma.MaskedArray. MaskedArray is written in pure Python, so it's far slower than using the standard ndarray.
    • xray indexes coordinate labels, and can use them for fast lookups, array assignment and alignment, all built on pandas, with very minimal overhead.
  3. Legal:
    • CDAT is described as "public domain software with unrestricted use" on its website, but the actual license sure doesn't look like that to me. In contrast, xray has a permissive open source license.
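
A minimal sketch of the two performance points above, using only NumPy and pandas rather than xray or CDAT themselves. The variable names are illustrative assumptions, and the example shows behaviour, not speed:

```python
import numpy as np
import numpy.ma as ma
import pandas as pd

# Missing values, two ways: NaN in a plain ndarray vs. a MaskedArray.
values = np.array([1.0, np.nan, 3.0])
masked = ma.masked_invalid(values)   # masks the NaN entry

nan_mean = np.nanmean(values)        # NaN-aware reduction on a plain ndarray
masked_mean = masked.mean()          # reduction that skips the masked entry

# Label-based lookup and alignment through a pandas index.
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

lookup = a.loc["y"]                  # lookup by label, not by position
total = a + b                        # aligned on labels; "x" has no partner, so NaN
```

Both reductions ignore the missing entry, and the sum is aligned by label rather than by position, with NaN marking the label that has no match.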

However, I'm sure that CDAT has some useful features and designs. If there are any particular aspects that you particularly appreciate and think might belong in xray, I would be very interested to hear about them.

Note: I found the source code for UV-CDAT on GitHub: https://github.com/UV-CDAT/uvcdat

@shoyer
Member

shoyer commented Sep 2, 2014

Going to close this since we now mention CDAT in our FAQ.

@shoyer shoyer closed this as completed Sep 2, 2014
@shoyer
Member

shoyer commented Sep 27, 2014

Note: looks like I may be wrong about (most of) the legal complications -- UV-CDAT now says it's available under the GPL: http://uvcdat.llnl.gov/installing.html
