
TRACKING: Review and possibly address changes regarding new dtypes #16624


Open
seberg opened this issue Jun 16, 2020 · 2 comments

@seberg
Member

seberg commented Jun 16, 2020

Since more such changes will inevitably be added over time, this issue tracks a few behaviour changes that will happen with my array-coercion branch now, and just as many that will come with other work later. These are things that we should review before 1.20 and decide on the exact course of action.

  1. np.array([np.float64(np.inf)], dtype=np.int64), as opposed to np.array(np.float64(np.inf), dtype=np.int64), used to use float(val) to assign the item, which meant that bad float values (including out-of-bounds ones) would lead to errors. The new code uses the normal casting logic in all cases, which relies on a C-cast and typically ends up at the minimum integer. The ideal solution is probably to add warnings/errors for casting in general as well, although we still need to decide how best to do that. (See the first sketch after this list.)

  2. The truthiness (and casting) of strings to bools is badly defined right now; see also BUG: truthiness of strings is arbitrary, context-dependent, and inconsistent #9875. The array-coercion changes will have aligned some of these cases, but not necessarily for the better. We should ensure that the new behaviour is no worse than the old, and generally push forward with fixing the situation. (The question is how slowly we have to take it.)

  3. Dtype discovery (mainly with respect to string length) is now improved. This means that object arrays being coerced are always inspected correctly. At the same time, the string length is now consistent, but for numpy scalars (e.g. a float64) it is always the length that the normal cast would produce. This only affects numpy scalars and only array coercion. There is probably no need to do anything here: the old behaviour was never consistent, and the new behaviour errs on the safe side and is generally better. (See the second sketch after this list.)

  4. Not an issue as such (low priority): the coercion cache is also used in some cases where the array is not coerced immediately. This wastes some memory, and the code could be improved to allow skipping the caching.

  5. We should possibly anticipate dtypes which are not preserved during array object creation (my preferred solution is an error). These should only be dtypes with subarrays (and no structure). That should be pretty much impossible to hit, but... See also BUG: Ensure PyArray_FromScalar always returns the requested dtype #15471.

  6. Consider simply making all DType classes HeapTypes. This would also allow giving them a repr, which may become copy-pastable in the future (if we stick with the square-bracket notation), since a HeapType allows a period in the repr, making it np.dtype[np.float64].

  7. Consider implementing dtype.type to look up the class attribute DType.type instead. Right now these must always match, but we must retain the C-side dtype->scalar_type slot for backward compatibility.

  8. The mapping pytype -> DType is currently strong. In theory this is an issue for supporting dynamically created pytype <-> DType pairs. The solution is to have a weak mapping but set a pytype.__associated_numpy_dtype__ slot when registering, which should be unproblematic for HeapTypes. For static types everything has to be immortal, but we can allow that path for NumPy itself. Anyone wanting to add a DType for another pytype (with automatic mapping/dtype discovery) has to make sure that NumPy can fill that attribute on the pytype (which is of course also possible for static types).
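
A minimal sketch of the divergence from item 1; the exact exception and the C-cast result are platform- and version-dependent, so the comments are illustrative rather than exact:

```python
import numpy as np

# 0-d case: goes through the casting logic; the C-cast of inf to int64
# typically lands on the minimum integer (platform-dependent).
np.array(np.float64(np.inf), dtype=np.int64)

# 1-d case: historically assigned via float(val), so a bad float value
# raised (e.g. OverflowError for inf). After the array-coercion branch,
# both cases go through the same casting logic.
np.array([np.float64(np.inf)], dtype=np.int64)
```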
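
And a sketch of the string-length consistency from item 3; the '<U32' itemsize for float64 is an assumption based on the casting rule and may differ between versions:

```python
import numpy as np

# Coercing a float64 scalar to string now picks the same itemsize that
# casting a float64 array to str would pick (e.g. dtype('<U32')),
# rather than a length derived from the particular value.
np.array(np.float64(0.5), dtype=str).dtype
np.array([0.5]).astype(str).dtype  # consistent with the scalar case
```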

EDIT: The more important issues have been cleared out. The boolean one is still there, but it will only affect numpy strings, and I think that is fine. (Of course we still need to clean up strings, but it's all so strange...)

@seberg
Member Author

seberg commented Sep 17, 2020

One thing to discuss, maybe, mainly because pandas is having issues with it in pandas-dev/pandas#35481:

Basically, before the updates here, we had the following behaviour (which also affects np.float64(np.nan) and the datetime NaT):

```python
np.array(np.uint64(2**63 + 1), dtype="int64")    # works
np.array([np.uint64(2**63 + 1)], dtype="int64")  # raises an error
```

Current master picks the "works" path in all cases. This makes sense, because the scalars are typed and there is no reason for them to behave differently from an array.
Now, we could always make this an error (for arrays as well), but that may be a big step; I assume we want to warn first, or even go with only a warning to begin with.

On current master we can very easily change this to raise the error in all cases if that helps pandas, but that might also create issues. Retaining both (conflicting) behaviours, though, would probably require a lot of acrobatics.
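
For downstream code that needs the error in both cases today, an explicit range check is one possible workaround. This is only a sketch: np.iinfo is real NumPy API, but checked_int64 is a hypothetical helper, not anything NumPy provides:

```python
import numpy as np

def checked_int64(value):
    # Compare as exact Python ints to avoid uint64/int64 comparison
    # pitfalls, then cast only if the value fits.
    info = np.iinfo(np.int64)
    if not (info.min <= int(value) <= info.max):
        raise OverflowError(f"{value!r} does not fit in int64")
    return np.int64(value)

checked_int64(np.uint64(2**63 + 1))  # raises OverflowError in this sketch
```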

@seberg seberg added and then removed the triage review label Sep 17, 2020
@charris charris removed this from the 1.20.0 release milestone Nov 25, 2020
@charris
Member

charris commented Nov 25, 2020

Cleared the milestone.
