@@ -3,147 +3,11 @@ Sparse Multidimensional Arrays

  |Build Status|

- This implements sparse multidimensional arrays on top of NumPy and
- Scipy.sparse. It generalizes the scipy.sparse.coo_matrix_ layout but extends
- beyond just rows and columns to an arbitrary number of dimensions.
+ This library provides multi-dimensional sparse arrays.

- The original motivation is for machine learning algorithms, but it is
- intended for somewhat general use.
+ * `Documentation <https://sparse.pydata.org/en/latest>`_
+ * `Contributing <https://github.com/pydata/sparse/blob/master/docs/contributing.rst>`_
+ * `Bug Reports/Feature Requests <https://github.com/pydata/sparse/issues>`_

- This Supports
- --------------
-
- - NumPy ufuncs (where zeros are preserved)
- - Binary operations with other ``COO`` objects, where zeros are preserved.
- - Binary operations with Scipy sparse matrices, where zeros are preserved.
- - Binary operations with scalars, where zeros are preserved.
- - Broadcasting binary operations and ``broadcast_to``.
- - Reductions (sum, max, min, prod, ...)
- - Reshape
- - Transpose
- - Tensordot
- - triu, tril
- - Slicing with integers, lists, and slices (with no step value)
- - Concatenation and stacking
-
- This may yet support
- --------------------
-
- A "does not support" list is hard to build because it is infinitely long.
- However the following things are in scope, relatively doable, and not yet built
- (help welcome).
-
- - Incremental buliding of arrays and inplace updates
- - More operations supported by Numpy Numpy arrays, such as ``argmin`` and ``argmax``.
- - Array building functions such as ``eye``, ``spdiags``. See `building sparse matrices`_.
- - Linear algebra operations such as ``inv``, ``norm`` and ``solve``. See scipy.sparse.linalg_.
-
- There are no plans to support
- -----------------------------
-
- - Parallel computing (though Dask.array may use this in the future)
-
- Example
- -------
-
- ::
-
-     pip install sparse
-
- .. code-block:: python
-
-     import numpy as np
-     n = 1000
-     ndims = 4
-     nnz = 1000000
-     coords = np.random.randint(0, n - 1, size=(ndims, nnz))
-     data = np.random.random(nnz)
-
-     import sparse
-     x = sparse.COO(coords, data, shape=((n,) * ndims))
-     x
-     # <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1000000>
-
-     x.nbytes
-     # 16000000
-
-     y = sparse.tensordot(x, x, axes=((3, 0), (1, 2)))
-
-     y
-     # <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1001588>
-
-     z = y.sum(axis=(0, 1, 2))
-     z
-     # <COO: shape=(1000,), dtype=float64, nnz=999>
-
-     z.todense()
-     # array([ 244.0671803 ,  246.38455787,  243.43383158,  256.46068737,
-     #         261.18598416,  256.36439011,  271.74177584,  238.56059193,
-     #         ...
-
-
- How does this work?
- -------------------
-
- Scipy.sparse implements decent 2-d sparse matrix objects for the standard
- layouts, notably for our purposes
- `CSR, CSC, and COO <https://en.wikipedia.org/wiki/Sparse_matrix>`_. However it
- doesn't include support for sparse arrays of greater than 2 dimensions.
-
- This library extends the COO layout, which stores the row index, column index,
- and value of every element:
-
- === === ====
- row col data
- === === ====
-   0   0   10
-   0   2   13
-   1   3    9
-   3   8   21
- === === ====
-
- It is straightforward to extend the COO layout to an arbitrary number of
- dimensions:
-
- ==== ==== ==== === ====
- dim1 dim2 dim3 ... data
- ==== ==== ==== === ====
-   0    0    0   .    10
-   0    0    3   .    13
-   0    2    2   .     9
-   3    1    4   .    21
- ==== ==== ==== === ====
-
- This makes it easy to *store* a multidimensional sparse array, but we still
- need to reimplement all of the array operations like transpose, reshape,
- slicing, tensordot, reductions, etc., which can be quite challenging in
- general.
-
- Fortunately in many cases we can leverage the existing SciPy.sparse algorithms
- if we can intelligently transpose and reshape our multi-dimensional array into
- an appropriate 2-d sparse matrix, perform a modified sparse matrix
- operation, and then reshape and transpose back. These reshape and transpose
- operations can all be done at numpy speeds by modifying the arrays of
- coordinates. After scipy.sparse runs its operations (coded in C) then we can
- convert back to using the same path of reshapings and transpositions in
- reverse.
-
- This approach is not novel; it has been around in the multidimensional array
- community for a while. It is also how some operations in numpy work. For example
- the ``numpy.tensordot`` function performs transposes and reshapes so that it can
- use the ``numpy.dot`` function for matrix multiplication which is backed by
- fast BLAS implementations. The ``sparse.tensordot`` code is very slight
- modification of ``numpy.tensordot``, replacing ``numpy.dot`` with
- ``scipy.sprarse.csr_matrix.dot``.
-
-
- LICENSE
- -------
-
- This is licensed under New BSD-3
-
- .. _scipy.sparse.coo_matrix: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html
- .. _building sparse matrices: https://docs.scipy.org/doc/scipy/reference/sparse.html#functions
- .. _scipy.sparse.linalg: https://docs.scipy.org/doc/scipy/reference/sparse.linalg.html

  .. |Build Status| image:: https://travis-ci.org/pydata/sparse.svg?branch=master
     :target: https://travis-ci.org/pydata/sparse