Commit 678de28

mrocklin authored and hameerabbasi committed
Merge README content into the main index doc page (#97)
* Merge README content into the main index doc page

  This adds some informative content to the documentation and centralizes our prose in one place.
* Range of changes to docs.
* Fix all broken links in the docs.
* Add useful links to README.
1 parent 05fed28 commit 678de28

7 files changed: +118 -190 lines changed

README.rst

Lines changed: 4 additions & 140 deletions
@@ -3,147 +3,11 @@ Sparse Multidimensional Arrays
33

44
|Build Status|
55

6-
This implements sparse multidimensional arrays on top of NumPy and
7-
Scipy.sparse. It generalizes the scipy.sparse.coo_matrix_ layout but extends
8-
beyond just rows and columns to an arbitrary number of dimensions.
6+
This library provides multi-dimensional sparse arrays.
97

10-
The original motivation is for machine learning algorithms, but it is
11-
intended for somewhat general use.
8+
* `Documentation <https://sparse.pydata.org/en/latest>`_
9+
* `Contributing <https://github.com/pydata/sparse/blob/master/docs/contributing.rst>`_
10+
* `Bug Reports/Feature Requests <https://github.com/pydata/sparse/issues>`_
1211

13-
This Supports
14-
--------------
15-
16-
- NumPy ufuncs (where zeros are preserved)
17-
- Binary operations with other ``COO`` objects, where zeros are preserved.
18-
- Binary operations with Scipy sparse matrices, where zeros are preserved.
19-
- Binary operations with scalars, where zeros are preserved.
20-
- Broadcasting binary operations and ``broadcast_to``.
21-
- Reductions (sum, max, min, prod, ...)
22-
- Reshape
23-
- Transpose
24-
- Tensordot
25-
- triu, tril
26-
- Slicing with integers, lists, and slices (with no step value)
27-
- Concatenation and stacking
28-
29-
This may yet support
30-
--------------------
31-
32-
A "does not support" list is hard to build because it is infinitely long.
33-
However the following things are in scope, relatively doable, and not yet built
34-
(help welcome).
35-
36-
- Incremental building of arrays and inplace updates
37-
- More operations supported by Numpy arrays, such as ``argmin`` and ``argmax``.
38-
- Array building functions such as ``eye``, ``spdiags``. See `building sparse matrices`_.
39-
- Linear algebra operations such as ``inv``, ``norm`` and ``solve``. See scipy.sparse.linalg_.
40-
41-
There are no plans to support
42-
-----------------------------
43-
44-
- Parallel computing (though Dask.array may use this in the future)
45-
46-
Example
47-
-------
48-
49-
::
50-
51-
pip install sparse
52-
53-
.. code-block:: python
54-
55-
import numpy as np
56-
n = 1000
57-
ndims = 4
58-
nnz = 1000000
59-
coords = np.random.randint(0, n - 1, size=(ndims, nnz))
60-
data = np.random.random(nnz)
61-
62-
import sparse
63-
x = sparse.COO(coords, data, shape=((n,) * ndims))
64-
x
65-
# <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1000000>
66-
67-
x.nbytes
68-
# 16000000
69-
70-
y = sparse.tensordot(x, x, axes=((3, 0), (1, 2)))
71-
72-
y
73-
# <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1001588>
74-
75-
z = y.sum(axis=(0, 1, 2))
76-
z
77-
# <COO: shape=(1000,), dtype=float64, nnz=999>
78-
79-
z.todense()
80-
# array([ 244.0671803 , 246.38455787, 243.43383158, 256.46068737,
81-
# 261.18598416, 256.36439011, 271.74177584, 238.56059193,
82-
# ...
83-
84-
85-
How does this work?
86-
-------------------
87-
88-
Scipy.sparse implements decent 2-d sparse matrix objects for the standard
89-
layouts, notably for our purposes
90-
`CSR, CSC, and COO <https://en.wikipedia.org/wiki/Sparse_matrix>`_. However it
91-
doesn't include support for sparse arrays of greater than 2 dimensions.
92-
93-
This library extends the COO layout, which stores the row index, column index,
94-
and value of every element:
95-
96-
=== === ====
97-
row col data
98-
=== === ====
99-
0 0 10
100-
0 2 13
101-
1 3 9
102-
3 8 21
103-
=== === ====
104-
105-
It is straightforward to extend the COO layout to an arbitrary number of
106-
dimensions:
107-
108-
==== ==== ==== === ====
109-
dim1 dim2 dim3 ... data
110-
==== ==== ==== === ====
111-
0 0 0 . 10
112-
0 0 3 . 13
113-
0 2 2 . 9
114-
3 1 4 . 21
115-
==== ==== ==== === ====
116-
117-
This makes it easy to *store* a multidimensional sparse array, but we still
118-
need to reimplement all of the array operations like transpose, reshape,
119-
slicing, tensordot, reductions, etc., which can be quite challenging in
120-
general.
121-
122-
Fortunately in many cases we can leverage the existing SciPy.sparse algorithms
123-
if we can intelligently transpose and reshape our multi-dimensional array into
124-
an appropriate 2-d sparse matrix, perform a modified sparse matrix
125-
operation, and then reshape and transpose back. These reshape and transpose
126-
operations can all be done at numpy speeds by modifying the arrays of
127-
coordinates. After scipy.sparse runs its operations (coded in C) then we can
128-
convert back to using the same path of reshapings and transpositions in
129-
reverse.
130-
131-
This approach is not novel; it has been around in the multidimensional array
132-
community for a while. It is also how some operations in numpy work. For example
133-
the ``numpy.tensordot`` function performs transposes and reshapes so that it can
134-
use the ``numpy.dot`` function for matrix multiplication which is backed by
135-
fast BLAS implementations. The ``sparse.tensordot`` code is very slight
136-
modification of ``numpy.tensordot``, replacing ``numpy.dot`` with
137-
``scipy.sparse.csr_matrix.dot``.
138-
139-
140-
LICENSE
141-
-------
142-
143-
This is licensed under New BSD-3
144-
145-
.. _scipy.sparse.coo_matrix: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html
146-
.. _building sparse matrices: https://docs.scipy.org/doc/scipy/reference/sparse.html#functions
147-
.. _scipy.sparse.linalg: https://docs.scipy.org/doc/scipy/reference/sparse.linalg.html
14812
.. |Build Status| image:: https://travis-ci.org/pydata/sparse.svg?branch=master
14913
:target: https://travis-ci.org/pydata/sparse

docs/changelog.rst

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ Changelog
1515
- Fix nnz for scalars (:pr:`48`) `Hameer Abbasi`_
1616
- Update README (:pr:`50`) (:pr:`53`) `Hameer Abbasi`_
1717
- Fix large concatenations and stacks (:pr:`50`) `Hameer Abbasi`_
18-
- Add __array_ufunc__ for __call__ and reduce (:pr:`r9`) `Hameer Abbasi`_
18+
- Add __array_ufunc__ for __call__ and reduce (:pr:`49`) `Hameer Abbasi`_
1919
- Update documentation (:pr:`54`) `Hameer Abbasi`_
2020
- Flake8 and coverage in pytest (:pr:`59`) `Nils Werner`_
2121
- Copy constructor (:pr:`55`) `Nils Werner`_

docs/construct.rst

Lines changed: 10 additions & 10 deletions
@@ -9,7 +9,7 @@ You can construct :obj:`COO` arrays from coordinates and value data.
99

1010
The :code:`coords` parameter contains the indices where the data is nonzero,
1111
and the :code:`data` parameter contains the data corresponding to those indices.
12-
For example, the following code will generate a :math:`5 \times 5` identity
12+
For example, the following code will generate a :math:`5 \times 5` diagonal
1313
matrix:
1414

1515
.. code-block:: python
@@ -53,9 +53,9 @@ explicitly. For example, if we did the following without the
5353
data = [1, 4, 2, 1]
5454
s = COO(coords, data, shape=(5, 5))
5555
56-
From :obj:`scipy.sparse.spmatrix` objects
57-
-----------------------------------------
58-
To construct :obj:`COO` array from :obj:`scipy.sparse.spmatrix`
56+
From :doc:`Scipy sparse matrices <generated/scipy.sparse.spmatrix>`
57+
-------------------------------------------------------------------
58+
To construct :obj:`COO` array from :obj:`spmatrix <scipy.sparse.spmatrix>`
5959
objects, you can use the :obj:`COO.from_scipy_sparse` method. As an
6060
example, if :code:`x` is a :obj:`scipy.sparse.spmatrix`, you can
6161
do the following to get an equivalent :obj:`COO` array:
@@ -64,8 +64,8 @@ do the following to get an equivalent :obj:`COO` array:
6464
6565
s = COO.from_scipy_sparse(x)
6666
67-
From :obj:`numpy.ndarray` objects
68-
---------------------------------
67+
From :doc:`Numpy arrays <reference/generated/numpy.ndarray>`
68+
------------------------------------------------------------
6969
To construct :obj:`COO` arrays from :obj:`numpy.ndarray`
7070
objects, you can use the :obj:`COO.from_numpy` method. As an
7171
example, if :code:`x` is a :obj:`numpy.ndarray`, you can
@@ -100,7 +100,7 @@ dictionary or is set to :code:`dtype('float64')` if that is not present.
100100
.. code-block:: python
101101
102102
s = DOK((6, 5, 2))
103-
s2 = DOK((2, 3, 4), dtype=np.float64)
103+
s2 = DOK((2, 3, 4), dtype=np.uint8)
104104
105105
After this, you can build the array by assigning arrays or scalars to elements
106106
or slices of the original array. Broadcasting rules are followed.
@@ -114,7 +114,7 @@ perform arithmetic or other operations on it.
114114

115115
.. code-block:: python
116116
117-
s2 = COO(s)
117+
s3 = COO(s)
118118
119119
In addition, it is possible to access single elements of the :obj:`DOK` array
120120
using normal Numpy indexing.
@@ -128,8 +128,8 @@ using normal Numpy indexing.
128128

129129
Converting :obj:`COO` objects to other Formats
130130
----------------------------------------------
131-
:obj:`COO` arrays can be converted to :obj:`numpy.ndarray` objects,
132-
or to some :obj:`scipy.sparse.spmatrix` subclasses via the following
131+
:obj:`COO` arrays can be converted to :doc:`Numpy arrays <reference/generated/numpy.ndarray>`,
132+
or to some :obj:`spmatrix <scipy.sparse.spmatrix>` subclasses via the following
133133
methods:
134134

135135
* :obj:`COO.todense`: Converts to a :obj:`numpy.ndarray` unconditionally.

docs/contribute.rst renamed to docs/contributing.rst

Lines changed: 15 additions & 3 deletions
@@ -17,6 +17,18 @@ If you're not already familiar with it, we follow the `fork and pull model
1717
<https://help.github.com/articles/about-collaborative-development-models/>`_
1818
on GitHub.
1919

20+
Filing Issues
21+
-------------
22+
If you find a bug or would like a new feature, you might want to `consider
23+
filing a new issue on GitHub <https://github.com/pydata/sparse/issues>`_. Before
24+
you open a new issue, please make sure of the following:
25+
26+
* This should go without saying, but make sure what you are requesting is within
27+
the scope of this project.
28+
* The bug/feature is still present/missing on the ``master`` branch on GitHub.
29+
* A similar issue or pull request isn't already open. If one already is, it's better
30+
to contribute to the discussion there.
31+
2032
Running/Adding Unit Tests
2133
-------------------------
2234
It is best if all new functionality and/or bug fixes have unit tests added
@@ -25,9 +37,9 @@ with each use-case.
2537
Since we support both Python 2.7 and Python 3.5 and newer, it is recommended
2638
to test with at least these two versions before committing your code or opening
2739
a pull request. We use `pytest <https://docs.pytest.org/en/latest/>`_ as our unit
28-
testing framework, with the pytest-cov extension to check code coverage and
29-
pytest-flake8 to check code style. You don't need to configure these extensions
30-
yourself. Once you've configured your environment, you can just :code:`cd` to
40+
testing framework, with the ``pytest-cov`` extension to check code coverage and
41+
``pytest-flake8`` to check code style. You don't need to configure these extensions
42+
yourself. Once you've configured your environment, you can just ``cd`` to
3143
the root of your repository and run
3244

3345
.. code-block:: bash

docs/index.rst

Lines changed: 72 additions & 20 deletions
@@ -1,28 +1,78 @@
11
Sparse
22
======
33

4-
Introduction
5-
------------
6-
In many scientific applications, arrays come up that are mostly empty or filled
7-
with zeros. These arrays are aptly named *sparse arrays*. However, it is a matter
8-
of choice as to how these are stored. One may store the full array, i.e., with all
9-
the zeros included. This incurs a significant cost in terms of memory and
10-
performance when working with these arrays.
11-
12-
An alternative way is to store them in a standalone data structure that keeps track
13-
of only the nonzero entries. Often, this improves performance and memory consumption
14-
but most operations on sparse arrays have to be re-written. :obj:`sparse` tries to
15-
provide one such data structure. It isn't the only library that does this. Notably,
16-
:obj:`scipy.sparse` achieves this, along with `pysparse <http://pysparse.sourceforge.net/>`_.
4+
This implements sparse arrays of arbitrary dimension on top of :obj:`numpy` and :obj:`scipy.sparse`.
5+
It generalizes the :obj:`scipy.sparse.coo_matrix` and :obj:`scipy.sparse.dok_matrix` layouts,
6+
but extends beyond just rows and columns to an arbitrary number of dimensions.
7+
8+
Additionally, this project maintains compatibility with the :obj:`numpy.ndarray` interface
9+
rather than the :obj:`numpy.matrix` interface used in :obj:`scipy.sparse`.
10+
11+
These differences make this project useful in certain situations
12+
where scipy.sparse matrices are not well suited,
13+
but it should not be considered a full replacement.
14+
It lacks layouts that are not easily generalized like CSR/CSC
15+
and depends on scipy.sparse for some computations.
16+
1717

1818
Motivation
1919
----------
20-
So why use :obj:`sparse`? Well, the other libraries mentioned are mostly limited to
21-
two-dimensional arrays. In addition, inter-compatibility with :obj:`numpy` is
22-
hit-or-miss. :obj:`sparse` strives to achieve inter-compatibility with
23-
:obj:`numpy.ndarray`, and provide mostly the same API. It defers to :obj:`scipy.sparse`
24-
when it is convenient to do so, and writes custom implementations of operations where
25-
this isn't possible. It also supports general N-dimensional arrays.
20+
21+
Sparse arrays, or arrays that are mostly empty or filled with zeros,
22+
are common in many scientific applications.
23+
To save space we often avoid storing these arrays in traditional dense formats,
24+
and instead choose different data structures.
25+
Our choice of data structure can significantly affect our storage and computational
26+
costs when working with these arrays.
27+
28+
29+
Design
30+
------
31+
32+
The main data structure in this library follows the
33+
`Coordinate List (COO) <https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)>`_
34+
layout for sparse matrices, but extends it to multiple dimensions.
35+
36+
The COO layout stores the row index, column index, and value of every element:
37+
38+
=== === ====
39+
row col data
40+
=== === ====
41+
0 0 10
42+
0 2 13
43+
1 3 9
44+
3 8 21
45+
=== === ====
46+
47+
It is straightforward to extend the COO layout to an arbitrary number of
48+
dimensions:
49+
50+
==== ==== ==== === ====
51+
dim1 dim2 dim3 ... data
52+
==== ==== ==== === ====
53+
0 0 0 . 10
54+
0 0 3 . 13
55+
0 2 2 . 9
56+
3 1 4 . 21
57+
==== ==== ==== === ====
58+
59+
This makes it easy to *store* a multidimensional sparse array, but we still
60+
need to reimplement all of the array operations like transpose, reshape,
61+
slicing, tensordot, reductions, etc., which can be challenging in general.
62+
63+
Fortunately in many cases we can leverage the existing :obj:`scipy.sparse`
64+
algorithms if we can intelligently transpose and reshape our multi-dimensional
65+
array into an appropriate 2-d sparse matrix, perform a modified sparse matrix
66+
operation, and then reshape and transpose back. These reshape and transpose
67+
operations can all be done at numpy speeds by modifying the arrays of
68+
coordinates. After scipy.sparse runs its operations (often written in C) then
69+
we can convert back to using the same path of reshapings and transpositions in
70+
reverse.
71+
72+
LICENSE
73+
-------
74+
75+
This library is licensed under BSD-3
2676

2777
.. toctree::
2878
:maxdepth: 3
@@ -33,5 +83,7 @@ this isn't possible. It also supports general N-dimensional arrays.
3383
construct
3484
operations
3585
generated/sparse
36-
contribute
86+
contributing
3787
changelog
88+
89+
.. _scipy.sparse: https://docs.scipy.org/doc/scipy/reference/sparse.html
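
The Design section added to ``docs/index.rst`` describes storing one coordinate per dimension for every nonzero element, and doing transpose and reshape by rewriting only those coordinates. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the helper names (``ravel``, ``unravel``, ``transpose``, ``reshape``) are hypothetical, row-major ordering is assumed, and the sample data reuses the rows of the multi-dimensional table with the elided ``...`` column dropped.

```python
from math import prod


def ravel(index, shape):
    """Turn an N-d index into a flat row-major offset."""
    flat = 0
    for i, n in zip(index, shape):
        flat = flat * n + i
    return flat


def unravel(flat, shape):
    """Turn a flat row-major offset back into an N-d index."""
    index = []
    for n in reversed(shape):
        flat, i = divmod(flat, n)
        index.append(i)
    return tuple(reversed(index))


def transpose(coords, shape, axes):
    """Permute the coordinate columns; the stored values are untouched."""
    new_coords = [tuple(c[a] for a in axes) for c in coords]
    new_shape = tuple(shape[a] for a in axes)
    return new_coords, new_shape


def reshape(coords, shape, new_shape):
    """Re-express every coordinate in a new shape with the same total size."""
    if prod(shape) != prod(new_shape):
        raise ValueError("total size must not change")
    new_coords = [unravel(ravel(c, shape), new_shape) for c in coords]
    return new_coords, tuple(new_shape)


# Four stored elements of a (4, 3, 5) array: one coordinate tuple per value.
shape = (4, 3, 5)
coords = [(0, 0, 0), (0, 0, 3), (0, 2, 2), (3, 1, 4)]
data = [10, 13, 9, 21]

# Move the last axis to the front.
t_coords, t_shape = transpose(coords, shape, (2, 0, 1))

# Collapse the last two axes into one, giving a 2-d (4, 15) layout.
m_coords, m_shape = reshape(coords, shape, (4, 15))

# Reshape back; the original coordinates are recovered.
back_coords, back_shape = reshape(m_coords, m_shape, shape)
```

Because only the coordinates change, ``data`` is shared across all three views. The 2-d form is the kind of layout that could be handed to a 2-d sparse matrix routine before reshaping back.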
