Commit 9b7a765

Add content to documentation homepage.
1 parent 50ddd3e commit 9b7a765

1 file changed

+33 -1 lines changed

docs/src/index.md

# LSH.jl

LSH.jl is a Julia package for performing [locality-sensitive hashing](https://en.wikipedia.org/wiki/Locality-sensitive_hashing) with various similarity functions.

## Introduction
One of the simplest ways to classify, categorize, and group data is to measure how similar pairs of data points are. For instance, the classical [``k``-nearest neighbors algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) takes a similarity function

```math
s:X\times X\to\mathbb{R}
```

and a query point ``x\in X``, where ``X`` is the input space. It then computes ``s(x,y)`` for every point ``y`` in a database and keeps the ``k`` points that are most similar to ``x``.
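
The brute-force procedure just described can be sketched in a few lines of Julia. This is an illustration only and not part of LSH.jl's API: the names `knn_bruteforce` and `cosine` are hypothetical, and cosine similarity is simply an example choice of ``s``.

```julia
using LinearAlgebra: dot, norm

# Example similarity function (an illustrative choice of s): cosine similarity.
cosine(x, y) = dot(x, y) / (norm(x) * norm(y))

# Hypothetical helper, not part of LSH.jl: brute-force k-nearest-neighbor search.
# It evaluates s(x, y) once for every y in `database`, so each query costs
# length(database) similarity computations.
function knn_bruteforce(s, x, database, k)
    scores = [s(x, y) for y in database]
    # Indices of the k highest scores, most similar first.
    return collect(partialsortperm(scores, 1:k; rev = true))
end

database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = knn_bruteforce(cosine, [1.0, 0.1], database, 2)
println(neighbors)  # [1, 3]
```

Every query touches every database point, which is exactly the cost that the techniques below try to avoid.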

Broadly, this approach runs into two computational issues:

- First, the database may be massive, much larger than could fit in memory. This makes the brute-force approach of computing ``s(x,y)`` for every point ``y`` in the database far too expensive to be practical.
- Second, the data may be so high-dimensional that computing even a single ``s(x,y)`` is expensive. Some similarity functions are also intrinsically difficult to compute: for instance, calculating the Wasserstein distance entails solving a very high-dimensional linear program.

To address these problems, researchers have developed a variety of techniques for accelerating similarity search, including:

- [``k``-d trees](https://en.wikipedia.org/wiki/K-d_tree)
- [Ball trees](https://en.wikipedia.org/wiki/Ball_tree)
- Data reduction techniques

## Locality-sensitive hashing
*Locality-sensitive hashing* (LSH) is a technique for accelerating similarity search. It applies a hash function to the query point ``x`` and restricts the search to those points in the database whose hashes collide with that of ``x``. Because the database points can be hashed ahead of time and stored in a hash table, finding these candidates does not require scanning the entire database. The hash functions are drawn at random from a family of *locality-sensitive hash functions*, which have the property that the collision probability ``\Pr[h(x) = h(y)]`` increases as ``x`` and ``y`` become more similar.
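
A minimal sketch of this bucket-based search follows. It assumes the classic random-hyperplane (sign of random projection) family for cosine similarity, under which two vectors at angle ``\theta`` agree on a single sign bit with probability ``1 - \theta/\pi``. The helpers `make_hash`, `build_index`, and `lsh_query` are hypothetical illustrations written for this page, not LSH.jl's API.

```julia
using LinearAlgebra: dot, norm
using Random: MersenneTwister

# Hypothetical helper: draw a hash from the random-hyperplane family for cosine
# similarity. Each row of W is a random hyperplane through the origin, and the
# hash value is the pattern of signs of x's projections onto those hyperplanes.
function make_hash(dim, nbits; rng = MersenneTwister(0))
    W = randn(rng, nbits, dim)
    return x -> BitVector(W * x .>= 0)
end

# Hash every database point once, ahead of time, grouping indices by hash value.
function build_index(h, database)
    buckets = Dict{BitVector, Vector{Int}}()
    for (i, y) in enumerate(database)
        push!(get!(buckets, h(y), Int[]), i)
    end
    return buckets
end

# Compare the query only against points whose hash collides with h(x),
# returning their indices sorted from most to least similar.
function lsh_query(h, buckets, s, x, database)
    candidates = get(buckets, h(x), Int[])
    return sort(candidates; by = i -> s(x, database[i]), rev = true)
end

cosine(x, y) = dot(x, y) / (norm(x) * norm(y))

rng = MersenneTwister(42)
database = [randn(rng, 5) for _ in 1:1000]
h = make_hash(5, 8)
buckets = build_index(h, database)
result = lsh_query(h, buckets, cosine, database[7], database)
println(length(result), " of ", length(database), " points compared")
```

With 8 sign bits there are at most 256 buckets, so the query compares against only the points sharing its bucket rather than all 1,000. Adding bits shrinks the buckets further but makes it more likely that a genuinely similar point lands in a different bucket. Practical LSH schemes commonly offset this by querying several independent hash tables.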

LSH.jl provides locality-sensitive hash functions for a variety of similarity functions. Currently, it supports hash functions for:

- Cosine similarity (`cossim`)
- Jaccard similarity (`jaccard`)
- ``L^1`` (Manhattan / "taxicab") distance (`ℓ1`)
- ``L^2`` (Euclidean) distance (`ℓ2`)
- Inner product (`inner_prod`)
- Function-space hashes (`L1`, `L2`, and `cossim`)
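
As an illustration of one such family, the sketch below implements MinHash, a classic locality-sensitive hash for Jaccard similarity, on subsets of ``\{1,\ldots,U\}``. It is written from scratch for exposition and does not use or reproduce LSH.jl's `jaccard` hash. The names `make_minhash`, `jaccard_exact`, and `jaccard_estimate` are hypothetical. For a uniformly random permutation ``\pi``, the hash ``h(A) = \min_{a\in A}\pi(a)`` satisfies ``\Pr[h(A) = h(B)] = |A\cap B| / |A\cup B|``, so the fraction of colliding hashes estimates the Jaccard similarity.

```julia
using Random: MersenneTwister, randperm

# Hypothetical helper: one MinHash function for subsets of 1:U. A uniformly
# random permutation `perm` of 1:U maps each set to its smallest permuted element.
function make_minhash(U; rng = MersenneTwister(0))
    perm = randperm(rng, U)
    return A -> minimum(perm[a] for a in A)
end

# Exact Jaccard similarity |A ∩ B| / |A ∪ B|.
jaccard_exact(A, B) = length(intersect(A, B)) / length(union(A, B))

# Estimate the Jaccard similarity as the fraction of `nhashes` independent
# MinHash functions under which A and B collide.
function jaccard_estimate(A, B, U, nhashes; rng = MersenneTwister(1))
    hashes = [make_minhash(U; rng = rng) for _ in 1:nhashes]
    return count(h -> h(A) == h(B), hashes) / nhashes
end

A = Set(1:60)
B = Set(31:90)  # |A ∩ B| = 30 and |A ∪ B| = 90, so J(A, B) = 1/3
println(jaccard_exact(A, B))
println(jaccard_estimate(A, B, 100, 2_000))
```

The estimate's standard error shrinks like ``1/\sqrt{n}`` in the number of hashes ``n``, which is the usual trade-off between accuracy and the cost of computing more hashes.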
