Binary quantization #82
Merged
Changes from 5 commits
61 commits
3659904 First draft (Kerollmops)
692f1ef prepare for the new distance trait (irevoire)
27ed4d4 re-implements display for all kind of nodes taking the distance into … (irevoire)
3315465 everything compiles and the api is clean, need work on the tests to e… (irevoire)
a27a238 wip (irevoire)
3634125 fix the formatter and add a few comments (irevoire)
3576a88 rewrite the iterator of binary quantized slices (irevoire)
facc857 fix the size of the iterator (irevoire)
15a13ff fix spaces (irevoire)
e4847d2 remove the now useless craft_owned_unaligned_vector_from_f32 function… (irevoire)
ddb2304 get rid of the drain (irevoire)
903ec0b fix the spaces tests (irevoire)
ae65345 fix the spaces again (irevoire)
9ab541e add a comment explaining the relevancy issue we may encounter (irevoire)
7907da9 rename the vector format to vector codec (irevoire)
47fd2bd fix the normalized distance for the binary quantized euclidean distance (irevoire)
4c853b8 add a first test of relevancy that uses autogenerated vectors (irevoire)
7f004e8 add an implementation for the binary quantized manhattan distance (irevoire)
b6ea560 improve lisibility of the relevancy benchmark (irevoire)
0e8fba2 implements a basic oversampling (irevoire)
9f66448 compute two_means on non binary quantized distances (irevoire)
732767a Make two_means return non binary quantized distances (irevoire)
54626c1 add the angular distance (irevoire)
6290296 implement the oversampling (irevoire)
d812bd8 make the binary quantized distance quick again (irevoire)
f9e2b63 provide a specialized method to check if a vector contains only zeros (irevoire)
60875eb fix the euclidean distance (irevoire)
8c541ed fix the tests (irevoire)
d75d550 [perf] use an hashmap instead of a roaring bitmap + a vec to store th… (irevoire)
cefadca [perf] reduce the number of allocations by pre-allocating the size of… (irevoire)
c96e204 add or fix comments (irevoire)
0945e33 write an simd version of the code that converts the f32 vectors to si… (irevoire)
d4ed3e1 fix the reminder while storing binary quantized vectors (irevoire)
343dbfb re-implement the binary quantized to f32 with SIMD (irevoire)
1879821 add tests (irevoire)
4d07b35 push leaf id in a vec instead of a roaring bitmap because it s quicker (irevoire)
98cb3c9 fix the f32 -> binary quantized (irevoire)
1c975e2 improve comments and types (irevoire)
f9fea3b First version of SIMD on x86_64 (Kerollmops)
1592fe0 Upload the AVX version of to_vec (Kerollmops)
fc7a72b Update the simd aarch64 to_vec_simd function with blend functions (Kerollmops)
b72958a fix the binary quantized conversions (irevoire)
0f88bfd gate each simd function behind the right arch and provide a fallback … (irevoire)
149e338 fix and improve the large binary quantized test (irevoire)
7e8ee31 move the unaligned vector test to their own module (irevoire)
d61b7af move the node test to the binary_quantized_tests (irevoire)
283eb2c fix the warnings (irevoire)
f2f6a7e move proptest to the dev dependencies (irevoire)
0b21f34 fix comment (irevoire)
605b33a fix typo$ (irevoire)
df654bc update a test playing with -0.0 and 0.0 since -0.0 doesn't answer the… (irevoire)
be0f600 fix a comment (irevoire)
2d9eeb2 rename + unroll loop in from_slice_neon (irevoire)
8f6eaaa update reader::plot with .is_zero (irevoire)
c4aaa44 fix github lint (irevoire)
9c457fe clippy again (irevoire)
af598ca improve the display implementation of the splitnode (irevoire)
71f6659 making the constants in unaligned_vector::binary_quantized more expli… (irevoire)
777effb fix the cfg feature gates around the simd function (irevoire)
393a5c5 remove the small relevancy benchmark in favor of the new repository (irevoire)
ddaedd3 fix the way we change the distance (irevoire)
@@ -0,0 +1,79 @@

```rust
use std::borrow::Cow;

use bytemuck::{Pod, Zeroable};
use rand::Rng;

use super::two_means;
use crate::distance::Distance;
use crate::node::Leaf;
use crate::parallel::ImmutableSubsetLeafs;
use crate::unaligned_vector::{self, BinaryQuantized, UnalignedVector};

/// The Euclidean distance between two points in Euclidean space
/// is the length of the line segment between them.
///
/// `d(p, q) = sqrt((p - q)²)`
#[derive(Debug, Clone)]
pub enum BinaryQuantizedEuclidean {}

/// The header of BinaryQuantizedEuclidean leaf nodes.
#[repr(C)]
#[derive(Pod, Zeroable, Debug, Clone, Copy)]
pub struct NodeHeaderBinaryQuantizedEuclidean {
    /// An extra constant term to determine the offset of the plane
    bias: f32,
}

impl Distance for BinaryQuantizedEuclidean {
    type Header = NodeHeaderBinaryQuantizedEuclidean;
    type VectorFormat = unaligned_vector::BinaryQuantized;

    fn name() -> &'static str {
        "binary quantized euclidean"
    }

    fn new_header(_vector: &UnalignedVector<Self::VectorFormat>) -> Self::Header {
        NodeHeaderBinaryQuantizedEuclidean { bias: 0.0 }
    }

    fn built_distance(p: &Leaf<Self>, q: &Leaf<Self>) -> f32 {
        dot_product(&p.vector, &q.vector)
    }

    fn norm_no_header(v: &UnalignedVector<Self::VectorFormat>) -> f32 {
        dot_product(v, v).sqrt()
    }

    fn init(_node: &mut Leaf<Self>) {}

    fn create_split<'a, R: Rng>(
        children: &'a ImmutableSubsetLeafs<Self>,
        rng: &mut R,
    ) -> heed::Result<Cow<'a, UnalignedVector<Self::VectorFormat>>> {
        let [node_p, node_q] = two_means(rng, children, false)?;
        let vector: Vec<f32> =
            node_p.vector.iter().zip(node_q.vector.iter()).map(|(p, q)| p - q).collect();
        let mut normal = Leaf {
            header: NodeHeaderBinaryQuantizedEuclidean { bias: 0.0 },
            vector: UnalignedVector::from_slice(&vector),
        };
        Self::normalize(&mut normal);

        Ok(Cow::Owned(normal.vector.into_owned()))
    }

    fn margin(p: &Leaf<Self>, q: &Leaf<Self>) -> f32 {
        p.header.bias + dot_product(&p.vector, &q.vector)
    }

    fn margin_no_header(
        p: &UnalignedVector<Self::VectorFormat>,
        q: &UnalignedVector<Self::VectorFormat>,
    ) -> f32 {
        dot_product(p, q)
    }
}

fn dot_product(u: &UnalignedVector<BinaryQuantized>, v: &UnalignedVector<BinaryQuantized>) -> f32 {
    u.as_bytes().iter().zip(v.as_bytes()).map(|(u, v)| (u ^ v).count_ones()).sum::<u32>() as f32
}
```
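
Despite its name, the `dot_product` helper in this hunk XORs the packed bytes and counts the set bits, which is a Hamming distance rather than an inner product. One consequence is that `dot_product(v, v)` is 0 for every `v`, so `norm_no_header` as written in this snapshot always returns 0; later commits in the list ("fix the normalized distance for the binary quantized euclidean distance", "fix the euclidean distance") may relate to this. The standalone sketch below illustrates the XOR and popcount step on plain byte slices. The `quantize` and `hamming` functions are hypothetical names used only for illustration, and the packing rule (one bit per dimension, set when the value is strictly positive) is an assumption, not necessarily the layout of the crate's `BinaryQuantized` codec.

```rust
/// Hypothetical helper: pack one bit per dimension into bytes.
/// Assumption: a bit is set when the component is strictly positive.
fn quantize(vector: &[f32]) -> Vec<u8> {
    let mut bytes = vec![0u8; (vector.len() + 7) / 8];
    for (i, &x) in vector.iter().enumerate() {
        if x > 0.0 {
            bytes[i / 8] |= 1 << (i % 8);
        }
    }
    bytes
}

/// Number of differing bits between two packed vectors (Hamming distance).
/// This mirrors the XOR + `count_ones` pattern used by `dot_product` in the diff.
fn hamming(u: &[u8], v: &[u8]) -> u32 {
    u.iter().zip(v).map(|(a, b)| (a ^ b).count_ones()).sum()
}

fn main() {
    let p = quantize(&[0.5, -1.0, 2.0, -0.1, 0.0, 3.0, -4.0, 1.0, 0.7]);
    let q = quantize(&[0.1, 1.0, -2.0, -0.3, 0.2, 3.0, -4.0, -1.0, 0.7]);

    // The two vectors disagree in sign at dimensions 1, 2, 4 and 7.
    println!("hamming(p, q) = {}", hamming(&p, &q)); // prints 4

    // A vector XORed with itself is all zeros, so its "norm" under this
    // measure is always 0.
    println!("hamming(p, p) = {}", hamming(&p, &p)); // prints 0
}
```

Working on whole bytes keeps the inner loop to one XOR and one popcount per eight dimensions, which is the main appeal of binary quantization for distance computations.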