Vector search similar movies with embedding generated by item2vec and CBOW model #150

Merged: 13 commits, Dec 14, 2023

@Daniel-Robbins Daniel-Robbins commented Dec 12, 2023

This demo shows how to train vector representations of items with Word2Vec and recommend items based on the similarity of their vectors. It consists of 5 parts:

  1. Prepare item sequences based on user behavior.
  2. Train a CBOW model using the Word2Vec module of the gensim library.
  3. Extract all embedding data and write it to chDB.
  4. Query chDB by cosine distance to find movies similar to the input movie.
  5. A simple unit test for vector data insertion and querying.
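
As a rough illustration of steps 1 and 4, here is a minimal standard-library sketch. It is not the code from this PR: the event tuples, the embedding values, and the helper names `build_sequences`, `cosine_distance`, and `most_similar` are all hypothetical. In the demo itself, sequences like these would presumably be passed to gensim's `Word2Vec` (where `sg=0` selects CBOW), and the ranking would be done in chDB with a cosine distance function rather than in Python.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (user_id, movie_id, timestamp) events; not the demo's real data.
events = [
    ("u1", "m1", 3), ("u1", "m2", 1), ("u1", "m3", 2),
    ("u2", "m2", 1), ("u2", "m3", 2),
]


def build_sequences(events):
    """Group events by user and order each user's items by timestamp."""
    per_user = defaultdict(list)
    for user, item, ts in events:
        per_user[user].append((ts, item))
    # Each inner list plays the role of one "sentence" for Word2Vec training.
    return [[item for _, item in sorted(pairs)] for pairs in per_user.values()]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def most_similar(target, embeddings, k=2):
    """Rank all other items by cosine distance to the target item."""
    query = embeddings[target]
    ranked = sorted(
        (cosine_distance(query, vec), item)
        for item, vec in embeddings.items()
        if item != target
    )
    return [item for _, item in ranked[:k]]


sequences = build_sequences(events)

# Made-up 2-dimensional embeddings standing in for trained item vectors.
embeddings = {"m1": [1.0, 0.0], "m2": [0.9, 0.1], "m3": [0.0, 1.0]}
nearest = most_similar("m1", embeddings, k=1)
```

Ordering by distance ascending mirrors what an `ORDER BY` on a cosine distance column would do in SQL; pushing that step into the database avoids loading every vector into Python.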

lmangani previously approved these changes Dec 12, 2023

Thanks @Daniel-Robbins for all the amazing contributions 🤟

  • No checks needed for examples.

@Daniel-Robbins Daniel-Robbins marked this pull request as ready for review December 14, 2023 10:15
@auxten auxten merged commit bd7ff5a into chdb-io:main Dec 14, 2023