A Book Recommendation System that suggests books to users based on their preferences, genres, and similarity with other titles.
This project applies Natural Language Processing (NLP) and Machine Learning techniques to recommend books using text data such as titles, genres, and descriptions.
- Overview
- Features
- Tech Stack
- Application Pages
- Project Structure
- Setup Instructions
- Running the Application
- Dataset Information
- Model Overview
- Future Improvements
- License
- Author
The Book Recommendation System (BRS) recommends books based on user interests and genre preferences.
It uses TF-IDF (Term Frequency–Inverse Document Frequency) and Cosine Similarity to analyze book descriptions and find the most similar titles.
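To make the scoring idea concrete, here is a minimal from-scratch sketch of TF-IDF weighting and cosine similarity using only the standard library. It is an illustration of the technique, not the project's code: the function names and the three toy descriptions are invented for this example, and it uses the textbook formula (scikit-learn's `TfidfVectorizer` applies a smoothed variant by default).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using textbook TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # term frequency * inverse document frequency
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy descriptions (hypothetical, not from the dataset).
descriptions = [
    "a young wizard attends a school of magic",
    "a wizard apprentice studies magic at a hidden school",
    "a detective investigates a murder in london",
]
vecs = tfidf_vectors(descriptions)
sim_01 = cosine_similarity(vecs[0], vecs[1])  # two fantasy blurbs
sim_02 = cosine_similarity(vecs[0], vecs[2])  # fantasy vs. crime
```

The word "a" appears in every description, so its inverse document frequency is zero and it contributes nothing to the score. The first two descriptions share "wizard", "school", and "magic", which gives them a positive similarity, while the first and third share no weighted terms.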
The system is implemented with a Streamlit web interface that allows users to:
- Browse books by genre
- Get detailed information about selected books
- View personalized recommendations
✅ Content-based book recommendation using TF-IDF
✅ Genre-based filtering
✅ Dynamic book suggestions and details
✅ Streamlit-based interactive UI
✅ Pre-trained model storage for fast loading
✅ Modular code structure for easy maintenance
| Category | Tools Used |
|---|---|
| Language | Python 3.x |
| Framework | Streamlit |
| Libraries | pandas, numpy, scikit-learn |
| Dataset | Custom dataset (Books_10000.csv) |
| Visualization | Streamlit components |
| Model Files | .pkl files (TF-IDF, similarity matrices, final dataframe) |
Displays the project introduction, purpose, and navigation to other pages.
Shows the 50 most popular genres in the dataset used for suggestions.
Provides 10 random book suggestions from the genre you choose.
Displays the details of the book you select from the suggestions, along with a button to get more suggestions based on that book.
Shows 6 books similar to the one you initially chose from your favourite genre.
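The genre ranking and the random per-genre suggestions described above could be sketched with pandas roughly as follows. This is a hedged illustration only: the column names `title` and `genres`, the helper names, and the toy rows are assumptions made for this example and may not match the project's dataframe.

```python
import pandas as pd

# Toy stand-in for the processed dataset. The column names "title" and
# "genres" are assumptions, not taken from the project.
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "genres": [
        ["Fantasy", "Young Adult"],
        ["Fantasy"],
        ["Mystery"],
        ["Fantasy", "Mystery"],
    ],
})


def top_genres(df, n=50):
    """Return the n most frequent genres, most common first."""
    return df["genres"].explode().value_counts().head(n).index.tolist()


def random_books_in_genre(df, genre, k=10, seed=None):
    """Return up to k randomly chosen titles tagged with the given genre."""
    matches = df[df["genres"].apply(lambda tags: genre in tags)]
    if matches.empty:
        return []
    picked = matches.sample(n=min(k, len(matches)), random_state=seed)
    return picked["title"].tolist()


popular = top_genres(books, n=2)
picks = random_books_in_genre(books, "Fantasy", k=2, seed=0)
```

Passing a `seed` makes the selection reproducible for testing; leaving it as `None` gives a different selection on each call, which matches the "random suggestions" behaviour of the page.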
.
├── .gitignore
├── .streamlit/config.toml # Streamlit app configuration
├── LandingPage.py # Main app entry point
├── downloadModel.py # Downloads model from Google Drive
├── pages/
│ ├── 1_GenrePage.py
│ ├── 2_SuggBooks.py
│ ├── 3_BookDet.py
│ └── 4_Recommendations.py
├── rawData/rawDatasetDownload.py # Script to retrieve raw data
├── recommender_Model/
│ ├── final_df.pkl # Processed book dataset
│ └── tfidf_matrices.pkl # TF-IDF vector data
└── requirements.txt # Dependencies
git clone https://github.com/Aaryan10000/Book.Recommendation.System.git
cd Book.Recommendation.System
python3 -m venv BRSenv
source BRSenv/bin/activate # Linux/macOS
BRSenv\Scripts\activate # Windows
pip install -r requirements.txt
The trained similarity matrices and TF-IDF data are hosted on Google Drive (due to GitHub’s 100 MB file size limit). Run the following command to download them automatically:
python downloadModel.py
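Once the files are in place, they can be read back with Python's `pickle` module. The sketch below assumes the `.pkl` files were written with plain `pickle`; the helper name `load_pickle` and the throwaway demo file are hypothetical and only stand in for the real model files.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load one pickled object. Only use this on files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Round-trip demo with a temporary file standing in for a real model file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.pkl"
    with open(demo_path, "wb") as fh:
        pickle.dump({"titles": ["Book A", "Book B"]}, fh)
    restored = load_pickle(demo_path)
```

With the project's files, the call would look like `load_pickle("recommender_Model/final_df.pkl")`. Unpickling requires the libraries whose objects were saved (for example pandas, if the file holds a DataFrame) to be installed, and unpickling executes code embedded in the file, so it should only be done with files you trust.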
Start the Streamlit web app:
streamlit run LandingPage.py
This will launch the app in your default browser.
This project uses the “Best Books Ever” dataset, originally published on Kaggle by mexwell.
📘 License: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
📂 Dataset Source: Kaggle - Best Books Ever
The model uses content-based filtering with the following pipeline:
- Data Preprocessing → Clean and normalize book descriptions
- TF-IDF Vectorization → Convert text to numeric representation
- Cosine Similarity → Compute similarity between books
- Recommendation Engine → Suggest top 10 similar books

All processed matrices are saved as .pkl files in recommender_Model/ for faster access.
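The pipeline steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual implementation: the `clean` and `recommend` helpers, the cleaning rules, the `stop_words="english"` setting, and the four toy books are all invented for this example.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def clean(text):
    """Lowercase, replace non-letters with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Toy catalogue (hypothetical titles and descriptions).
titles = ["Wizard School", "Magic Academy", "London Murders", "The Last Case"]
descriptions = [
    "A young wizard learns magic at a hidden school.",
    "Students study magic and spells at a wizard academy.",
    "A detective investigates a murder in foggy London.",
    "An ageing detective solves one final murder case.",
]

# Steps 1-2: preprocess, then convert text to a sparse TF-IDF matrix.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([clean(d) for d in descriptions])

# Step 3: pairwise cosine similarity between all books.
similarity = cosine_similarity(tfidf)


def recommend(title, top_n=10):
    """Step 4: return up to top_n titles most similar to the given one."""
    idx = titles.index(title)
    order = np.argsort(similarity[idx])[::-1]  # highest similarity first
    return [titles[i] for i in order if i != idx][:top_n]


print(recommend("Wizard School", top_n=1))
```

The query book is excluded from its own results, since its similarity to itself is always the highest. `top_n` is a parameter so that the same function can serve pages that show a different number of recommendations.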
- Add collaborative filtering for user-based recommendations
- Include book cover images using an API (e.g., Google Books API)
- Integrate user login and personalized dashboards
- Deploy using Streamlit Cloud or Render
- Improve NLP model using BERT embeddings
This project is licensed under the MIT License. You are free to use, modify, and distribute this project with attribution.
Aaryan Dawalkar
💻 B.Tech in Computer Science Engineering
📫 LinkedIn




