Customer reviews sentiment analysis with Python and NLP. Generates a synthetic dataset of positive, neutral, and negative reviews, applies preprocessing (tokenization, stopwords, lemmatization), and builds TF-IDF features. Trains classifiers (Naive Bayes, Logistic Regression, Random Forest) with evaluation, confusion matrix and top features.

AmirhosseinHonardoust/Sentiment-Analysis-NLP

Sentiment Analysis (NLP)

Customer review sentiment analysis with Python and NLP.
The project uses a synthetic review dataset (positive, neutral, negative), applies text preprocessing (cleaning, tokenization, stopword removal, lemmatization), converts text to TF-IDF features, and trains three classifiers (Naive Bayes, Logistic Regression, Random Forest).
The best model is selected by macro F1-score, and results are visualized with a confusion matrix, word clouds, and the top TF-IDF features.
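
Macro F1 averages the per-class F1 scores with equal weight, so the neutral class counts as much as the other two. A minimal sketch of that selection rule in plain Python follows; the function names, model names, and toy predictions are illustrative assumptions, not code from this repository.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Convention used here: a class that never appears scores 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def select_best(y_true, predictions_by_model):
    """Return (model_name, score) for the highest macro F1."""
    scored = {
        name: macro_f1(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy data, made up for illustration only.
y_true = ["negative", "neutral", "positive", "positive"]
candidates = {
    "model_a": ["negative", "neutral", "positive", "positive"],
    "model_b": ["negative", "positive", "positive", "positive"],
}
best_name, best_score = select_best(y_true, candidates)
print(best_name, round(best_score, 3))
```

In this toy case the second candidate never predicts neutral, so its neutral F1 is zero and its macro score drops even though it gets three of four reviews right.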


Features

  • Generate synthetic review dataset
  • Text preprocessing:
    • lowercasing, URL & punctuation removal
    • stopword filtering
    • lemmatization
  • TF-IDF vectorization (unigrams + bigrams)
  • Models: Multinomial Naive Bayes, Logistic Regression, Random Forest
  • Evaluation: accuracy, precision, recall, F1-score
  • Visuals: confusion matrix, word clouds, top features per class
  • Saved artifacts: best model + vectorizer (joblib), metrics JSON
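
The preprocessing and n-gram steps listed above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the stopword list is a tiny hand-picked set, lemmatization is left out because it needs an external lexicon, and the helper names `preprocess` and `ngrams` are hypothetical rather than taken from src/utils.py.

```python
import re
import string

# Tiny illustrative stopword list; a real run would more likely use a full
# list from an NLP library (assumption: the README does not name one).
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "to", "of", "i"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip URLs and punctuation, then drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    tokens = text.split()
    # Lemmatization would go here (e.g. mapping "arrived" to "arrive");
    # it is omitted in this sketch.
    return [tok for tok in tokens if tok not in STOPWORDS]


def ngrams(tokens, n):
    """Contiguous n-grams, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("This product is GREAT, see https://example.com/reviews!")
features = tokens + ngrams(tokens, 2)  # unigrams + bigrams
print(features)
```

URLs are removed before punctuation so that the address is dropped as one unit instead of being split into stray word fragments.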

Project Structure

sentiment-analysis-nlp/
├─ README.md
├─ LICENSE
├─ requirements.txt
├─ data/
│  └─ generate_reviews.py
├─ src/
│  ├─ train_nlp.py
│  └─ utils.py
└─ outputs/
   └─ figures & reports (auto-created)

Setup

python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
pip install -r requirements.txt

Generate Synthetic Reviews

python data/generate_reviews.py --n 8000 --seed 42 --out data/reviews.csv
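
The generator script itself is not reproduced in this README. As a rough idea of how a seeded generator could produce rows in the documented schema, here is a hypothetical sketch; the templates, item names, and function names are invented for illustration and are not the contents of data/generate_reviews.py.

```python
import csv
import io
import random

# Hypothetical phrase templates; the real script's wording is not shown here.
TEMPLATES = {
    "positive": ["Great {item}, works perfectly.", "I love this {item}."],
    "neutral": ["The {item} is okay.", "Average {item}, nothing special."],
    "negative": ["Terrible {item}, broke quickly.", "I regret buying this {item}."],
}
ITEMS = ["phone", "blender", "jacket", "headset"]


def generate_reviews(n, seed):
    """Return n rows with the columns review_id, text, label."""
    rng = random.Random(seed)  # local RNG so the seed fully determines output
    labels = sorted(TEMPLATES)
    rows = []
    for review_id in range(1, n + 1):
        label = rng.choice(labels)
        text = rng.choice(TEMPLATES[label]).format(item=rng.choice(ITEMS))
        rows.append({"review_id": review_id, "text": text, "label": label})
    return rows


def write_csv(rows, handle):
    """Write rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["review_id", "text", "label"])
    writer.writeheader()
    writer.writerows(rows)


rows = generate_reviews(n=5, seed=42)
buffer = io.StringIO()  # stands in for open("data/reviews.csv", "w", newline="")
write_csv(rows, buffer)
print(buffer.getvalue())
```

Using a dedicated `random.Random(seed)` instance keeps the output reproducible for a given seed without touching the global random state.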

Train & Evaluate

python src/train_nlp.py --input data/reviews.csv --outdir outputs --test-size 0.2 --seed 42
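
The training step can be approximated in a few lines. The sketch below assumes scikit-learn, which this README does not name explicitly; the toy corpus, the stratified split, and the hyperparameters are illustrative choices and may differ from what src/train_nlp.py does.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy corpus made up for illustration; the real input is data/reviews.csv.
texts = (
    ["great product love it", "excellent quality very happy",
     "works perfectly highly recommend"] * 10
    + ["it is okay nothing special", "average product does the job",
       "neither good nor bad"] * 10
    + ["terrible product broke quickly", "very poor quality waste of money",
       "awful experience do not buy"] * 10
)
labels = ["positive"] * 30 + ["neutral"] * 30 + ["negative"] * 30

# Mirrors --test-size 0.2 --seed 42; stratifying by label is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Fit the vectorizer on training text only, then reuse it on the test split.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # unigrams + bigrams
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

models = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    predictions = model.predict(X_test_vec)
    scores[name] = f1_score(y_test, predictions, average="macro")

best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary and document frequencies out of the features, which avoids an optimistic bias in the reported scores.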

Outputs

  • metrics.json – per-model scores & best model
  • classification_report.txt
  • confusion_matrix.png
  • wordcloud_positive.png, wordcloud_negative.png
  • top_features.txt
  • best_model.joblib, vectorizer.joblib

Example Results

Confusion Matrix

Best model performance across classes:
[Figure: confusion matrix (confusion_matrix.png)]


Word Cloud (Positive Reviews)

[Figure: word cloud of positive reviews (wordcloud_positive.png)]

Word Cloud (Negative Reviews)

[Figure: word cloud of negative reviews (wordcloud_negative.png)]

Top Features

File: outputs/top_features.txt
Lists the most discriminative words and phrases the classifier learned for each class.
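
This README does not say how the ranking is computed. One common approach for linear models is to sort the vocabulary by each class's learned weight and keep the highest entries. The sketch below shows that idea with invented weights; the vocabulary, the numbers, and the `top_features` helper are hypothetical.

```python
def top_features(feature_names, class_weights, k=3):
    """Map each class to its k highest-weighted features, best first."""
    result = {}
    for label, weights in class_weights.items():
        ranked = sorted(zip(weights, feature_names), reverse=True)
        result[label] = [name for _, name in ranked[:k]]
    return result


# Hypothetical vocabulary and per-class weights, invented for illustration.
feature_names = ["great", "okay", "terrible", "not good", "love"]
class_weights = {
    "negative": [-1.2, -0.1, 2.0, 1.5, -0.9],
    "neutral": [-0.3, 1.8, -0.4, 0.2, -0.5],
    "positive": [1.9, -0.2, -1.1, -1.3, 1.4],
}

top = top_features(feature_names, class_weights, k=2)
for label, names in top.items():
    print(label, names)
```

Because the vocabulary includes bigrams, a phrase such as "not good" can rank for the negative class even when "good" on its own would point the other way.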


Data Schema

column      description
review_id   unique id
text        raw review text
label       sentiment {negative, neutral, positive}
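
A quick check against this schema can catch malformed input before training. The following is a minimal sketch using only the standard library; the `validate_reviews` helper, the column order, and the sample rows are assumptions made for illustration.

```python
import csv
import io

EXPECTED_COLUMNS = ["review_id", "text", "label"]  # order is an assumption
ALLOWED_LABELS = {"negative", "neutral", "positive"}


def validate_reviews(handle):
    """Return a list of problems; an empty list means the file conforms."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["review_id"] in seen_ids:
            problems.append(f"line {line_no}: duplicate review_id {row['review_id']}")
        seen_ids.add(row["review_id"])
        if row["label"] not in ALLOWED_LABELS:
            problems.append(f"line {line_no}: unknown label {row['label']!r}")
    return problems


# Made-up sample with one duplicate id and one invalid label.
sample = io.StringIO(
    "review_id,text,label\n"
    "1,Great phone,positive\n"
    "1,It is okay,neutral\n"
    "2,Broke quickly,angry\n"
)
print(validate_reviews(sample))
```

For a real file, the same function can be called with `open("data/reviews.csv", newline="")` in place of the in-memory sample.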
