Welcome to the TravelEase Machine Learning Documentation. TravelEase is a mobile application that provides a tourist attraction recommendation system based on user preferences. This document provides a comprehensive guide on the machine learning component of the TravelEase project.
TravelEase's primary features include:
- A recommendation system for tourist destinations.
- Auto-generated itineraries.
- Trip cost tracking.
- Tourist reviews.
This documentation focuses on the machine learning part of the project, which involves creating a recommendation system for tourist destinations.
Name | Student ID | Path |
---|---|---|
Haifan Tri Buwono Joyo Pangestu | M693D4KY2338 | Machine Learning |
Nisrina Diva Sulalah | M006D4KX2082 | Machine Learning |
Ariqa Bilqis | M006D4KX2081 | Machine Learning |
We use a dataset containing information about various tourist destinations.
import pandas as pd
PATH = 'https://raw.githubusercontent.com/haiffy420/TravelEase---Bangkit-2024-Capstone-Project/main/data'
tourism = pd.read_csv(f"{PATH}/tourism_with_id.csv")
We assessed and cleaned the data to ensure it's ready for analysis and modeling.
# Checking for missing values
tourism.info()
tourism.isnull().sum()
# Filling missing values
tourism['Time_Minutes'] = tourism['Time_Minutes'].fillna(60)
# Dropping unused columns
tourism = tourism.drop(columns=['Lat', 'Long', 'Unnamed: 11', 'Unnamed: 12'])
# Checking for duplicates
print("Number of data duplications (tourism): ", tourism.duplicated().sum())
# The snippets that follow refer to the cleaned data as df
df = tourism
We performed EDA to understand the distribution and relationships within the data.
print(tourism.describe(include='all'))
We created several visualizations to explore the data.
- Category Distribution:
import matplotlib.pyplot as plt
import seaborn as sns
category_counts = df['Category'].value_counts().reset_index()
category_counts.columns = ['Category', 'Frequency']
sns.barplot(data=category_counts, x='Category', y='Frequency', palette='viridis')
plt.title('Category Distribution')
plt.show()
- Rating Distribution:
rating_counts = df['Rating'].value_counts().reset_index()
rating_counts.columns = ['Rating', 'Frequency']
sns.barplot(data=rating_counts, x='Rating', y='Frequency', palette='viridis')
plt.title('Rating Distribution')
plt.show()
- Most Popular Tourist Destinations:
top_10 = df.sort_values(by='Rating', ascending=False).head(10)
sns.barplot(x='Rating', y='Place_Name', data=top_10, palette='viridis')
plt.title('Top 10 Tourist Destinations Based on Highest Rating')
plt.show()
We used TF-IDF to vectorize the categories of tourist destinations.
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(df['Category'])
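To make the vectorization step concrete, here is a minimal sketch on a handful of category labels. The labels are illustrative stand-ins for the Category column (only "budaya" and "cagar alam" appear elsewhere in this document), and the names `sample_categories`, `vectorizer_demo`, and `tfidf_demo` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative category labels, standing in for the dataset's Category column
sample_categories = ["Budaya", "Taman Hiburan", "Cagar Alam", "Bahari", "Budaya"]

vectorizer_demo = TfidfVectorizer()
tfidf_demo = vectorizer_demo.fit_transform(sample_categories)

# Each distinct lowercase word becomes one column,
# so a two-word label such as "Cagar Alam" spans two columns
vocabulary = sorted(vectorizer_demo.vocabulary_)
print(vocabulary)
print(tfidf_demo.shape)  # (number of labels, number of distinct words)
```

Because the vocabulary is built per word rather than per label, the vector width is the number of distinct words, which is also the `input_dim` the network receives.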
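We built and trained a small feed-forward neural network on the TF-IDF category vectors. Each vector is used as both the input and the target, so the model's output lives in the same vocabulary space as its input and can be compared with the TF-IDF vectors of the destinations.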
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split
def build_model(input_dim):
    inputs = Input(shape=(input_dim,))
    x = Dense(128, activation='relu')(inputs)
    x = Dense(64, activation='relu')(x)
    outputs = Dense(input_dim, activation='softmax')(x)
    model = Model(inputs, outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model
model = build_model(tfidf_matrix.shape[1])
X_train, X_test = train_test_split(tfidf_matrix.toarray(), test_size=0.2, random_state=42)
class myCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Stop once both training and validation accuracy exceed 90%
        if logs.get('accuracy', 0) > 0.90 and logs.get('val_accuracy', 0) > 0.90:
            self.model.stop_training = True
model.fit(X_train, X_train, epochs=100, batch_size=32, validation_data=(X_test, X_test), callbacks=[myCallback()])
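The stopping rule used by the callback (training and validation accuracy both above 0.90) can be isolated as a plain function, which makes it easy to check without running a training loop. This is only a sketch: `should_stop` and its `threshold` parameter are hypothetical names, not part of the project code.

```python
def should_stop(logs, threshold=0.90):
    """Return True when both training and validation accuracy exceed threshold."""
    logs = logs or {}
    accuracy = logs.get("accuracy")
    val_accuracy = logs.get("val_accuracy")
    # A missing metric counts as "not reached" instead of raising a TypeError
    if accuracy is None or val_accuracy is None:
        return False
    return accuracy > threshold and val_accuracy > threshold

print(should_stop({"accuracy": 0.95, "val_accuracy": 0.92}))
print(should_stop({"accuracy": 0.95, "val_accuracy": 0.85}))
print(should_stop({"accuracy": 0.95}))
```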
The recommendation function suggests places based on user input categories and city.
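import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
def place_recommendations(categories, city, items=df[['Category', 'Place_Name', 'Rating', 'City']], k=5):
    # Restrict the candidates to the requested city
    city_items = items[items['City'] == city]
    # Never ask for more places than the city has
    if len(city_items) < k:
        k = len(city_items)
    # One TF-IDF row per requested category, passed through the trained model
    input_vector = vectorizer.transform(categories).toarray()
    predicted_vector = model.predict(input_vector)
    # Compare each prediction with every place in the city
    city_tfidf_matrix = vectorizer.transform(city_items['Category']).toarray()
    city_similarities = cosine_similarity(predicted_vector, city_tfidf_matrix)
    # Indices of the k most similar places for each requested category
    similar_indices = np.argsort(city_similarities, axis=1)[:, ::-1][:, :k]
    recommendations = pd.DataFrame(columns=['Place_Name', 'Category', 'Rating', 'City'])
    for i, indices in enumerate(similar_indices):
        # Copy so the assignment below does not write into a slice of items
        category_places = city_items.iloc[indices].copy()
        category_places['Category'] = categories[i]
        recommendations = pd.concat([recommendations, category_places])
    recommendations = recommendations.sort_values(by='Rating', ascending=False)
    return recommendations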
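The ranking step inside the recommendation function (cosine similarity followed by a top-k selection) can be tried in isolation. The sketch below uses made-up places and, as a simplifying assumption, feeds the raw TF-IDF vector of the query where the real function uses the output of `model.predict`. All names and values here are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up places in a single city (not rows from the real dataset)
places = pd.DataFrame({
    "Place_Name": ["Museum A", "Park B", "Temple C", "Reserve D"],
    "Category": ["Budaya", "Taman Hiburan", "Budaya", "Cagar Alam"],
    "Rating": [4.5, 4.2, 4.7, 4.4],
})

vec = TfidfVectorizer().fit(places["Category"])
place_vectors = vec.transform(places["Category"]).toarray()

# Stand-in for model.predict(...): the raw TF-IDF vector of the requested category
query_vector = vec.transform(["Budaya"]).toarray()

k = 2
similarities = cosine_similarity(query_vector, place_vectors)
# Sort ascending, reverse to descending, keep the first k column indices
top_k = np.argsort(similarities, axis=1)[:, ::-1][:, :k]

# Order the selected places by rating, as the recommendation function does
top_places = places.iloc[top_k[0]].sort_values(by="Rating", ascending=False)
print(top_places[["Place_Name", "Rating"]])
```

Note that similarity only decides which k places are selected; the final order comes from the rating sort.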
We deployed the model using FastAPI.
# main.py
from fastapi import FastAPI, Query
from pydantic import BaseModel
from typing import List, Optional
from recommender import recommender
from data import fetch_all_tourism, filter_tourism
app = FastAPI()
class Item(BaseModel):
    categories: list
    city: str
@app.get("/")
def hello():
    return {"message": "TravelEase - All-in-One Trip Companion"}

@app.post("/recommend/")
async def recommend_places(item: Item):
    categories = item.categories
    city = item.city
    recommendations = recommender(categories, city)
    return recommendations.to_dict(orient="records")

@app.get("/tourism/")
async def get_tourism_data(
    name: Optional[str] = Query(None, description="Filter by place name"),
    city: Optional[str] = Query(None, description="Filter by city"),
    categories: Optional[List[str]] = Query(None, description="Filter by categories"),
):
    print(name, city, categories)
    if not any([name, city, categories]):
        data = fetch_all_tourism()
    else:
        data = filter_tourism(name, city, categories)
    return data.to_dict(orient="records")
# data.py
import pandas as pd
df = pd.read_csv("data/tourism.csv")
def fetch_all_tourism():
    return df
def filter_tourism(name=None, city=None, categories=None):
    filtered_df = df.copy()
    if name:
        filtered_df = filtered_df[
            filtered_df["Place_Name"].str.contains(name, case=False, na=False)
        ]
    if city:
        filtered_df = filtered_df[
            filtered_df["City"].str.contains(city, case=False, na=False)
        ]
    if categories:
        # Convert categories to lowercase for case-insensitive comparison
        categories = [category.lower() for category in categories]
        # Lowercase a separate Series so the returned Category values keep their original casing
        category_lower = filtered_df["Category"].str.lower()
        # Keep rows whose category contains any of the requested categories
        category_filter = category_lower.apply(
            lambda x: any(category in x for category in categories)
        )
        filtered_df = filtered_df[category_filter]
    return filtered_df
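The filtering logic above can be exercised on a tiny in-memory table. The following is a sketch with made-up rows; `demo` and `filter_demo` are hypothetical names that mirror the behaviour of `filter_tourism` without reading data/tourism.csv.

```python
import pandas as pd

# Made-up rows; the real service reads data/tourism.csv
demo = pd.DataFrame({
    "Place_Name": ["Taman Alpha", "Museum Beta", "Taman Gamma"],
    "City": ["Yogyakarta", "Jakarta", "Yogyakarta"],
    "Category": ["Taman Hiburan", "Budaya", "Budaya"],
})

def filter_demo(frame, name=None, city=None, categories=None):
    result = frame
    if name:
        # Case-insensitive substring match on the place name
        result = result[result["Place_Name"].str.contains(name, case=False, na=False)]
    if city:
        result = result[result["City"].str.contains(city, case=False, na=False)]
    if categories:
        wanted = [c.lower() for c in categories]
        lowered = result["Category"].str.lower()
        # A row matches when any requested category is a substring of its category
        result = result[lowered.apply(lambda x: any(c in x for c in wanted))]
    return result

matches = filter_demo(demo, name="taman", city="yogyakarta", categories=["budaya"])
print(matches["Place_Name"].tolist())
```

The three filters are combined with AND, while the values inside `categories` are combined with OR. Keep in mind that `str.contains` treats its pattern as a regular expression by default, so user input containing regex metacharacters may behave unexpectedly unless `regex=False` is passed.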
# recommender.py
import pandas as pd
import tensorflow as tf
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
df = pd.read_csv("data/tourism.csv")
model = tf.keras.models.load_model("model/travelease.h5")
vectorizer = TfidfVectorizer()
vectorizer.fit(df["Category"])
COLUMNS = ["Place_Id", "Place_Name", "Description", "Category", "City", "Price", "Rating", "Time_Minutes", "Coordinate"]
def recommender(categories, city, items=df[COLUMNS], k=5):
    # Restrict the candidates to the requested city
    city_items = items[items["City"] == city]
    if len(city_items) < k:
        k = len(city_items)
    # One TF-IDF row per requested category, passed through the trained model
    input_vector = vectorizer.transform(categories).toarray()
    predicted_vector = model.predict(input_vector)
    city_tfidf_matrix = vectorizer.transform(city_items["Category"]).toarray()
    city_similarities = cosine_similarity(predicted_vector, city_tfidf_matrix)
    # Indices of the k most similar places for each requested category
    similar_indices = np.argsort(city_similarities, axis=1)[:, ::-1][:, :k]
    recommendations = pd.DataFrame(columns=COLUMNS)
    for i, indices in enumerate(similar_indices):
        # Copy so the assignment below does not write into a slice of items
        category_places = city_items.iloc[indices].copy()
        category_places["Category"] = categories[i]
        recommendations = pd.concat([recommendations, category_places])
    recommendations = recommendations.sort_values(by="Rating", ascending=False)
    return recommendations
# Dockerfile
FROM python:3.9
RUN useradd -m -u 1000 user
WORKDIR /app
COPY --chown=user ./requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade -r requirements.txt
COPY --chown=user . /app
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
- Clone the repository:
git clone https://github.com/haiffy420/TravelEase---Bangkit-2024-Capstone-Project.git
cd TravelEase---Bangkit-2024-Capstone-Project/travelease-deployment
- Install the required dependencies:
pip install -r requirements.txt
- Run the FastAPI server:
uvicorn main:app --host 0.0.0.0 --port 7860
- Clone the repository:
git clone https://github.com/haiffy420/TravelEase---Bangkit-2024-Capstone-Project.git
cd TravelEase---Bangkit-2024-Capstone-Project/travelease-deployment
- Build and run the Docker container:
docker build -t travelease .
docker run -p 7860:7860 travelease
- GET /: Returns a welcome message.
- GET /tourism/: Fetches tourism data with optional filters:
  - name: Filter by place name.
  - city: Filter by city.
  - categories: Filter by categories.
  - Example usage:
    http://your_app_domain/tourism?name=taman&city=yogyakarta&categories=budaya&categories=cagar%20alam
- POST /recommend/: Returns recommended tourist destinations based on the input categories and city.
Request body (application/json):
{
  "categories": [
    "string"
  ],
  "city": "string"
}
Responses:
Code: 200
Description: Successful Response
Media type: application/json
- Example Value (schema placeholder; the handler returns a list of place records)
"string"
Code: 422
Description: Validation Error
Media type: application/json
- Example Value
{
  "detail": [
    {
      "loc": [
        "string",
        0
      ],
      "msg": "string",
      "type": "string"
    }
  ]
}
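As a rough illustration of how a client might prepare requests for the two data endpoints, the sketch below builds the query string for GET /tourism/ and the JSON body for POST /recommend/ using only the standard library. `BASE_URL` is a placeholder host, and the category and city values are example inputs, not a list of what the API accepts.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://your_app_domain"  # placeholder host

# doseq=True repeats the key once per list element: categories=a&categories=b
query = urlencode(
    {"name": "taman", "city": "yogyakarta", "categories": ["budaya", "cagar alam"]},
    doseq=True,
)
tourism_url = f"{BASE_URL}/tourism/?{query}"
print(tourism_url)

# Body for POST /recommend/, matching the Item model (categories: list, city: str)
recommend_body = json.dumps({"categories": ["Budaya", "Cagar Alam"], "city": "Yogyakarta"})
print(recommend_body)
```

`urlencode` writes spaces as `+`, which is equivalent to `%20` inside a query string. The city in the POST body is capitalised because the recommender compares it with the City column using exact equality, whereas the GET filters are case-insensitive.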
- Python 3.9
- Pandas
- Numpy
- TensorFlow
- Scikit-learn
- FastAPI
- Uvicorn
This project uses the Indonesia Tourism Destination dataset, published on Kaggle by the GetLoc Team.
- Problem: Before traveling, people usually plan which locations to visit and when to depart. Planning ahead helps avoid problems such as travel distances and durations that turn out not to match expectations.
- Content: This dataset contains tourist attractions in 5 major cities in Indonesia: Jakarta, Yogyakarta, Semarang, Bandung, and Surabaya. It was used in GetLoc, a Bangkit Academy 2021 capstone project: an application that recommends tourist destinations based on user preferences, city, price, category, and time.