Business Context

"Visit with Us," a leading travel company, is revolutionizing the tourism industry by leveraging data-driven strategies to optimize operations and customer engagement. While introducing a new package offering, such as the Wellness Tourism Package, the company faces challenges in targeting the right customers efficiently. The manual approach to identifying potential customers is inconsistent, time-consuming, and prone to errors, leading to missed opportunities and suboptimal campaign performance.

To address these issues, the company aims to implement a scalable and automated system that integrates customer data, predicts potential buyers, and enhances decision-making for marketing strategies. By utilizing an MLOps pipeline, the company seeks to achieve seamless integration of data preprocessing, model development, deployment, and CI/CD practices for continuous improvement. This system will ensure efficient targeting of customers, timely updates to the predictive model, and adaptation to evolving customer behaviors, ultimately driving growth and customer satisfaction.

Objective

As an MLOps Engineer at "Visit with Us," your responsibility is to design and deploy an MLOps pipeline on GitHub that automates the end-to-end workflow for predicting customer purchases. The primary objective is to build a model that predicts, before a customer is contacted, whether they will purchase the newly introduced Wellness Tourism Package. The pipeline will include data cleaning, preprocessing, transformation, model building, training, evaluation, and deployment, ensuring consistent performance and scalability. By leveraging GitHub Actions for CI/CD integration, the system will enable automated updates, streamline model deployment, and improve operational efficiency. This predictive solution will help the marketing team make data-driven decisions, refine marketing strategies, and target potential customers effectively, thereby driving customer acquisition and business growth.

Data Description

The dataset contains customer and interaction data that serve as key attributes for predicting the likelihood of purchasing the Wellness Tourism Package. The detailed attributes are:

Customer Details

  • CustomerID: Unique identifier for each customer.
  • ProdTaken: Target variable indicating whether the customer has purchased a package (0: No, 1: Yes).
  • Age: Age of the customer.
  • TypeofContact: The method by which the customer was contacted (Company Invited or Self Inquiry).
  • CityTier: The city category based on development, population, and living standards (Tier 1 > Tier 2 > Tier 3).
  • Occupation: Customer's occupation (e.g., Salaried, Freelancer).
  • Gender: Gender of the customer (Male, Female).
  • NumberOfPersonVisiting: Total number of people accompanying the customer on the trip.
  • PreferredPropertyStar: Preferred hotel rating by the customer.
  • MaritalStatus: Marital status of the customer (Single, Married, Divorced).
  • NumberOfTrips: Average number of trips the customer takes annually.
  • Passport: Whether the customer holds a valid passport (0: No, 1: Yes).
  • OwnCar: Whether the customer owns a car (0: No, 1: Yes).
  • NumberOfChildrenVisiting: Number of children below age 5 accompanying the customer.
  • Designation: Customer's designation in their current organization.
  • MonthlyIncome: Gross monthly income of the customer.

Customer Interaction Data

  • PitchSatisfactionScore: Score indicating the customer's satisfaction with the sales pitch.
  • ProductPitched: The type of product pitched to the customer.
  • NumberOfFollowups: Total number of follow-ups by the salesperson after the sales pitch.
  • DurationOfPitch: Duration of the sales pitch delivered to the customer.
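
The data dictionary above can double as an explicit schema check that runs before any cleaning or training step. What follows is a minimal, dependency-free sketch rather than the project's actual code: the names `EXPECTED_COLUMNS` and `missing_columns` are illustrative assumptions, and the sample rows are made up.

```python
import csv
import io

# Column names copied from the data dictionary above.
EXPECTED_COLUMNS = {
    "CustomerID", "ProdTaken", "Age", "TypeofContact", "CityTier",
    "Occupation", "Gender", "NumberOfPersonVisiting", "PreferredPropertyStar",
    "MaritalStatus", "NumberOfTrips", "Passport", "OwnCar",
    "NumberOfChildrenVisiting", "Designation", "MonthlyIncome",
    "PitchSatisfactionScore", "ProductPitched", "NumberOfFollowups",
    "DurationOfPitch",
}


def missing_columns(csv_text: str) -> set[str]:
    """Return the expected columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return EXPECTED_COLUMNS - {name.strip() for name in header}


# Hypothetical in-memory sample with only three of the expected columns.
sample = "CustomerID,ProdTaken,Age\n1,0,41\n"
print(len(missing_columns(sample)))  # 17 of the 20 columns are missing
```

Failing fast on a missing column keeps a malformed export from silently producing a model trained on partial features.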

Designing an MLOps pipeline for "Visit with Us" means moving from a "notebook" mindset to a production-ready architecture. Below is the technical blueprint for the end-to-end automated workflow using GitHub Actions and Python-based MLOps tools.

🏗️ MLOps Pipeline Architecture

The system is divided into four main stages to ensure modularity and scalability.

  1. Data Engineering: Ingestion from source, cleaning (handling missing Age or MonthlyIncome), and feature engineering.

  2. Model Training: Experiment tracking and hyperparameter tuning.

  3. CI/CD Pipeline: GitHub Actions to automate testing and deployment.

  4. Deployment: Serving the model via a REST API (FastAPI) inside a Docker container.

📂 Project Structure

To maintain CI/CD standards, organize your GitHub repository as follows:

Tourism-Project/
│
├── tourism_project/          ✅ main project
│   ├── data/
│   │   ├── tourism.csv
│   ├── deployment/
│   │   ├── app.py
│   │   ├── Dockerfile
│   │   ├── requirements.txt
│   ├── hosting/
│   ├── model_building/
├── README.md
├── .gitignore

🛠️ Implementation Steps

  1. Data Preprocessing & Transformation
     Given the data dictionary, the main challenges are missing values and categorical encoding.
       • Imputation: Fill missing Age and MonthlyIncome with the median, which is less sensitive to outliers than the mean.
       • One-Hot Encoding: Occupation, Gender, MaritalStatus.
       • Ordinal Encoding: CityTier (Tier 1 > Tier 2 > Tier 3), Designation.
       • Feature Scaling: Standardize MonthlyIncome and DurationOfPitch.
  2. Model Development
     For predicting ProdTaken (binary classification), XGBoost or Random Forest are good candidates because they capture non-linear relationships and work with a mix of numeric and encoded categorical features.
       • Evaluation Metric: Because every contact has a marketing cost, focus on Precision (fewer false positives, i.e. fewer wasted calls) and on F1-Score, which balances precision against recall.
  3. CI/CD with GitHub Actions
     Create a workflow file at .github/workflows/main.yml to automate the lifecycle:

     | Trigger      | Action                                                           |
     |--------------|------------------------------------------------------------------|
     | Push to Main | Run unit tests on preprocessing logic.                           |
     | Pull Request | Execute train.py and log performance metrics.                    |
     | Merge        | Build Docker Image and push to Container Registry (e.g., GHCR).  |
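
The preprocessing choices in step 1 (median imputation, one-hot and ordinal encoding, standardization) can be illustrated with a small dependency-free sketch. The function names, the toy values, and the "Tier N" label strings are assumptions made for illustration. In a real pipeline the analogous scikit-learn components would be `SimpleImputer(strategy="median")`, `OneHotEncoder`, `OrdinalEncoder`, and `StandardScaler`.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Map each category to a 0/1 indicator vector (categories sorted)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def ordinal_encode(values, order):
    """Map each category to its rank in `order` (lowest rank first)."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical toy values, not taken from tourism.csv.
print(impute_median([30, None, 40, 50]))   # [30, 40, 40, 50]
print(one_hot(["Male", "Female"]))         # [[0, 1], [1, 0]]
print(ordinal_encode(["Tier 1", "Tier 3"],
                     ["Tier 3", "Tier 2", "Tier 1"]))  # [2, 0]
```

The median, mean, and standard deviation should be computed on the training split only and then reused on new data, so that no information from evaluation rows leaks into the fitted transformation.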
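
To make the evaluation metrics from step 2 concrete, here is a small worked example that computes precision, recall, and F1 from raw predictions. The helper name `precision_recall_f1` and the label vectors are hypothetical. In practice `precision_score` and `f1_score` from `sklearn.metrics` would normally be used instead.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = purchased)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly targeted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wasted calls
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed buyers
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels: 4 actual buyers, 3 customers flagged by the model.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"{precision:.3f} {recall:.3f} {f1:.3f}")  # 0.667 0.500 0.571
```

Here two of the three flagged customers actually buy (precision 2/3), but only half of all buyers are found (recall 1/2). F1 sits between the two at 4/7, which is why it is a useful companion to precision when missed buyers also matter.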
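
The "unit tests on preprocessing logic" that step 3 runs on every push could look roughly like the sketch below. It uses the standard-library `unittest` module. The `impute_median` function is a stand-in defined here so the example is self-contained, and the test class name is an assumption rather than something taken from the repository.

```python
import unittest
from statistics import median


def impute_median(values):
    """Stand-in preprocessing step: fill None with the observed median."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


class ImputeMedianTest(unittest.TestCase):
    def test_fills_missing_with_median(self):
        self.assertEqual(impute_median([30, None, 40, 50]), [30, 40, 40, 50])

    def test_leaves_complete_column_unchanged(self):
        self.assertEqual(impute_median([1, 2, 3]), [1, 2, 3])

    def test_fill_value_resists_outlier(self):
        # One extreme income must not drag the fill value upward.
        self.assertEqual(impute_median([10, 20, None, 1000]),
                         [10, 20, 20, 1000])
```

Saved in a file whose name starts with `test` (for example a hypothetical `test_preprocessing.py`), these cases are picked up by `python -m unittest`, which a workflow step can invoke. A failing test then blocks the push from reaching the training and deployment stages.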