Identify overlapping demand zones and optimize public transport stop placement and scheduling using Fuzzy C-Means clustering on ridership and geospatial data.

sk-883/fcm-transit-optimization

Public Transport Optimization via Fuzzy C-Means Clustering

This repository demonstrates how to apply Fuzzy C-Means (FCM) clustering to public transport ridership and GPS data in order to:

  • Identify demand clusters where passenger boardings overlap
  • Optimize stop placements based on cluster centroids
  • Recommend scheduling adjustments to match temporal demand patterns
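
To make the idea of overlapping demand clusters concrete, here is a minimal NumPy sketch of the Fuzzy C-Means update loop. It is an illustration only, not the repository's FuzzyCMeans class (which the feature list says is built on scikit-fuzzy). The function name `fuzzy_c_means`, its parameters, and the synthetic two-blob data are all assumptions made for this example.

```python
import numpy as np


def _weighted_centers(u, X, m):
    """Cluster centers as membership-weighted means of the points."""
    um = u ** m
    return (um @ X) / um.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=150, tol=1e-5, seed=0):
    """Minimal Fuzzy C-Means sketch (hypothetical helper).

    X has shape (n_samples, n_features). Returns (centers, u), where u has
    shape (n_clusters, n_samples) and every column sums to 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized per sample.
    u = rng.random((n_clusters, X.shape[0]))
    u /= u.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        centers = _weighted_centers(u, X, m)
        # Distance from every center to every point, shape (c, n).
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        inv = d ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return _weighted_centers(u, X, m), u


# Synthetic stand-in for stop coordinates: two well-separated blobs.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(50, 2)),
    rng.normal([5.0, 5.0], 0.3, size=(50, 2)),
])
centers, u = fuzzy_c_means(X, n_clusters=2)
```

Each column of `u` holds one point's degree of membership in every cluster, which is what lets a boarding location belong partly to two demand zones. The rows of `centers` are the candidate stop locations in this toy setup.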

Features

  • Data Preprocessing: Clean and merge ridership CSVs with stop geometry (GeoJSON).
  • Feature Engineering: Aggregate boardings by time window and assign geospatial features.
  • Clustering Module: FCM implementation using scikit-fuzzy for soft cluster assignments.
  • Evaluation: Compute cluster validity indices (Silhouette Score) to assess cohesion and separation.
  • Visualization: Plot clusters on maps and visualize temporal demand heatmaps.
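
As a rough illustration of the time-window aggregation mentioned under Feature Engineering, the sketch below sums boardings per stop and per fixed-length window using only the standard library. The record layout `(stop_id, timestamp, boardings)`, the function name `aggregate_boardings`, and the sample rows are assumptions; the actual column names in the ridership CSVs are not documented here.

```python
from collections import Counter
from datetime import datetime


def aggregate_boardings(records, window_minutes=60):
    """Sum boardings per (stop_id, window_start) bucket (hypothetical helper).

    records: iterable of (stop_id, iso_timestamp, boardings) tuples.
    window_minutes must divide a day evenly so windows align to midnight.
    """
    if window_minutes <= 0 or 1440 % window_minutes:
        raise ValueError("window_minutes must be a positive divisor of 1440")
    totals = Counter()
    for stop_id, timestamp, boardings in records:
        t = datetime.fromisoformat(timestamp)
        minutes = t.hour * 60 + t.minute
        # Floor the time of day to the start of its window.
        start = minutes - minutes % window_minutes
        window_start = t.replace(
            hour=start // 60, minute=start % 60, second=0, microsecond=0
        )
        totals[(stop_id, window_start)] += boardings
    return dict(totals)


# Made-up sample rows, for illustration only.
records = [
    ("A", "2024-01-01T08:05:00", 3),
    ("A", "2024-01-01T08:50:00", 2),
    ("A", "2024-01-01T09:10:00", 4),
    ("B", "2024-01-01T08:30:00", 1),
]
totals = aggregate_boardings(records, window_minutes=60)
```

The resulting per-window counts can then be joined with stop coordinates to form the feature matrix passed to clustering.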

Repository Structure

fcm-transit-optimization/
├── data/
│   ├── raw/                   # Original ridership CSV and stops GeoJSON
│   └── processed/             # Cleaned & merged datasets
│
├── notebooks/                 # Jupyter notebooks for EDA & prototyping
│   ├── 01_data_exploration.ipynb
│   ├── 02_feature_engineering.ipynb
│   └── 03_clustering.ipynb
│
├── src/
│   ├── __init__.py
│   ├── config.py              # Paths and clustering parameters
│   ├── data_preprocessing.py  # load, clean, merge functions
│   ├── feature_engineering.py # time-window aggregation, geofence assignment
│   ├── clustering/
│   │   ├── fcm.py             # FuzzyCMeans class with fit/predict
│   │   └── utils.py           # Helper functions for matrix conversion
│   ├── evaluation.py          # silhouette_score and other metrics
│   └── visualize.py           # Map and heatmap plotting
│
├── requirements.txt           # Python dependencies
└── README.md                  # Project overview and instructions

Installation

  1. Clone the repository:

    git clone https://github.com/sk-883/fcm-transit-optimization.git
    cd fcm-transit-optimization
  2. Create a virtual environment (optional but recommended):

    python3 -m venv .venv
    source .venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt

Usage

  1. Preprocess data:

    python src/data_preprocessing.py
  2. Create features:

    python src/feature_engineering.py
  3. Run clustering (example in Python REPL or script):

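    import pandas as pd

    # Assumption: FCM_PARAMS lives in src/config.py with the other clustering parameters.
    from src.config import FCM_PARAMS
    from src.clustering.fcm import FuzzyCMeans
    from src.clustering.utils import df_to_matrix

    # Load the engineered features and convert them to a numeric matrix.
    df = pd.read_csv('data/processed/transport_features.csv')
    X = df_to_matrix(df)

    # Fit FCM: centers are the cluster centroids, u is the fuzzy membership matrix.
    fcm = FuzzyCMeans(**FCM_PARAMS)
    centers, u = fcm.fit(X)

    # Harden memberships: u is assumed to be indexed (cluster, sample), so
    # argmax over axis 0 picks the strongest cluster for each sample.
    labels = u.argmax(axis=0)
    # save or analyze labels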
  4. Evaluate clusters:

    python src/evaluation.py
  5. Visualize results:

    python src/visualize.py
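
Taking `u.argmax(axis=0)` in step 3 discards the soft memberships, which are the part that reveals overlapping demand zones. One possible way to surface them, sketched here with NumPy, is to flag samples whose two strongest memberships are nearly equal. The function name `overlapping_points`, the `margin` threshold, and the example matrix are assumptions, and `u` is taken to have shape (n_clusters, n_samples) with at least two clusters.

```python
import numpy as np


def overlapping_points(u, margin=0.2):
    """Indices of samples whose top two memberships differ by less than margin.

    u: membership matrix of shape (n_clusters, n_samples), n_clusters >= 2.
    """
    u = np.asarray(u, dtype=float)
    # Sort memberships within each column; the last two rows are the top two.
    ranked = np.sort(u, axis=0)
    gap = ranked[-1] - ranked[-2]
    return np.flatnonzero(gap < margin)


# Made-up memberships for four samples and two clusters.
u = np.array([
    [0.90, 0.55, 0.10, 0.48],
    [0.10, 0.45, 0.90, 0.52],
])
labels = u.argmax(axis=0)        # hard labels, as in step 3
shared = overlapping_points(u)   # samples sitting between two clusters
```

Samples returned in `shared` are candidates for stops that serve two demand zones at once. The right `margin` depends on the data and the fuzzifier used.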

Results

  • evaluation.py prints the Silhouette Score, which summarizes how cohesive and well separated the clusters are.
  • Cluster maps generated by visualize.py show stop locations colored by cluster membership.
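
For readers unfamiliar with the metric, the following NumPy sketch computes a mean silhouette coefficient from hardened labels. It illustrates the standard definition and is not the code in evaluation.py, whose implementation is not shown here. The function name `silhouette_score_hard` and the four-point example are assumptions.

```python
import numpy as np


def silhouette_score_hard(X, labels):
    """Mean silhouette coefficient for hard labels (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters score 0 by convention
        # Mean distance to the other members of the same cluster
        # (dist[i, i] is 0, so dividing by n_same - 1 excludes the point).
        a = dist[i, same].sum() / (n_same - 1)
        # Smallest mean distance to any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return scores.mean()


# Two tight pairs far apart: a sensible labelling versus a mixed-up one.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = silhouette_score_hard(X, [0, 0, 1, 1])
bad = silhouette_score_hard(X, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, while values near or below 0 suggest points assigned to the wrong cluster. Because the score uses hard labels, it does not reflect how fuzzy the memberships are.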

Contributing

Contributions, issues, and feature requests are welcome! Feel free to open an issue or submit a pull request.

Ownership

Developed by sk_883
