This repository demonstrates how to apply Fuzzy C-Means (FCM) clustering to public transport ridership and GPS data in order to:
- Identify demand clusters where passenger boardings overlap
- Optimize stop placements based on cluster centroids
- Recommend scheduling adjustments to match temporal demand patterns
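As an illustration of how cluster centroids could inform stop placement, the sketch below snaps each centroid to its nearest existing stop by great-circle (haversine) distance. It then flags any centroid that lies farther than a walking threshold from every stop as a candidate location for a new stop. It assumes centroids and stops are `(latitude, longitude)` pairs in degrees. That only holds if the spatial part of each FCM centroid has been mapped back to geographic coordinates. The function names and the 400 m default threshold are hypothetical and are not taken from this repository.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; inputs in degrees, broadcastable arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def snap_centroids_to_stops(centroids, stops, max_walk_m=400.0):
    """Hypothetical helper: for each (lat, lon) centroid, return the index of the
    nearest stop, the distance to it, and whether a new stop is suggested."""
    centroids = np.asarray(centroids, dtype=float)
    stops = np.asarray(stops, dtype=float)
    # Pairwise distances, shape (n_centroids, n_stops), via broadcasting
    d = haversine_m(centroids[:, None, 0], centroids[:, None, 1],
                    stops[None, :, 0], stops[None, :, 1])
    nearest = d.argmin(axis=1)
    dist = d[np.arange(len(centroids)), nearest]
    needs_new_stop = dist > max_walk_m
    return nearest, dist, needs_new_stop


# Usage with made-up coordinates
stops = [(40.7128, -74.0060), (40.7580, -73.9855)]
centroids = [(40.7130, -74.0055), (40.7300, -73.9950)]
nearest, dist, new_stop = snap_centroids_to_stops(centroids, stops)
```

Here the first centroid sits about 50 m from an existing stop. The second is roughly 2 km from the nearest stop and would be flagged as a candidate for a new stop.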
- Data Preprocessing: Clean and merge ridership CSVs with stop geometry (GeoJSON).
- Feature Engineering: Aggregate boardings by time window and assign geospatial features.
- Clustering Module: FCM implementation using `scikit-fuzzy` for soft cluster assignments.
- Evaluation: Compute cluster validity indices (Silhouette Score) to assess cohesion and separation.
- Visualization: Plot clusters on maps and visualize temporal demand heatmaps.
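For readers new to FCM, the following is a minimal NumPy-only sketch of the standard algorithm with fuzzifier `m > 1`. It alternates two updates. Centroids become membership-weighted means of the samples. Memberships are set proportional to `d_ij^(-2/(m-1))`, where `d_ij` is the distance from sample `j` to center `i`. The loop stops when memberships change by less than `error`. This is not the repository's `FuzzyCMeans` class. It only mirrors the `(centers, u)` return pair and the `(n_clusters, n_samples)` membership layout used by `scikit-fuzzy`, so that `u.argmax(axis=0)` yields one hard label per sample. The function name `fuzzy_c_means` is hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, error=1e-5, maxiter=300, seed=None):
    """Minimal Fuzzy C-Means.

    X: array of shape (n_samples, n_features).
    Returns (centers, u): centers has shape (n_clusters, n_features) and the
    membership matrix u has shape (n_clusters, n_samples); each column sums to 1.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be > 1")
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each sample's column sums to 1
    u = rng.random((n_clusters, X.shape[0]))
    u /= u.sum(axis=0, keepdims=True)
    for _ in range(maxiter):
        um = u ** m
        # Centroid update: membership-weighted mean of the samples
        centers = (um @ X) / um.sum(axis=1, keepdims=True)
        # Euclidean distance from every center to every sample, shape (c, n)
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, np.finfo(float).eps)  # avoid division by zero
        # Membership update: u_ij proportional to d_ij ** (-2 / (m - 1))
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(u_new - u).max() < error
        u = u_new
        if converged:
            break
    return centers, u


# Usage: two well-separated synthetic blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
centers, u = fuzzy_c_means(X, n_clusters=2, seed=0)
labels = u.argmax(axis=0)  # hard label per sample
```

Unlike k-means, every sample keeps a membership in every cluster. A stop whose boardings fall between two demand areas therefore shows up with split memberships rather than being forced into one cluster.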
```
public-transport-optimization/
├── data/
│ ├── raw/ # Original ridership CSV and stops GeoJSON
│ └── processed/ # Cleaned & merged datasets
│
├── notebooks/ # Jupyter notebooks for EDA & prototyping
│ ├── 01_data_exploration.ipynb
│ ├── 02_feature_engineering.ipynb
│ └── 03_clustering.ipynb
│
├── src/
│ ├── __init__.py
│ ├── config.py # Paths and clustering parameters
│ ├── data_preprocessing.py # load, clean, merge functions
│ ├── feature_engineering.py # time-window aggregation, geofence assignment
│ ├── clustering/
│ │ ├── fcm.py # FuzzyCMeans class with fit/predict
│ │ └── utils.py # Helper functions for matrix conversion
│ ├── evaluation.py # silhouette_score and other metrics
│ └── visualize.py # Map and heatmap plotting
│
├── requirements.txt # Python dependencies
└── README.md # Project overview and instructions
```
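The tree describes `src/config.py` only as holding paths and clustering parameters. A minimal sketch of what it might contain is shown below. Every name and value here is a hypothetical placeholder, including the file names and the keys of `FCM_PARAMS`, and must be matched to the real raw data and to the actual `FuzzyCMeans` constructor. Only the processed features path is taken from the clustering example in this README.

```python
from pathlib import Path

# Paths are relative to the repository root, where the commands are run
DATA_DIR = Path("data")
RAW_DIR = DATA_DIR / "raw"
PROCESSED_DIR = DATA_DIR / "processed"

# Hypothetical raw file names; adjust to the actual inputs
RIDERSHIP_CSV = RAW_DIR / "ridership.csv"
STOPS_GEOJSON = RAW_DIR / "stops.geojson"
FEATURES_CSV = PROCESSED_DIR / "transport_features.csv"

# Hypothetical width of the time windows used to aggregate boardings
TIME_WINDOW = "1h"

# Clustering parameters; keys are placeholders for FuzzyCMeans' constructor
FCM_PARAMS = {
    "n_clusters": 5,   # number of demand clusters (c)
    "m": 2.0,          # fuzzifier; must be > 1, and 2.0 is a common default
    "error": 0.005,    # stopping tolerance on the change in memberships
    "maxiter": 1000,   # maximum number of iterations
}
```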
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/public-transport-optimization.git
  cd public-transport-optimization
  ```
- Create a virtual environment (optional but recommended):

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Preprocess data:

  ```bash
  python src/data_preprocessing.py
  ```
- Create features:

  ```bash
  python src/feature_engineering.py
  ```
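- Run clustering (example in Python REPL or script):

  ```python
  import pandas as pd

  from src.clustering.fcm import FuzzyCMeans
  from src.clustering.utils import df_to_matrix
  from src.config import FCM_PARAMS  # assumed to be defined in src/config.py

  df = pd.read_csv('data/processed/transport_features.csv')
  X = df_to_matrix(df)

  fcm = FuzzyCMeans(**FCM_PARAMS)
  centers, u = fcm.fit(X)

  # u is assumed to be (n_clusters, n_samples), as in scikit-fuzzy,
  # so argmax over axis 0 gives each sample's most likely cluster
  labels = u.argmax(axis=0)
  # save or analyze labels
  ```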
- Evaluate clusters:

  ```bash
  python src/evaluation.py
  ```
- Visualize results:

  ```bash
  python src/visualize.py
  ```
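The README does not show how features are built or what `df_to_matrix` returns. The sketch below illustrates one plausible approach under stated assumptions. The merged ridership table is assumed to have `stop_id`, `lat`, `lon`, `timestamp` and `boardings` columns (hypothetical names). Boardings are summed per stop and hour of day. Each stop then becomes one row combining standardised coordinates with its hourly demand profile. `build_feature_matrix` is a hypothetical stand-in for the repository's `df_to_matrix`, not its actual implementation.

```python
import numpy as np
import pandas as pd


def build_feature_matrix(df: pd.DataFrame):
    """Return (stop_ids, X); each row of X is [lat_z, lon_z, share_h0, ..., share_h23]."""
    df = df.copy()
    df["hour"] = pd.to_datetime(df["timestamp"]).dt.hour
    # Total boardings per stop and hour of day; hours with no data become 0
    profile = (
        df.pivot_table(index="stop_id", columns="hour", values="boardings",
                       aggfunc="sum", fill_value=0)
        .reindex(columns=range(24), fill_value=0)
    )
    # Convert counts to shares of each stop's daily total, so busy and quiet
    # stops with the same temporal pattern get similar profiles
    totals = profile.sum(axis=1).replace(0, np.nan)
    shares = profile.div(totals, axis=0).fillna(0.0)
    # Standardise latitude and longitude (z-scores) so both are on a common scale
    coords = df.groupby("stop_id")[["lat", "lon"]].first().loc[shares.index]
    coords_z = (coords - coords.mean()) / coords.std(ddof=0).replace(0, 1.0)
    X = np.hstack([coords_z.to_numpy(), shares.to_numpy()])
    return shares.index.to_numpy(), X


# Usage with a tiny made-up table
sample = pd.DataFrame({
    "stop_id": ["A", "A", "B", "B"],
    "lat": [40.71, 40.71, 40.76, 40.76],
    "lon": [-74.00, -74.00, -73.98, -73.98],
    "timestamp": ["2024-01-01 08:15", "2024-01-01 08:45",
                  "2024-01-01 17:10", "2024-01-01 08:05"],
    "boardings": [10, 5, 30, 10],
})
stop_ids, X = build_feature_matrix(sample)
```

How much weight the spatial columns carry relative to the 24 temporal columns is a design choice. It changes whether clusters are driven mainly by location or by time-of-day demand.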
- The Silhouette Score printed by `evaluation.py` indicates cluster quality; it ranges from -1 to 1, and values closer to 1 mean tighter, better-separated clusters.
- Cluster maps generated by `visualize.py` show stop locations colored by cluster membership.
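The silhouette score needs hard labels, so the fuzzy memberships are first collapsed with `argmax`, as in the clustering step. The contents of `evaluation.py` are not shown here, and scikit-learn's `sklearn.metrics.silhouette_score` is the usual library call. To make the definition explicit, the sketch below is a plain NumPy version using Euclidean distance. The function name `silhouette` and the toy data are illustrative only.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distance.

    For sample i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Samples in singleton clusters get s = 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if not 2 <= len(clusters) <= len(X) - 1:
        raise ValueError("need 2 <= n_clusters <= n_samples - 1")
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: silhouette defined as 0
        # D[i, i] is 0, so summing over the cluster and dividing by n_own - 1
        # gives the mean distance to the *other* members
        a = D[i, own].sum() / (n_own - 1)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()


# Usage: toy memberships of shape (n_clusters, n_samples)
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
u = np.array([[0.9, 0.8, 0.1, 0.2],
              [0.1, 0.2, 0.9, 0.8]])
labels = u.argmax(axis=0)  # -> [0, 0, 1, 1]
score = silhouette(X, labels)  # approximately 0.90
```

Collapsing memberships discards the "softness" that FCM provides. Fuzzy-specific indices, such as the fuzzy partition coefficient reported by `scikit-fuzzy`, can complement the silhouette score.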
Contributions, issues, and feature requests are welcome! Feel free to open an issue or submit a pull request.
Developed by sk_883