This repository presents the methodology for classifying vegetation in urban areas using CatBoost, a powerful gradient boosting algorithm.
The project is part of Omdena's Local Chapter challenge "Standardized Comparison of Urban Green Space Mapping Through Remote Sensing for Frankfurt, Germany", which tackles the detection and mapping of small urban green spaces with machine learning and deep learning techniques.
This initiative focuses on leveraging advanced algorithms to enhance the accuracy and efficiency of urban vegetation classification. By combining high-resolution satellite imagery with innovative computational methods, the project seeks to create robust solutions for monitoring urban greenery, ultimately contributing to sustainable urban planning and environmental preservation.
The notebook catboost_Omdena-Frankfurt.ipynb includes the following steps:

Environment Setup
- Install the required Python packages, including `numpy`, `scikit-learn`, `CatBoost`, and `rasterio`.
- Set up the DagsHub client for downloading the data.


Data Preparation
- Download the remote sensing data using DagsHub.
- Load the images and masks from the downloaded data.
- Preprocess the images and masks for the model.


Exploratory Data Analysis
- Visualize sample images and masks to understand the dataset structure.


Preprocessing and Training the CatBoost Model
- Extract the relevant bands from the masks to create a binary vegetation mask.
- Flatten the images and prepare the target labels for binary classification.
- Handle class imbalance by computing class weights.
- Split the data into training, validation, and test sets.
- Initialize and train the CatBoost model with the class weights and the validation set.

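
The preprocessing steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the notebook's code: it assumes images are in the `(bands, rows, cols)` layout that rasterio's `read()` returns, and that vegetation is marked by nonzero values in one mask band whose index (`VEGETATION_BAND`) is a hypothetical placeholder. The helper names are also hypothetical, and the class weights use the common "balanced" heuristic `n_samples / (n_classes * count_c)`, since the notebook's exact formula is not stated.

```python
import numpy as np

# Hypothetical index of the mask band that marks vegetation (adjust to the real masks).
VEGETATION_BAND = 0


def image_to_pixels(image: np.ndarray) -> np.ndarray:
    """Reshape a (bands, rows, cols) image into a (rows * cols, bands) feature table."""
    bands = image.shape[0]
    return image.reshape(bands, -1).T


def mask_to_labels(mask: np.ndarray, band: int = VEGETATION_BAND) -> np.ndarray:
    """Per-pixel binary labels in the same row-major order: 1 = vegetation, 0 = other.

    Assumes vegetation pixels are nonzero in the selected mask band.
    """
    return (mask[band] > 0).astype(np.int64).ravel()


def balanced_class_weights(y: np.ndarray) -> list:
    """'Balanced' weights n_samples / (n_classes * count_c): the rarer class weighs more."""
    counts = np.bincount(y, minlength=2)
    return (len(y) / (len(counts) * counts)).tolist()


# Toy example: a 3-band 2x2 image and a 1-band mask (made-up values).
image = np.arange(12, dtype=np.float32).reshape(3, 2, 2)
mask = np.array([[[1, 1], [1, 0]]])

X = image_to_pixels(image)           # shape (4, 3); row 0 holds the band values of pixel (0, 0)
y = mask_to_labels(mask)             # [1, 1, 1, 0]
weights = balanced_class_weights(y)  # [2.0, 0.666...] for classes [0, 1]
```

Weights like these can be passed to CatBoost through the `class_weights` argument of `CatBoostClassifier`; CatBoost can also derive them itself with `auto_class_weights="Balanced"`, which scales them differently. For the three-way split, calling scikit-learn's `train_test_split` twice with `stratify=y` keeps the class ratio similar across the training, validation, and test sets.
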

Model Evaluation
- Evaluate the trained model on the test set using precision, recall, F1-score, and ROC-AUC.
- Visualize feature importance and the ROC curve.
- Plot training and validation metrics over the boosting iterations.

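
The evaluation metrics can be illustrated with a small, dependency-free sketch. It is not the notebook's code; the function names are hypothetical and the toy labels and scores are made up. It shows how per-class precision, recall, and F1 follow from confusion-matrix counts, and how ROC-AUC measures how well the predicted vegetation probabilities rank vegetation pixels above non-vegetation pixels.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = vegetation."""
    pairs = Counter(zip(y_true, y_pred))
    return pairs[(0, 0)], pairs[(0, 1)], pairs[(1, 0)], pairs[(1, 1)]


def _prf(correct, predicted, actual):
    """Precision, recall and F1 (harmonic mean) for one class, 0.0 for empty denominators."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_class_scores(y_true, y_pred):
    """Per-class precision, recall and F1, like the rows of a classification report."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return {
        "vegetation": _prf(tp, tp + fp, tp + fn),
        "non_vegetation": _prf(tn, tn + fn, tn + fp),
    }


def roc_auc(y_true, scores):
    """Probability that a random vegetation pixel scores higher than a random
    non-vegetation pixel (ties count 0.5). Pairwise O(P * N): for toy data only."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels, hard predictions and vegetation probabilities (made-up values).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7]

report = per_class_scores(y_true, y_pred)  # vegetation (0.75, 0.75, 0.75), non_vegetation (0.5, 0.5, 0.5)
auc = roc_auc(y_true, scores)              # 0.875
```

On full-size rasters with millions of pixels, scikit-learn's `classification_report` and `roc_auc_score` compute the same quantities efficiently.
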

The model is trained on a large dataset with tuned hyperparameters and evaluated on the test set with standard classification metrics (precision, recall, F1-score, and ROC-AUC). The results are:

- Overall accuracy: 91%
- Precision for vegetation: 98%
- Precision for non-vegetation: 68%
- Recall for vegetation: 91%
- Recall for non-vegetation: 92%
- F1-score for vegetation: 94%
- F1-score for non-vegetation: 78%
- ROC-AUC: 0.97
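
As a consistency check, each F1-score above is the harmonic mean of the corresponding precision P and recall R, and the reported values reproduce it:

$$
F_1 = \frac{2PR}{P + R}, \qquad
F_1^{\text{veg}} = \frac{2(0.98)(0.91)}{0.98 + 0.91} \approx 0.94, \qquad
F_1^{\text{non-veg}} = \frac{2(0.68)(0.92)}{0.68 + 0.92} \approx 0.78
$$
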
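
The confusion matrix shows that most vegetation pixels are correctly classified, though some non-vegetation pixels are misclassified as vegetation. The lower precision for non-vegetation (68%) also means that roughly one in three pixels predicted as non-vegetation is actually vegetation. Overall, these results highlight CatBoost's strong predictive capability and reliability for urban vegetation classification.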

Model Inference
- Upload test images and evaluate the model's performance on new data.
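
Inference on a new image amounts to flattening it into the same per-pixel table used for training, predicting, and reshaping the labels back into a map. The sketch below is a hypothetical illustration, not the notebook's code: it assumes the new image is in `(bands, rows, cols)` layout and preprocessed exactly like the training data, and `ThresholdModel` is a made-up stand-in so the example runs without a trained model (a fitted `CatBoostClassifier` exposes the same `predict` method).

```python
import numpy as np


def predict_vegetation_map(model, image: np.ndarray) -> np.ndarray:
    """Classify every pixel of a (bands, rows, cols) image and return a (rows, cols) label map.

    `model` can be any fitted classifier with predict(X) on a (pixels, bands) array.
    The image must be preprocessed the same way as the training data.
    """
    bands, rows, cols = image.shape
    pixels = image.reshape(bands, -1).T                 # (rows * cols, bands), row-major order
    labels = np.asarray(model.predict(pixels)).ravel()  # ravel() also accepts (n, 1) outputs
    return labels.reshape(rows, cols)


class ThresholdModel:
    """Hypothetical stand-in for a trained model: vegetation if band 0 exceeds 0.5."""

    def predict(self, X):
        return (X[:, 0] > 0.5).astype(np.int64)


# Toy 2-band 2x2 image (made-up values).
image = np.array([[[0.9, 0.1], [0.7, 0.2]],
                  [[0.0, 0.0], [0.0, 0.0]]])
veg_map = predict_vegetation_map(ThresholdModel(), image)  # [[1, 0], [1, 0]]
```

Keeping the same row-major pixel order for flattening and reshaping is what makes each predicted label land back on its original pixel.
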

```
# Install necessary packages
!pip install numpy==1.25.2 scikit-learn catboost pyrsgis rasterio focal-loss segmentation_models dagshub
```

CatBoost proves to be an effective and computationally efficient solution for urban vegetation mapping, reaching 91% overall accuracy and a ROC-AUC of 0.97 on the test set. This approach demonstrates its potential for real-world applications in remote sensing and urban green space monitoring.