Semantic segmentation for precision agriculture using a UNet architecture, with optimized training and comprehensive evaluation.
This project implements a UNet-based semantic segmentation pipeline for agricultural applications. It performs pixel-wise classification of agricultural imagery to identify crops, soil, weeds, and infrastructure. The model is currently trained on Cityscapes data and is designed to transfer to agricultural datasets with minimal adaptation.
- UNet Architecture: Encoder-decoder with skip connections for precise boundary detection
- Progressive Dropout: Increasing rates (0.05→0.25) in encoder, decreasing (0.2→0.05) in decoder
- Enhanced Output Layer: Additional refinement block before final classification
- Environment & Hyperparameters: CUDA-compatible with configurable batch size, learning rate, and epochs
- Reproducibility: Comprehensive seed setting for PyTorch, NumPy, and CUDA operations
- Composite Loss Function: Weighted combination of CE (50%), Focal (30%), and Dice (20%) losses
- Gradient Management: Clipping (max_norm=1.0) and accumulation (steps=2) for stability and a larger effective batch size
- Learning Rate Scheduling: ReduceLROnPlateau based on validation IoU
- Data Augmentation: Conservative color jittering, horizontal flipping, and normalization
- Test-Time Augmentation: Horizontal flip with prediction averaging during validation/testing
- Small Region Removal: Elimination of isolated predictions below minimum size threshold
- Confidence-Based Refinement: Boundary enhancement through prediction confidence analysis
- Quantitative Metrics: Full suite including IoU, mAP, precision, recall, F1, pixel accuracy, SSIM, PSNR, and MSE
- Qualitative Results: Visual prediction analysis with side-by-side comparisons
- Confusion Matrix: Detailed class-wise performance analysis for targeted improvements
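The composite loss listed above (CE 50%, Focal 30%, Dice 20%) could be sketched as follows. This is a minimal illustration rather than the project's actual implementation: the function name `composite_loss`, the focal exponent `gamma=2.0`, and the smoothing term `eps` are assumptions; only the three weights come from the list.

```python
import torch
import torch.nn.functional as F


def composite_loss(logits, target, w_ce=0.5, w_focal=0.3, w_dice=0.2,
                   gamma=2.0, eps=1e-6):
    """Weighted sum of cross-entropy, focal, and soft Dice losses.

    logits: (N, C, H, W) raw scores; target: (N, H, W) integer class ids.
    gamma and eps are assumed values, not taken from the project.
    """
    num_classes = logits.shape[1]

    # Per-pixel cross-entropy, kept unreduced so the focal term can reuse it.
    ce_map = F.cross_entropy(logits, target, reduction="none")  # (N, H, W)
    ce = ce_map.mean()

    # Focal term: down-weight pixels the model already classifies well.
    pt = torch.exp(-ce_map)  # probability assigned to the true class
    focal = ((1.0 - pt) ** gamma * ce_map).mean()

    # Soft Dice loss, computed per class and then averaged.
    probs = logits.softmax(dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).to(probs.dtype)
    dims = (0, 2, 3)
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = 1.0 - ((2.0 * intersection + eps) / (cardinality + eps)).mean()

    return w_ce * ce + w_focal * focal + w_dice * dice
```

The cross-entropy term drives overall pixel accuracy, the focal term emphasizes hard pixels, and the Dice term counteracts class imbalance by scoring region overlap per class.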
- Python 3.7+
- PyTorch 1.7+
- torchvision 0.8+
- NumPy 1.19+
- scikit-learn 0.24+
- scikit-image 0.18+
- matplotlib 3.3+
- seaborn 0.11+
pip install -r requirements.txt
dataset/
├── leftImg8bit/    # Input images
│   ├── train/
│   ├── val/
│   └── test/
└── gtFine/         # Ground truth masks
    ├── train/
    ├── val/
    └── test/
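One way to pair each image with its mask under this layout is sketched below. It assumes Cityscapes-style file names (`<stem>_leftImg8bit.png` for images, `<stem>_gtFine_labelIds.png` for masks); the function name `pair_images_and_masks` and the `mask_suffix` parameter are hypothetical and may not match what the project's data loader does.

```python
from pathlib import Path


def pair_images_and_masks(root, split, mask_suffix="_gtFine_labelIds.png"):
    """Return sorted (image_path, mask_path) pairs for one split.

    Assumes Cityscapes-style names: <stem>_leftImg8bit.png for images and
    <stem><mask_suffix> for masks, mirrored under leftImg8bit/ and gtFine/.
    """
    root = Path(root)
    image_dir = root / "leftImg8bit" / split
    mask_dir = root / "gtFine" / split
    image_suffix = "_leftImg8bit.png"
    pairs = []
    # rglob also covers per-city subfolders, as in the original Cityscapes layout.
    for image_path in sorted(image_dir.rglob("*" + image_suffix)):
        stem = image_path.name[: -len(image_suffix)]
        subdir = image_path.relative_to(image_dir).parent
        mask_path = mask_dir / subdir / (stem + mask_suffix)
        if mask_path.exists():  # skip images that have no ground truth mask
            pairs.append((image_path, mask_path))
    return pairs
```

For example, `pair_images_and_masks("dataset", "train")` would return the training pairs; an agricultural dataset with a different naming scheme would need its own suffixes.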
python train.py
python evaluate.py
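During validation and testing, predictions are averaged over the original image and its horizontal flip. A minimal sketch of that step is shown here, assuming the model returns raw logits of shape (N, C, H, W); the function name `predict_with_flip_tta` is hypothetical, and averaging softmax probabilities (rather than logits) is an assumption.

```python
import torch


@torch.no_grad()
def predict_with_flip_tta(model, images):
    """Average class probabilities over the original and mirrored input.

    images: (N, 3, H, W) float tensor. Returns (N, C, H, W) probabilities.
    """
    model.eval()
    probs = model(images).softmax(dim=1)
    # Flip along the width axis, predict, then flip the prediction back
    # so both probability maps are aligned pixel for pixel.
    flipped = torch.flip(images, dims=[3])
    probs_flipped = torch.flip(model(flipped).softmax(dim=1), dims=[3])
    return 0.5 * (probs + probs_flipped)
```

The final label map is then `predict_with_flip_tta(model, batch).argmax(dim=1)`.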
BATCH_SIZE = 4
NUM_CLASSES = 34
LEARNING_RATE = 1e-4
EPOCHS = 77
IMAGE_SIZE = (256, 512)
DATASET_PATH = "dataset"
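With BATCH_SIZE = 4 and two accumulation steps, the effective batch size is 8. The sketch below shows how gradient accumulation, clipping, and IoU-driven learning rate scheduling could fit together with these settings. The names `train_one_epoch` and `build_optimizer_and_scheduler` are hypothetical, and the Adam optimizer, `factor=0.5`, and `patience=3` are assumptions; only the learning rate, accumulation steps, and clipping norm come from this README.

```python
import torch
from torch import nn

LEARNING_RATE = 1e-4
ACCUM_STEPS = 2      # accumulation steps=2, as listed under gradient management
MAX_GRAD_NORM = 1.0  # clipping max_norm=1.0, as listed under gradient management


def build_optimizer_and_scheduler(model):
    # Adam, factor, and patience are assumptions. mode="max" because the
    # monitored quantity (validation IoU) should increase.
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=3)
    return optimizer, scheduler


def train_one_epoch(model, loader, criterion, optimizer):
    """Run one epoch with gradient accumulation and clipping; return mean loss."""
    model.train()
    optimizer.zero_grad()
    running = 0.0
    for step, (images, masks) in enumerate(loader, start=1):
        loss = criterion(model(images), masks)
        # Scale so the accumulated gradient matches one large-batch average.
        (loss / ACCUM_STEPS).backward()
        running += loss.item()
        # Update every ACCUM_STEPS batches, and once more for a trailing batch.
        if step % ACCUM_STEPS == 0 or step == len(loader):
            nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
            optimizer.step()
            optimizer.zero_grad()
    return running / len(loader)
```

After each validation pass, calling `scheduler.step(val_iou)` lets the scheduler lower the learning rate once the IoU stops improving.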
- Enhanced Architecture: Progressive dropout and refined output layer
- Advanced Training: Composite loss, gradient accumulation, and TTA
- Post-Processing: Small region removal for cleaner predictions
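Small region removal can be illustrated with connected-component analysis. The sketch below is one possible approach, not necessarily the project's: the function name `remove_small_regions`, the threshold `min_size=64`, and the rule of filling a removed region with the most common neighbouring label are all assumptions.

```python
import numpy as np
from scipy import ndimage


def remove_small_regions(label_map, min_size=64):
    """Relabel connected regions smaller than min_size pixels.

    label_map: 2D array of non-negative integer class ids.
    Each small region takes the most common label among the pixels
    bordering it. min_size and this fill rule are assumptions.
    """
    cleaned = label_map.copy()
    for cls in np.unique(label_map):
        # 4-connected components of the current class.
        components, count = ndimage.label(label_map == cls)
        if count == 0:
            continue
        sizes = np.bincount(components.ravel())[1:]  # drop id 0 (other classes)
        for comp_id in np.flatnonzero(sizes < min_size) + 1:
            region = components == comp_id
            # One-pixel ring around the region.
            ring = ndimage.binary_dilation(region) & ~region
            neighbours = cleaned[ring]
            if neighbours.size:
                cleaned[region] = np.bincount(neighbours).argmax()
    return cleaned
```

The input array is left unchanged; the function returns a cleaned copy, so predictions before and after post-processing can be compared.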
Strengths:
- Robust optimization with early stopping and adaptive learning rates
- Comprehensive evaluation framework with multiple metrics
- Flexibility for agricultural dataset adaptation
Limitations:
- Currently trained only on urban scenes (Cityscapes); agricultural fine-tuning is pending
- Architecture is memory-intensive for edge deployment
- Limited handling of agriculture-specific temporal patterns
- Skip Connections: Preserve spatial details critical for precise crop/weed boundaries
- Multi-scale Feature Extraction: Captures individual plants, row patterns, and field layouts
- Data Efficiency: UNet was designed to train from relatively few annotated images, which suits settings where labeled agricultural data is limited
- Flexibility: Easily adaptable to multiple agricultural class types
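To make the skip connections and progressive dropout concrete, here is a deliberately small UNet sketch. It is not the project's model: the class name `TinyUNet`, the three-level depth, the channel widths, and the intermediate dropout values are assumptions (only the 0.05→0.25 encoder and 0.2→0.05 decoder endpoints come from this README), and the additional output refinement block is omitted.

```python
import torch
from torch import nn


def conv_block(in_ch, out_ch, dropout):
    """Two 3x3 conv + BatchNorm + ReLU layers followed by spatial dropout."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Dropout2d(dropout),
    )


class TinyUNet(nn.Module):
    """Three-level UNet; depth, widths, and middle dropout values are assumed."""

    def __init__(self, num_classes=34, widths=(32, 64, 128),
                 enc_dropout=(0.05, 0.15, 0.25), dec_dropout=(0.2, 0.05)):
        super().__init__()
        # Encoder: dropout increases with depth.
        self.enc1 = conv_block(3, widths[0], enc_dropout[0])
        self.enc2 = conv_block(widths[0], widths[1], enc_dropout[1])
        self.bottleneck = conv_block(widths[1], widths[2], enc_dropout[2])
        self.pool = nn.MaxPool2d(2)
        # Decoder: dropout decreases toward the output.
        self.up2 = nn.ConvTranspose2d(widths[2], widths[1], 2, stride=2)
        self.dec2 = conv_block(2 * widths[1], widths[1], dec_dropout[0])
        self.up1 = nn.ConvTranspose2d(widths[1], widths[0], 2, stride=2)
        self.dec1 = conv_block(2 * widths[0], widths[0], dec_dropout[1])
        self.head = nn.Conv2d(widths[0], num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # Skip connections: concatenate encoder features with upsampled ones,
        # so fine spatial detail bypasses the low-resolution bottleneck.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)
```

Input height and width must be divisible by 4 in this sketch, which holds for the 256×512 image size used in the configuration. The output has one logit map per class at the input resolution.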
- DeepLabV3+: More complex, with a higher parameter count
- PSPNet: Better global context, but loses fine detail
- Lightweight Models: Faster, but with a significant reduction in segmentation quality
- Transformer-Based: Potentially more accurate, but requires larger datasets
- Model Checkpoints: Best and final models saved automatically
- Training Curves: Loss and IoU progression visualization
- Detailed Logging: Per-epoch metrics with learning rate tracking
- Comprehensive Metrics Report: Detailed in evaluation_results.txt
- Visual Predictions: Sample outputs with ground truth comparison
- Class Performance Breakdown: Per-class metrics for targeted improvement
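Most of the per-class metrics in such a report can be derived from a single confusion matrix. The following is a minimal NumPy sketch of that derivation, not the project's evaluation code; the function names `confusion_matrix` and `per_class_metrics` are hypothetical, and classes absent from both prediction and ground truth yield NaN, which the mean IoU skips.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    # Cast first so small integer dtypes such as uint8 cannot overflow.
    idx = target.ravel().astype(np.int64) * num_classes + pred.ravel().astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_metrics(cm):
    """Derive IoU, precision, recall, F1, and pixel accuracy from cm."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp  # belongs to the class, but missed
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "iou": iou,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "pixel_accuracy": tp.sum() / cm.sum(),
        "mean_iou": np.nanmean(iou),
    }
```

Accumulating one confusion matrix over the whole validation set and computing the metrics once at the end avoids averaging per-image scores, which would overweight small images or rare classes.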
- Custom agricultural dataset adaptation
- Additional agriculture-specific augmentations
- Boundary refinement for precise crop delineation
- Edge-optimized deployment for field use