A real-time gesture-controlled media player that allows users to control system media
(play/pause, volume, next/previous track) using hand gestures captured via webcam.
This project demonstrates computer vision, real-time processing, gesture recognition, and OS-level automation using Python.
- ✋ Open Palm → Play / Pause
- 👍 Thumb Up → Volume Up
- 👎 Thumb Down → Volume Down
- 👉 Swipe Right → Next Track
- 👈 Swipe Left → Previous Track
- 🧠 Gesture stability using multi-frame confirmation
- ⚡ Real-time performance with FPS counter
- 🖥️ OS-level media control (works with Spotify, YouTube, VLC, etc.)
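
The FPS counter mentioned in the list above could be kept as a rolling average over recent frame timestamps. This is a minimal sketch, not the project's actual code: the class name `FPSCounter`, the `window` parameter, and the optional `now` argument (there only to make the example testable) are assumptions.

```python
import time
from collections import deque
from typing import Deque, Optional


class FPSCounter:
    """Rolling FPS estimate over the most recent frame timestamps (hypothetical helper)."""

    def __init__(self, window: int = 30) -> None:
        # Only the last `window` timestamps are kept, so old frames age out.
        self._stamps: Deque[float] = deque(maxlen=window)

    def tick(self, now: Optional[float] = None) -> float:
        """Record one processed frame and return the current FPS estimate."""
        self._stamps.append(time.perf_counter() if now is None else now)
        if len(self._stamps) < 2:
            return 0.0  # not enough data for a rate yet
        elapsed = self._stamps[-1] - self._stamps[0]
        # N timestamps span N - 1 frame intervals.
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Calling `tick()` once per loop iteration returns a value that can be drawn onto the frame. Averaging over a window keeps the displayed number from jittering on every frame.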
Webcam Feed -> OpenCV Frame Processing -> MediaPipe Tasks API (21 Hand Landmarks) -> Finger State Detection -> Static & Dynamic Gesture Classification -> Gesture Stabilization (N-frame window) -> Media Control via PyAutoGUI
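
The finger state detection and static gesture classification stages of this pipeline might look like the sketch below. It assumes the standard MediaPipe hand landmark indexing (0 = wrist, 4 = thumb tip, 8/12/16/20 = fingertips) and normalized `(x, y)` points with y growing downward. The distance-based heuristics, the function names, and the gesture labels are illustrative assumptions, not the project's confirmed logic.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), normalized image coordinates, y grows downward

WRIST = 0
THUMB_MCP, THUMB_IP, THUMB_TIP = 2, 3, 4
PINKY_MCP = 17
# (PIP joint, fingertip) index pairs for the index, middle, ring and pinky fingers
FINGER_JOINTS = ((6, 8), (10, 12), (14, 16), (18, 20))


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def finger_states(lm: Sequence[Point]) -> List[bool]:
    """Return [thumb, index, middle, ring, pinky]; True means extended."""
    wrist = lm[WRIST]
    # Assumed thumb heuristic: an extended thumb tip lies farther from the
    # pinky base than the thumb's IP joint does.
    thumb = _dist(lm[THUMB_TIP], lm[PINKY_MCP]) > _dist(lm[THUMB_IP], lm[PINKY_MCP])
    # Assumed finger heuristic: a curled fingertip ends up closer to the wrist
    # than its PIP joint, regardless of how the hand is rotated.
    others = [_dist(lm[tip], wrist) > _dist(lm[pip], wrist) for pip, tip in FINGER_JOINTS]
    return [thumb] + others


def classify_static(lm: Sequence[Point]) -> Optional[str]:
    """Map 21 landmarks to a static gesture label, or None if nothing matches."""
    states = finger_states(lm)
    if all(states):
        return "OPEN_PALM"
    if states[0] and not any(states[1:]):
        # Smaller y means higher in the frame.
        return "THUMB_UP" if lm[THUMB_TIP][1] < lm[THUMB_MCP][1] else "THUMB_DOWN"
    return None
```

Comparing distances rather than raw y coordinates is one way to keep the check usable when the hand is tilted, which matters for thumb gestures where the fist lies sideways.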

    gesture-controlled-media-player/
    ├── models/
    │   └── hand_landmarker.task    # MediaPipe hand landmark model
    │
    ├── src/
    │   ├── camera.py               # Webcam handling (OpenCV)
    │   ├── hand_tracker.py         # Hand landmark detection (MediaPipe)
    │   ├── gesture_utils.py        # Gesture logic, swipe detection, stability
    │   ├── media_controller.py     # OS-level media automation
    │   └── main.py                 # Application entry point
    │
    ├── requirements.txt            # Project dependencies
    ├── README.md                   # Project documentation
    └── venv/                       # Virtual environment (ignored in Git)

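
`gesture_utils.py` is described as holding the swipe detection. One plausible approach, sketched here under assumptions, is to track the hand's horizontal position over a short window of frames and report a swipe when the net travel exceeds a threshold. The class name `SwipeDetector`, the `window` and `min_travel` defaults, and the labels are hypothetical. Whether a positive travel means "right" for the user also depends on whether the frame is mirrored.

```python
from collections import deque
from typing import Deque, Optional


class SwipeDetector:
    """Detect horizontal swipes from per-frame normalized x positions (hypothetical helper)."""

    def __init__(self, window: int = 8, min_travel: float = 0.25) -> None:
        self._xs: Deque[float] = deque(maxlen=window)
        self._min_travel = min_travel  # fraction of the frame width

    def update(self, x: Optional[float]) -> Optional[str]:
        """Feed one frame's x position (None when no hand is visible)."""
        if x is None:
            self._xs.clear()  # losing the hand resets the trajectory
            return None
        self._xs.append(x)
        if len(self._xs) < self._xs.maxlen:
            return None  # wait until a full window has been observed
        travel = self._xs[-1] - self._xs[0]
        if abs(travel) < self._min_travel:
            return None
        self._xs.clear()  # start fresh so one motion yields one swipe
        return "SWIPE_RIGHT" if travel > 0 else "SWIPE_LEFT"
```

Example usage with a window of four frames:

```python
detector = SwipeDetector(window=4, min_travel=0.2)
for x in (0.2, 0.3, 0.4, 0.5):
    event = detector.update(x)
print(event)  # SWIPE_RIGHT
```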
1️⃣ Clone Repository

    git clone <repository-url>
    cd gesture-controlled-media-player

2️⃣ Create & Activate Virtual Environment

    python -m venv venv
    venv\Scripts\activate    # Windows

3️⃣ Install Dependencies

    pip install -r requirements.txt

4️⃣ Run Application

    python src/main.py

🧪 Supported Gestures

| Gesture | Action |
| --- | --- |
| Open Palm | Play / Pause |
| Thumb Up | Volume Up |
| Thumb Down | Volume Down |
| Swipe Right | Next Track |
| Swipe Left | Previous Track |

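
The gesture-to-action table can be expressed as a lookup from gesture label to a PyAutoGUI media key. The key names below (`playpause`, `volumeup`, `volumedown`, `nexttrack`, `prevtrack`) are PyAutoGUI's own. The gesture labels, the `make_dispatcher` helper, and the injectable `press` callable are assumptions made for illustration and testing.

```python
from typing import Callable, Optional

# Gesture label (hypothetical naming) -> PyAutoGUI key name
GESTURE_KEYS = {
    "OPEN_PALM": "playpause",
    "THUMB_UP": "volumeup",
    "THUMB_DOWN": "volumedown",
    "SWIPE_RIGHT": "nexttrack",
    "SWIPE_LEFT": "prevtrack",
}


def make_dispatcher(press: Optional[Callable[[str], None]] = None) -> Callable[[str], bool]:
    """Build a function that sends the media key for a gesture label."""
    if press is None:
        # Imported lazily so the mapping can be exercised without a display.
        import pyautogui
        press = pyautogui.press

    def dispatch(gesture: str) -> bool:
        key = GESTURE_KEYS.get(gesture)
        if key is None:
            return False  # unknown gesture: send nothing
        press(key)
        return True

    return dispatch
```

In the application, `make_dispatcher()` with no arguments would send real key presses. Passing a stand-in such as `list.append` makes it possible to check the mapping without touching the OS.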
Performance & Stability

- Real-time hand tracking using MediaPipe Tasks API
- FPS counter for performance monitoring
- Gesture stabilization using a sliding window (N-frame confirmation)
- Cooldown & debounce logic to prevent accidental triggers
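
The N-frame confirmation and cooldown described above could be combined roughly as follows. This is a sketch under assumptions: the class name `GestureStabilizer`, the default window of 5 frames, and the 1-second cooldown are illustrative values, not taken from the project.

```python
import time
from collections import deque
from typing import Deque, Optional


class GestureStabilizer:
    """Confirm a gesture over N consecutive frames, then rate-limit it (hypothetical helper)."""

    def __init__(self, window: int = 5, cooldown_s: float = 1.0) -> None:
        self._recent: Deque[Optional[str]] = deque(maxlen=window)
        self._cooldown_s = cooldown_s
        self._last_fired = float("-inf")

    def update(self, gesture: Optional[str], now: Optional[float] = None) -> Optional[str]:
        """Feed one frame's raw label; return it only when confirmed and off cooldown."""
        now = time.monotonic() if now is None else now
        self._recent.append(gesture)
        window_full = len(self._recent) == self._recent.maxlen
        # Debounce: every frame in the window must agree on the same gesture.
        unanimous = gesture is not None and all(g == gesture for g in self._recent)
        if not (window_full and unanimous):
            return None
        # Cooldown: suppress repeats that arrive too soon after the last trigger.
        if now - self._last_fired < self._cooldown_s:
            return None
        self._last_fired = now
        return gesture
```

With this design a held gesture fires again once per cooldown period. That suits volume changes. A toggle such as play/pause may need a longer cooldown or a "release before retrigger" rule.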
🧠 Technical Highlights (Resume Keywords)

- Computer Vision
- Real-Time Video Processing
- Hand Landmark Detection
- Gesture Recognition
- MediaPipe Tasks API
- OpenCV
- OS Automation
- Performance Optimization
- Modular Python Design
Future Enhancements

- Gesture-controlled virtual mouse
- Smart home device integration
- Custom gesture training using ML models
- Cross-platform support (Linux/macOS)
Aspiring Software / Full Stack / Computer Vision Developer