A collection of scripts for training and evaluating machine-learning models that classify a speaker's gender from short voice recordings. The project includes utilities for data preprocessing, model comparison, neural network training, and collecting new audio samples for inference.
- voice.csv – tabular dataset containing acoustic features for labeled male and female voice samples. The last column stores the target label; all other columns are features that are scaled before training.
- output/voiceDetails.csv – generated dynamically by the R script when you analyse a freshly recorded voice sample. The file is preprocessed to match the format of voice.csv before inference.
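
The "features first, label last" layout can be illustrated with a small standard-library sketch. It is only an illustration: the column names in the sample are made up, and zero-mean/unit-variance standardization is an assumption, since the text above says only that the features are scaled.

```python
import csv
import io
import statistics

def load_features_and_labels(csv_file):
    """Split each row into numeric features (all but the last column) and a label (last column)."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first row holds the column names
    features, labels = [], []
    for row in reader:
        features.append([float(value) for value in row[:-1]])
        labels.append(row[-1])
    return header, features, labels

def standardize(features):
    """Scale each column to zero mean and unit variance (assumed scaling method)."""
    columns = list(zip(*features))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in features
    ]

# Tiny made-up sample in the same shape: feature columns first, label last.
sample = io.StringIO(
    "feature_a,feature_b,label\n"
    "0.10,0.06,male\n"
    "0.20,0.04,female\n"
    "0.30,0.05,female\n"
)
header, X, y = load_features_and_labels(sample)
X_scaled = standardize(X)
```

To try it on the real file, pass `open("voice.csv", newline="")` instead of the in-memory sample.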
- Python 3.7+
- scikit-learn
- matplotlib
- pandas
- numpy
- pyaudio (required only when recording new samples)
Install the Python dependencies from the included requirements.txt file:

    pip install -r requirements.txt

If you do not plan to use the recording workflow, you may omit installing pyaudio.
- R 3.5+
- warbleR
Install the R dependency from the R console:

    install.packages("warbleR")

Run the comparison script to evaluate several classical models (k-NN, SVM, Decision Tree, Random Forest, MLP) on the dataset:

    python clf_comparison.py

The script prints training and testing metrics for each classifier.
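
For readers who want a feel for what such a comparison involves, here is a minimal sketch using scikit-learn. It is not the contents of clf_comparison.py: the data is synthetic, the hyperparameters are library defaults, and accuracy is assumed as the metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); the real script works on voice.csv.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

classifiers = {
    "k-NN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # score() returns mean accuracy for classifiers.
    results[name] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"{name:14s} train={results[name][0]:.3f} test={results[name][1]:.3f}")
```

Comparing the training and testing columns side by side is a quick way to spot overfitting, for example a decision tree that scores perfectly on training data but noticeably lower on the test split.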
Train an MLPClassifier on voice.csv and report detailed metrics:

    python neural_net.py

After training, the model is saved as trained_neural_net and the loss curve is displayed using Matplotlib.
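
How the model is serialized is not stated here. One common approach, shown purely as an assumption, is Python's pickle module; the sketch below round-trips a placeholder object to a file named trained_neural_net in a temporary directory. The helper names `save_model` and `load_model` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

def save_model(model, path):
    """Serialize any picklable object (for example a fitted classifier) to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)

def load_model(path):
    """Load an object previously written by save_model."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# Placeholder object; in practice this would be the fitted classifier.
stand_in_model = {"kind": "placeholder", "weights": [0.1, 0.2]}

# A temporary directory keeps the sketch from overwriting a real model file.
model_path = Path(tempfile.mkdtemp()) / "trained_neural_net"
save_model(stand_in_model, model_path)
restored = load_model(model_path)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.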
The main entry point exposes a small CLI menu that lets you train the neural network or record a new voice sample for inference:

    python main.py

Menu options:
- Train Neural Net – runs the same training routine as neural_net.py and persists the model to trained_neural_net.
- Analyse Voice – records a 20-second sample (requires a microphone and pyaudio), extracts features via the R script getAttributes.r, preprocesses them to match the dataset, and prints the predicted gender.
- Exit – closes the menu.
The recorded audio is stored at sounds/output.wav, and the generated features are saved inside the output/ directory. Ensure that trained_neural_net exists before choosing the Analyse Voice option; otherwise, train the model first.
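
The menu flow and the "train first" precondition can be sketched roughly as follows. This is not the code in main.py: the numeric choices, the function names `handle_choice` and `run_menu`, and the messages are all hypothetical.

```python
from pathlib import Path

MODEL_PATH = Path("trained_neural_net")  # file name taken from the text above

def handle_choice(choice, train, analyse, model_path=MODEL_PATH):
    """Dispatch one menu choice; return False when the menu should close."""
    if choice == "1":
        train()
    elif choice == "2":
        # Analyse Voice needs a previously trained model on disk.
        if not Path(model_path).exists():
            print("No trained model found; choose Train Neural Net first.")
        else:
            analyse()
    elif choice == "3":
        return False
    else:
        print("Unknown option.")
    return True

def run_menu(train, analyse, read=input):
    """Show the menu repeatedly until the user picks Exit."""
    while True:
        print("1. Train Neural Net\n2. Analyse Voice\n3. Exit")
        if not handle_choice(read("> ").strip(), train, analyse):
            break
```

Passing `train` and `analyse` in as callables keeps the dispatch logic testable without a microphone or an R installation.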
- data_process.py – shared routines for loading, scaling, visualising, and splitting the dataset, as well as computing evaluation metrics.
- clf_comparison.py – trains and evaluates a suite of traditional classifiers.
- neural_net.py – trains, evaluates, and persists the neural network classifier.
- main.py – CLI interface that wires together training, recording, and inference.
- sound_recorder.py – handles audio capture and saving the recorded waveform.
- getAttributes.r – extracts acoustic features from recorded audio using warbleR.
- Missing audio device: pyaudio requires access to an input device. On headless systems, you may need to configure a virtual microphone or skip the recording workflow.
- R script errors: Ensure that R is installed and available in your system PATH, and that the warbleR package is installed. The CLI invokes the script using Rscript getAttributes.r from the project root.
- Matplotlib backend issues: If plots fail to display on headless servers, switch to a non-interactive backend by exporting MPLBACKEND=Agg before running the scripts.
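
Two of these checks can be automated. The sketch below is an illustration, not code from this project: it sets MPLBACKEND from Python if it is not already set, and verifies that Rscript is on PATH before invoking getAttributes.r. The function names are hypothetical.

```python
import os
import shutil
import subprocess

# Must run before matplotlib is imported for the backend choice to take effect.
os.environ.setdefault("MPLBACKEND", "Agg")

def build_rscript_command(script="getAttributes.r"):
    """Return the argv list for the R script, or None if Rscript is not on PATH."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    return [rscript, script]

def run_feature_extraction(project_root="."):
    """Run the R script from the project root, failing early with a clear message."""
    command = build_rscript_command()
    if command is None:
        raise RuntimeError("Rscript not found on PATH; install R or fix PATH.")
    # check=True raises CalledProcessError if the R script exits non-zero.
    subprocess.run(command, cwd=project_root, check=True)
```

Checking for the executable up front turns an opaque "file not found" failure into a message that points directly at the missing R installation.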
This repository does not include an explicit license. Please contact the original authors before using the code in production.