| Source | Collecting Method |
|---|---|
| Dataset in Zenodo, ontology.json | Direct download |
You may refer to preprocess_FSD50K.py for all the details. Here we just offer a concise summary:
Some audio-json pairs selected from the processed dataset:
3303.mov
{
"text": [
"The sounds of Speech, Human voice and Babbling"
],
"original_data": {
"title": "dadadaduhahhhh.wav",
"description": "9 month old baby boy making various baby noises, vocalizations, and actual baby behavior performances. More stories about a heroic man who fights against the capitalist hordes everyday to bring home pureed bananas.",
"license": "http://creativecommons.org/licenses/by/3.0/",
"uploader": "NoiseCollector",
"fname": "113237",
"mids(class_label_id)": ["/m/09x0r","/m/09l8g","/m/0261r1"]
},
"tag": [
"Vocal","Human voice","Foley","Human","Free","Gibberish","Speech","Baby","Recording","Noisecollector","Child","Babbling","Boy"
]
}

9952.mov
{
"text": [
"The sounds of Musical instrument, Wind instrument, woodwind instrument, Flute and Music"
],
"original_data": {
"title": "Flute - A4 - bad-dynamics",
"description": "Recorded in the context of the good-sounds.org project from the Music Technology Group, Universitat Pompeu Fabra, Barcelona.\nPart of the Good-sounds dataset of monophonic instrumental sounds.\n\ninstrument::flute\nnote::A\noctave::4\nmidi note::57\nmicrophone::neumann U87\ntuning reference::442\ngood-sounds-id::184\n\n\nIntentionally played as an example of bad-dynamics",
"license": "http://creativecommons.org/licenses/by/3.0/",
"uploader": "MTG",
"fname": "354546",
"mids(class_label_id)": ["/m/04szw","/m/085jw","/m/0l14j_","/m/04rlf"]
},
"tag": [
"woodwind instrument","Music","Good-sounds","Wind instrument","Single-note","Musical instrument","Flute","Neumann-u87","Multisample","A4"
]
}

Retrieve the ids of all corresponding class labels from two files:
- FSD50K.ground_truth/{type}.csv
- FSD50K.metadata/collection/collection_{type}.csv
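
As a rough illustration of this retrieval step, the sketch below collects the label ids per clip from a CSV file. It assumes the file has an `fname` column and a `mids` column holding comma-separated ids; these column names and the helper `read_mids` are assumptions for illustration and are not taken from preprocess_FSD50K.py. The in-memory sample row reuses the ids from the first sample record above and omits any other columns the real files may contain.

```python
import csv
import io


def read_mids(csv_file):
    """Map each clip's fname to its list of class label ids.

    Assumes columns named 'fname' and 'mids' (comma-separated ids);
    both names are assumptions, so adjust them to the real header.
    """
    reader = csv.DictReader(csv_file)
    return {row["fname"]: row["mids"].split(",") for row in reader}


# Illustrative in-memory file; real ground-truth files are read from disk.
sample = io.StringIO(
    'fname,mids\n'
    '113237,"/m/09x0r,/m/09l8g,/m/0261r1"\n'
)
mids_by_clip = read_mids(sample)
print(mids_by_clip["113237"])
```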
Then read ontology.json (mentioned in the “Downloading” section) to map the retrieved ids to their actual class labels. The next step is to determine the contents of the .json file:
- text entry: Since FSD50K does not contain any human-written captions, we construct captions from this template: if we have class labels A, B and C, then the caption is "The sounds of A, B and C".
- tag entry: All the class labels retrieved are stored here.
- original_data entry: We put here all extra information retrieved from these 4 files:
    - dev/eval.csv
    - collection_dev/eval.csv
    - ontology.json
    - dev/eval_clips_info_FSD50K.json
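
The three entries described above can be sketched as follows. This is a minimal illustration, not the code in preprocess_FSD50K.py: the helper names (`load_id_to_name`, `make_caption`, `build_record`) are hypothetical, and the ontology is assumed to be a JSON list of objects with `id` and `name` fields. The sketch stores only the class labels in `tag`, as described above, although the sample records shown earlier carry additional tags.

```python
import json


def load_id_to_name(ontology_path):
    """Build an id -> class label lookup (assumes a list of {'id', 'name', ...})."""
    with open(ontology_path, encoding="utf-8") as f:
        return {node["id"]: node["name"] for node in json.load(f)}


def make_caption(labels):
    """Apply the template: labels A, B, C -> 'The sounds of A, B and C'."""
    if not labels:
        raise ValueError("at least one class label is required")
    if len(labels) == 1:
        return f"The sounds of {labels[0]}"
    return "The sounds of " + ", ".join(labels[:-1]) + " and " + labels[-1]


def build_record(mids, id_to_name, extra):
    """Assemble the text, original_data and tag entries for one clip."""
    labels = [id_to_name[mid] for mid in mids]
    return {
        "text": [make_caption(labels)],
        "original_data": {**extra, "mids(class_label_id)": list(mids)},
        "tag": labels,
    }


# Lookup restricted to the ids of the first sample record above.
id_to_name = {"/m/09x0r": "Speech", "/m/09l8g": "Human voice", "/m/0261r1": "Babbling"}
record = build_record(
    ["/m/09x0r", "/m/09l8g", "/m/0261r1"],
    id_to_name,
    {"fname": "113237", "uploader": "NoiseCollector"},
)
print(json.dumps(record, indent=2))
```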
Discard every audio file that soundfile.read() fails to read or that FFmpeg rejects during processing.
After preprocessing, all audio files should be in FLAC format with a sampling rate of 48 kHz (converted with FFmpeg).
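
One possible way to implement the discard-and-convert step is sketched below. It assumes the `soundfile` package and an `ffmpeg` binary on the PATH; the function names and the exact FFmpeg options are illustrative choices, not necessarily those used in preprocess_FSD50K.py.

```python
import subprocess


def ffmpeg_flac_command(src, dst, sample_rate=48000):
    """Build an FFmpeg call that resamples src and writes dst.

    The output format follows from the extension of dst, so dst should end in .flac.
    """
    return [
        "ffmpeg", "-y",            # overwrite dst if it already exists
        "-loglevel", "error",      # only report errors
        "-i", str(src),
        "-ar", str(sample_rate),   # target sampling rate in Hz
        str(dst),
    ]


def is_readable(path):
    """Return True if soundfile can decode the file."""
    import soundfile as sf  # third-party; imported here so the command builder needs only the stdlib

    try:
        sf.read(str(path))
        return True
    except Exception:
        return False


def convert_or_discard(src, dst):
    """Return True if src was converted, False if it should be discarded."""
    if not is_readable(src):
        return False
    result = subprocess.run(ffmpeg_flac_command(src, dst), capture_output=True)
    return result.returncode == 0


print(" ".join(ffmpeg_flac_command("113237.wav", "3303.flac")))
```

A caller would typically loop over the source files and skip writing the paired .json record whenever `convert_or_discard` returns False.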