There are many things that can be updated from the command line.
In short:
-
- All the configuration options under `trainer` are pytorch lightning trainer [api](https://pytorch-lightning.readthedocs.io/en/1.4.1/common/trainer.html#trainer-class-api).
-
-`models.net` are the passt options.
-
-`models.mel` are the preprocessing options.
+
- All the configuration options under `trainer` are PyTorch Lightning trainer [API](https://pytorch-lightning.readthedocs.io/en/1.4.1/common/trainer.html#trainer-class-api) options. For example, to turn off cuDNN benchmarking, add `trainer.benchmark=False` to the command line.
+
- `models.net` are the PaSST (or the chosen NN) options.
+
- `models.mel` are the preprocessing options (mel spectrograms).
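To make the dotted option names concrete, here is a minimal sketch of how a `key.path=value` override such as `trainer.benchmark=False` could be mapped onto a nested configuration. It only illustrates the idea and is not the framework's actual parsing code; `apply_override` and the default values in `config` are hypothetical.

```python
import ast


def apply_override(config, override):
    """Apply one 'dotted.key=value' override to a nested dict (illustrative only)."""
    path, _, raw_value = override.partition("=")
    try:
        # Interpret Python literals such as False, 16, or 0.5.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        # Anything else (e.g. an architecture name) stays a plain string.
        value = raw_value
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        # Walk down the nested dicts, creating missing levels on the way.
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


# Hypothetical defaults, chosen only for this example.
config = {"trainer": {"benchmark": True, "precision": 32}, "models": {"net": {}}}
for item in ["trainer.benchmark=False", "trainer.precision=16",
             "models.net.arch=passt_deit_bd_p16_384"]:
    apply_override(config, item)
```

After the loop, `config["trainer"]` holds the overridden trainer options and `config["models"]["net"]["arch"]` holds the architecture name as a string.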
+
+
# Training on Audioset
+
Download and prepare the dataset as explained in the [audioset page](audioset/).
+
The base PaSST model can be trained, for example, like this:
For example, using only unstructured patchout of 400:
```bash
@@ -68,6 +148,7 @@ Multi-gpu training can be enabled by setting the environment variable `DDP`, for
DDP=2 python ex_audioset.py with trainer.precision=16 models.net.arch=passt_deit_bd_p16_384 -p -m mongodb_server:27000:audioset21_balanced -c "PaSST base 2 GPU"
```
+
# Pre-trained models
Please check the [releases page](releases/) to download pre-trained models.
In general, you can get a pretrained model on Audioset using
@@ -79,6 +160,28 @@ model = get_model(arch="passt_s_swa_p16_128_ap476", pretrained=True, n_classes=
```
This will automatically download a pretrained PaSST on Audioset with an mAP of `0.476`. The model was trained with `s_patchout_t=40, s_patchout_f=4`, but you can change these to better fit your task or computational needs.
+
There are several pretrained models available with different strides (overlap) and with or without SWA: `passt_s_p16_s16_128_ap468`, `passt_s_swa_p16_s16_128_ap473`, `passt_s_swa_p16_s14_128_ap471`, `passt_s_p16_s14_128_ap469`, `passt_s_swa_p16_s12_128_ap473`, `passt_s_p16_s12_128_ap470`.
+
For example, in `passt_s_swa_p16_s16_128_ap473`: `p16` means the patch size is `16x16`, `s16` means no overlap (stride=16), `128` is the number of mel bands, and `ap473` refers to the performance of this model on Audioset (mAP=0.473).
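The naming scheme can be unpacked mechanically. The following is a minimal sketch, assuming every name follows the `passt_s[_swa]_p<patch>_s<stride>_<mels>_ap<mAP x 1000>` layout of the names listed above; `describe_model_name` is a hypothetical helper, not part of the repository, and names without a stride field (such as `passt_s_swa_p16_128_ap476`) are deliberately rejected.

```python
import re

# Layout inferred from the listed model names (an assumption, not an official spec).
NAME_PATTERN = re.compile(
    r"^passt_s_(?P<swa>swa_)?p(?P<patch>\d+)_s(?P<stride>\d+)"
    r"_(?P<mels>\d+)_ap(?P<ap>\d+)$"
)


def describe_model_name(name):
    """Split a pretrained-model name into its documented components."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name}")
    patch = int(match["patch"])
    stride = int(match["stride"])
    return {
        "swa": match["swa"] is not None,   # trained with SWA or not
        "patch_size": patch,               # patches are patch x patch
        "stride": stride,
        "overlap": patch - stride,         # 0 when stride equals patch size
        "mel_bands": int(match["mels"]),
        "mAP": int(match["ap"]) / 1000,    # the apNNN suffix encodes mAP * 1000
    }


info = describe_model_name("passt_s_swa_p16_s16_128_ap473")
```

Here `info["overlap"]` is `0` because the stride equals the patch size; for the `s12` variants the same computation gives an overlap of 4.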
+
+
In general, you can get this pretrained model using:
Using the framework, you can evaluate this model using:
+
```shell
+
python ex_audioset.py evaluate_only with passt_s_swa_p16_s16_128_ap473
+
```
+
+
Two ensembles of these models are provided as well:
+
A large ensemble giving `mAP=.4956`
+
```shell
+
python ex_audioset.py evaluate_only with trainer.precision=16 ensemble_many
+
```
+
An ensemble of models with `stride=10` giving `mAP=.4864`
+
```shell
+
python ex_audioset.py evaluate_only with trainer.precision=16 ensemble_s10
+
```
# Contact
The repo will be updated. In the meantime, if you have any questions or problems, feel free to open an issue on GitHub or contact the authors directly.