README.md (39 additions & 12 deletions)
@@ -10,11 +10,40 @@ The encoder model train by total of 303 speakers for 52 hours data
# Introduction

-ColorSplitter is a command-line tool designed to classify the timbre styles of single-speaker data in the early stages of vocal data processing.
+ColorSplitter is a command-line tool for classifying the vocal timbre styles of single-speaker data during the pre-processing stage of vocal data.

-**Please note**, this project is based on speaker identification technology, and it is currently uncertain whether the timbre changes in singing are completely related to the differences in voiceprints, just for fun :)
+For scenarios that do not require style classification, using this tool to filter data can also reduce timbre instability in the model's output.

-The research in this field is still lacking, and this is just a start. Thanks to the community users:洛泠羽
+**Please note** that this project is based on Speaker Verification technology, and it is not clear whether timbre changes in singing are fully explained by voiceprint differences, so treat it as just for fun :)
+
+Research in this field is still scarce; we hope this project inspires more ideas.
+
+Thanks to the community user: 洛泠羽
+
+# New version features
+
+Automatic optimization of clustering results is now implemented, so users no longer need to pick the optimal clustering result themselves.
+
+`splitter.py` drops the `--nmax` parameter and adds `--nmin` (minimum number of timbre types; has no effect when `--cluster` is 2), `--cluster` (clustering method; 1: SpectralCluster, 2: UmapHdbscan), and `--mer_cosine` (merges clusters that are too similar).
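The idea of merging clusters that are too similar can be sketched as follows. This is a minimal illustration, not the code in `splitter.py`: it assumes each cluster is summarized by a centroid embedding and that two clusters are merged when the cosine similarity of their centroids reaches a threshold. The names `cosine_similarity`, `merge_similar_clusters`, `centroids`, and `threshold` are hypothetical, and how `--mer_cosine` actually works or which values it accepts is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def merge_similar_clusters(centroids, threshold):
    """Map each cluster label to the label of its merged group.

    `centroids` maps a cluster label to its centroid vector.
    Clusters whose centroids have cosine similarity >= `threshold`
    are merged transitively with a small union-find structure.
    Both parameters are hypothetical, for illustration only.
    """
    labels = sorted(centroids)
    parent = {label: label for label in labels}

    def find(label):
        # Follow parent links to the group representative (path halving).
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if cosine_similarity(centroids[a], centroids[b]) >= threshold:
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    # Keep the smaller label as the representative.
                    parent[max(root_a, root_b)] = min(root_a, root_b)

    return {label: find(label) for label in labels}
```

With centroids `{0: [1.0, 0.0], 1: [0.99, 0.05], 2: [0.0, 1.0]}` and a threshold of 0.95, clusters 0 and 1 collapse into label 0 while cluster 2 stays separate.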
+
+**New version tips**
+
+1. Run `splitter.py` directly with the default parameters, specifying only the speaker.
+
+2. If the result has only one cluster, inspect the distribution map, set `--nmin` to a number you consider reasonable, and run `splitter.py` again.
+
+3. In actual tests, the optimal value of `--nmin` may be smaller than expected.
+
+4. The new clustering algorithm is faster, so trying several runs is recommended.
+
+# Progress
+
+- [x] **Correctly trained weights**
+- [x] Clustering algorithm optimization
+- [ ] CAM++
+- [ ] ERes2Net
+- [ ] emotional encoder
+- [ ] embed mix

# Environment Configuration
@@ -33,10 +62,10 @@ Tips:This tools running in CPU much quicker than GPU
**1. Move your well-made Diffsinger dataset to the `.\input` folder and run the following command**
-Enter the speaker name after `--spk`, and enter the maximum number of timbre types after `--nmax` (minimum 2, maximum 14)
+Enter the speaker name after `--spk`, and enter the minimum number of timbre types after `--nmin` (minimum 1, maximum 14, default 1)
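A call matching this description could be assembled as in the sketch below. It only uses the `--spk`, `--nmin`, and `--cluster` flags named in this README. The helper name `build_splitter_command`, the `python splitter.py` invocation form, and the example speaker name are assumptions for illustration, not part of the project.

```python
import sys


def build_splitter_command(spk, nmin=1, cluster=None):
    """Build an argument list for a `splitter.py` run (hypothetical helper).

    spk     -- speaker name, passed after --spk
    nmin    -- minimum number of timbre types (1 to 14, default 1)
    cluster -- optional clustering method: 1 (SpectralCluster) or 2 (UmapHdbscan)
    """
    if not spk:
        raise ValueError("spk must be a non-empty speaker name")
    if not 1 <= nmin <= 14:
        raise ValueError("nmin must be between 1 and 14")

    # Assumed invocation form: the current Python interpreter runs splitter.py.
    command = [sys.executable, "splitter.py", "--spk", spk, "--nmin", str(nmin)]

    if cluster is not None:
        if cluster not in (1, 2):
            raise ValueError("cluster must be 1 or 2")
        command += ["--cluster", str(cluster)]

    return command


# Example with a made-up speaker name.
command = build_splitter_command("my_speaker", nmin=3, cluster=2)
print(" ".join(command[1:]))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the project root.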
Tips: This project does not need to read the annotation file (transcriptions.csv) of the Diffsinger dataset, so as long as the file structure is as shown below, it can work normally
```
@@ -56,19 +85,15 @@ The wav files are best already split
As shown, cluster 3 is obviously a minority outlier, you can use the following command to separate it from the dataset
The separated data will be saved in `.\input\<speaker_name>_<n_num>_<clust_num>`
Please note that running this step may not necessarily optimize the results
-**3. Find the optimal result through the silhouette score. The higher the silhouette score, the better the result, but the optimal result may not be at the highest score, it may be on the adjacent result**