Commit 4c702b0

Correctly trained weights, clustering optimization
1 parent 70e6453 commit 4c702b0

13 files changed

Lines changed: 522 additions & 255 deletions

File tree

README.md

Lines changed: 39 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -10,11 +10,40 @@ The encoder model train by total of 303 speakers for 52 hours data
1010

1111
# Introduction
1212

13-
ColorSplitter is a command-line tool designed to classify the timbre styles of single-speaker data in the early stages of vocal data processing.
13+
ColorSplitter is a command-line tool that classifies the vocal timbre styles of single-speaker data during the pre-processing stage of vocal datasets.
1414

15-
**Please note**, this project is based on speaker identification technology, and it is currently uncertain whether the timbre changes in singing are completely related to the differences in voiceprints, just for fun :)
15+
Even in scenarios that do not require style classification, filtering data with this tool can reduce unstable timbre performance in the trained model.
1616

17-
The research in this field is still lacking, and this is just a start. Thanks to the community users:洛泠羽
17+
**Please note** that this project is based on speaker verification technology, and it is not yet clear whether timbre changes in singing are fully related to voiceprint differences. Just for fun :)
18+
19+
Research in this field is still scarce; we hope this project inspires more ideas.
20+
21+
Thanks to the community user: 洛泠羽
22+
23+
# New version features
24+
25+
Clustering results are now optimized automatically, so users no longer need to judge the optimal clustering result themselves.
26+
27+
`splitter.py` removes the `--nmax` parameter and adds `--nmin` (minimum number of timbre types; has no effect when `--cluster` is 2), `--cluster` (clustering method, 1: SpectralCluster, 2: UmapHdbscan), and `--mer_cosine` (merges clusters that are too similar).
28+
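The diff does not show how `--mer_cosine` is implemented in `splitter.py`. As a rough illustration of what "merging clusters that are too similar" could look like, the sketch below greedily merges clusters whose centroid cosine similarity exceeds a threshold. The function names and the `threshold` value are hypothetical and are not taken from the project.

```python
import math
from typing import Hashable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def merge_similar_clusters(
    embeds: List[Sequence[float]],
    labels: Sequence[Hashable],
    threshold: float = 0.9,  # hypothetical value, not the project's default
) -> list:
    """Repeatedly merge the most similar pair of clusters above the threshold."""
    labels = list(labels)
    while True:
        # Group embeddings by their current label and compute one centroid per cluster.
        groups = {}
        for vector, label in zip(embeds, labels):
            groups.setdefault(label, []).append(vector)
        centers = {label: centroid(vectors) for label, vectors in groups.items()}

        # Find the pair of clusters with the highest similarity above the threshold.
        best = None
        names = sorted(centers)
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                sim = cosine_similarity(centers[first], centers[second])
                if sim > threshold and (best is None or sim > best[0]):
                    best = (sim, first, second)

        if best is None:
            return labels  # no remaining pair is similar enough

        # Relabel the second cluster as the first, then recompute centroids.
        _, keep, drop = best
        labels = [keep if label == drop else label for label in labels]
```

Merging one pair at a time and recomputing centroids keeps the result independent of label order, at the cost of a few extra passes over the data.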
29+
**New version tips**
30+
31+
1. Run `splitter.py` with the default parameters, specifying only the speaker.
32+
33+
2. If the result has only one cluster, observe the distribution map, set `--nmin` to the number you think is reasonable, and run `splitter.py` again.
34+
35+
3. In actual tests, the optimal value of `--nmin` may be smaller than expected.
36+
37+
4. The new clustering algorithm is faster, so trying multiple runs is recommended.
38+
39+
# Progress
40+
41+
- [x] **Correctly trained weights**
42+
- [x] Clustering algorithm optimization
43+
- [ ] CAM++
44+
- [ ] ERes2Net
45+
- [ ] emotional encoder
46+
- [ ] embed mix
1847

1948
# Environment Configuration
2049

@@ -33,10 +62,10 @@ Tips:This tools running in CPU much quicker than GPU
3362
**1. Move your well-made Diffsinger dataset to the `.\input` folder and run the following command**
3463

3564
```
36-
python splitter.py --spk <speaker_name> --nmax <'N'_max_num>
65+
python splitter.py --spk <speaker_name> --nmin <'N'_min_num>
3766
```
3867

39-
Enter the speaker name after `--spk`, and enter the maximum number of timbre types after `--nmax` (minimum 2, maximum 14)
68+
Enter the speaker name after `--spk`, and enter the minimum number of timbre types after `--nmin` (minimum 1, maximum 14, default 1)
4069
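The argument parsing of `splitter.py` is not part of this diff. A minimal sketch of how the documented `--spk`, `--nmin`, and `--cluster` options could be declared with `argparse` follows. The range check mirrors the README (minimum 1, maximum 14, default 1); the `--cluster` default, the `required=True` setting, and the helper names are assumptions. `--mer_cosine` is left out because the README does not state its type.

```python
import argparse


def nmin_type(value: str) -> int:
    """Parse --nmin and enforce the 1..14 range documented in the README."""
    number = int(value)
    if not 1 <= number <= 14:
        raise argparse.ArgumentTypeError("--nmin must be between 1 and 14")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented options (a sketch, not the project's code)."""
    parser = argparse.ArgumentParser(description="Cluster single-speaker data by timbre")
    # required=True is an assumption; the README only says to pass the speaker name.
    parser.add_argument("--spk", type=str, required=True, help="Speaker name")
    parser.add_argument("--nmin", type=nmin_type, default=1,
                        help="Minimum number of timbre types (1-14)")
    # The default of 1 is an assumption; the README lists the choices only.
    parser.add_argument("--cluster", type=int, choices=(1, 2), default=1,
                        help="Clustering method, 1: SpectralCluster, 2: UmapHdbscan")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Validating the range in a `type` callable lets `argparse` report an out-of-range `--nmin` as a normal usage error.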

4170
Tips: This project does not need to read the annotation file (transcriptions.csv) of the Diffsinger dataset, so it works normally as long as the file structure matches the layout shown below
4271
```
@@ -56,19 +85,15 @@ The wav files are best already split
5685

5786
As shown, cluster 3 is clearly a small group of outliers. You can use the following command to separate it from the dataset
5887
```
59-
python kick.py --spk <speaker_name> --n <n_num> --clust <clust_num>
88+
python kick.py --spk <speaker_name> --clust <clust_num>
6089
```
6190
The separated data will be saved in `.\input\<speaker_name>_<clust_num>`
6291

6392
Please note that running this step will not necessarily improve the results
6493

65-
**3. Find the optimal result through the silhouette score. The higher the silhouette score, the better the result, but the optimal result may not be at the highest score, it may be on the adjacent result**
66-
67-
![scores](IMG/{6BDE2B2B-3C7A-4de5-90E8-C55DB1FC18C0}.png)
68-
69-
After you select the optimal result you think, run the following command to classify the wav files in the dataset
94+
**3. After you select the result you consider optimal, run the following command to classify the wav files in the dataset**
7095
```
71-
python move_files.py --spk <speaker_name> --n <n_num>
96+
python move_files.py --spk <speaker_name>
7297
```
7398
The classified results will be saved in `.\output\<speaker_name>\<clust_num>`
7499
After that, you still need to manually merge clusters that are too small to meet the training requirements
@@ -80,3 +105,5 @@ After that, you still need to manually merge the too small clusters to meet the
80105
# Based on Project
81106

82107
[Resemblyzer](https://github.com/resemble-ai/Resemblyzer/)
108+
109+
[3D-Speaker](https://github.com/alibaba-damo-academy/3D-Speaker/)

README_CN.md

Lines changed: 36 additions & 11 deletions
@@ -10,12 +10,39 @@
1010

1111
ColorSplitter is a command-line tool for classifying the timbre styles of single-speaker data in the early stage of singing-voice data processing
1212

13-
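**Please note** that this project is based on voiceprint recognition (speaker identification) technology, and it is currently uncertain whether timbre changes in singing are completely related to voiceprint differences, just for fun :)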
13+
For scenarios that do not require style classification, filtering data with this tool can also reduce unstable timbre performance in the model
14+
15+
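**Please note** that this project is based on Speaker Verification technology, and it is currently uncertain whether timbre changes in singing are completely related to voiceprint differences, just for fun :)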
1416

1517
Research in this field is still scarce; this is offered as a starting point to inspire better ideas
1618

1719
Thanks to the community user: 洛泠羽
1820

21+
# New version features
22+
23+
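Implemented automatic optimization of clustering results; users no longer need to judge the optimal clustering result themselves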
24+
25+
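`splitter.py` removed the `--nmax` parameter and added `--nmin` (minimum number of timbre types; has no effect when the cluster parameter is 2), `--cluster` (clustering method, 1: SpectralCluster, 2: UmapHdbscan), and `--mer_cosine` to merge clusters that are too similar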
26+
27+
**New version tips**
28+
29+
1. Run `splitter.py` with the default parameters, specifying only the speaker
30+
31+
2. If the result has only one cluster, observe the distribution map, set `--nmin` to the number you think is reasonable, and run `splitter.py` again
32+
33+
3. In actual tests, the optimal value of `--nmin` may be smaller than expected
34+
35+
4. The new clustering algorithm is fairly fast, so trying multiple times is recommended
36+
37+
# Progress
38+
39+
- [x] **Correctly trained weights**
40+
- [x] Clustering algorithm optimization
41+
- [ ] CAM++
42+
- [ ] ERes2Net
43+
- [ ] emotional encoder
44+
- [ ] embed mix
45+
1946
# Environment Configuration
2047

2148
Works normally under `python3.8`; please install [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) first
@@ -32,10 +59,10 @@ pip install -r requirements.txt
3259
**1. Move your prepared Diffsinger dataset to the `.\input` folder and run the following command**
3360

3461
```
35-
python splitter.py --spk <speaker_name> --nmax <'N'_max_num>
62+
python splitter.py --spk <speaker_name> --nmin <'N'_min_num>
3663
```
3764

38-
Enter the speaker name after `--spk`, and enter the maximum number of timbre types after `--nmax` (minimum 2, maximum 14)
65+
Enter the speaker name after `--spk`, and enter the minimum number of timbre types after `--nmin` (minimum 1, maximum 14, default 1)
3966

4067
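Tips: This project does not need to read the annotation file (transcriptions.csv) of the Diffsinger dataset, so it works normally as long as the file structure is as shown below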
4168
```
@@ -55,19 +82,15 @@ tips:本项目并不需要读取Diffsinger数据集的标注文件(transcripti
5582

5683
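As shown in the figure, cluster 3 is clearly a small group of outliers; you can use the following command to separate it from the dataset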
5784
```
58-
python kick.py --spk <speaker_name> --n <n_num> --clust <clust_num>
85+
python kick.py --spk <speaker_name> --clust <clust_num>
5986
```
60-
The separated data will be saved in `.\input\<speaker_name>_<n_num>_<clust_num>`
87+
The separated data will be saved in `.\input\<speaker_name>_<clust_num>`
6188

6289
Please note that running this step will not necessarily improve the results
6390

64-
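**3. Find the optimal result through the silhouette score. The higher the silhouette score, the better the result, but the optimal result is not necessarily at the highest score and may be at an adjacent result**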
65-
66-
![scores](IMG/{6BDE2B2B-3C7A-4de5-90E8-C55DB1FC18C0}.png)
67-
68-
After you select the result you consider optimal, run the following command to classify the wav files in the dataset
91+
**3. After you select the result you consider optimal, run the following command to classify the wav files in the dataset**
6992
```
70-
python move_files.py --spk <speaker_name> --n <n_num>
93+
python move_files.py --spk <speaker_name>
7194
```
7295
The classified results will be saved to `.\output\<speaker_name>\<clust_num>`
7396
After that, clusters that are too small still need to be merged manually to meet the training requirements
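To decide which clusters are too small before merging them by hand, it can help to count the files per cluster first. The sketch below assumes the `clustered_files.csv` layout that `kick.py` reads in this commit (columns `filename` and `clust`, with integer cluster values). The helper names and the `min_files` threshold are hypothetical, and the standard-library `csv` module is used here instead of pandas.

```python
import csv
from collections import Counter


def cluster_sizes(csv_path):
    """Count how many files were assigned to each cluster in the CSV."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[int(row["clust"])] += 1  # assumes integer cluster labels
    return counts


def small_clusters(counts, min_files):
    """Return the cluster ids that contain fewer than min_files files."""
    return sorted(clust for clust, size in counts.items() if size < min_files)


if __name__ == "__main__":
    # Example path following the layout used by kick.py; "demo" is a placeholder speaker.
    sizes = cluster_sizes("output/demo/clustered_files.csv")
    print(sizes, small_clusters(sizes, min_files=20))
```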
@@ -77,3 +100,5 @@ python move_files.py --spk <speaker_name> --n <n_num>
77100
# Based on Project
78101

79102
[Resemblyzer](https://github.com/resemble-ai/Resemblyzer/)
103+
104+
[3D-Speaker](https://github.com/alibaba-damo-academy/3D-Speaker/)

kick.py

Lines changed: 2 additions & 4 deletions
@@ -5,23 +5,21 @@
55

66
parser = argparse.ArgumentParser()
77
parser.add_argument('--spk', type=str, help='Speaker name')
8-
parser.add_argument('--n', type=str, help='N num')
98
parser.add_argument('--clust', type=int, help='Cluster value')
109

1110
args = parser.parse_args()
1211

1312
Speaker_name = args.spk #Speaker name
14-
Nnum = args.n
1513
clust_value = args.clust # Cluster value
1614

17-
data = pd.read_csv(os.path.join('output', Speaker_name, f'clustered_files_{Nnum}.csv'))
15+
data = pd.read_csv(os.path.join('output', Speaker_name, f'clustered_files.csv'))
1816

1917
for index, row in data.iterrows():
2018
file_path = row['filename']
2119
clust = row['clust']
2220

2321
if clust == clust_value:
24-
clust_dir = os.path.join('input', f'{Speaker_name}_{Nnum}_{clust_value}')
22+
clust_dir = os.path.join('input', f'{Speaker_name}_{clust_value}')
2523
if not os.path.exists(clust_dir):
2624
os.makedirs(clust_dir)
2725
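The hunk ends before the files are actually relocated, so the rest of `kick.py` is not visible here. The sketch below shows one way the whole step could work after this change, with the per-run `Nnum` suffix gone from both the CSV name and the target folder. It assumes that each `filename` entry is a path that resolves from the working directory and that matching files are moved rather than copied. The `kick_cluster` function and its `root` parameter are hypothetical, and the standard-library `csv` module stands in for pandas.

```python
import csv
import shutil
from pathlib import Path


def kick_cluster(speaker, clust_value, root=Path(".")):
    """Move every file of one cluster into input/<speaker>_<clust_value>."""
    csv_path = root / "output" / speaker / "clustered_files.csv"
    target_dir = root / "input" / f"{speaker}_{clust_value}"
    moved = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if int(row["clust"]) != clust_value:
                continue  # file belongs to another cluster
            source = Path(row["filename"])
            if not source.is_file():
                continue  # already moved or missing; skip quietly
            target_dir.mkdir(parents=True, exist_ok=True)
            destination = target_dir / source.name
            shutil.move(str(source), str(destination))
            moved.append(destination)
    return moved
```

Using `mkdir(parents=True, exist_ok=True)` replaces the `os.path.exists` check followed by `os.makedirs` and avoids a race if the folder appears between the two calls.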

modules/Resemblyzer/visualizations.py

Lines changed: 3 additions & 3 deletions
@@ -96,8 +96,8 @@ def plot_projections(embeds, speakers, ax=None, colors=None, markers=None, legen
9696

9797
# Compute the 2D projections. You could also project to another number of dimensions (e.g.
9898
# for a 3D plot) or use a different dimensionality reduction method like PCA or TSNE.
99-
#reducer = UMAP(**kwargs)
100-
reducer = TSNE(init='pca', **kwargs)
99+
reducer = UMAP(**kwargs)
100+
#reducer = TSNE(init='pca', **kwargs)
101101
projs = reducer.fit_transform(embeds)
102102

103103
# Draw the projections
@@ -107,7 +107,7 @@ def plot_projections(embeds, speakers, ax=None, colors=None, markers=None, legen
107107
speaker_projs = projs[speakers == speaker]
108108
marker = "o" if markers is None else markers[i]
109109
label = speaker if legend else None
110-
ax.scatter(*speaker_projs.T, s=100, c=[colors[i]], marker=marker, label=label, edgecolors='k')
110+
ax.scatter(*speaker_projs.T, s=60, c=[colors[i]], marker=marker, label=label, edgecolors='k')
111111
center = speaker_projs.mean(axis=0)
112112
ax.scatter(*center, s=200, c=[colors[i]], marker="X", edgecolors='k')
113113
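This commit switches the projection back from TSNE to UMAP by swapping which line is commented out. If both reducers need to stay available, a small selector avoids editing the source each time. The sketch below is only an illustration: `make_reducer` and its `name` argument are hypothetical, and it assumes `umap-learn` and `scikit-learn` are installed when the corresponding branch is used.

```python
def make_reducer(name="umap", **kwargs):
    """Return a dimensionality reducer exposing fit_transform, chosen by name."""
    key = name.lower()
    if key == "umap":
        from umap import UMAP  # provided by the umap-learn package
        return UMAP(**kwargs)
    if key == "tsne":
        from sklearn.manifold import TSNE
        # init='pca' matches the TSNE call that this commit comments out.
        return TSNE(init="pca", **kwargs)
    raise ValueError(f"Unknown reducer: {name!r} (expected 'umap' or 'tsne')")
```

Inside `plot_projections`, the two reducer lines could then be replaced by a call such as `reducer = make_reducer("umap", **kwargs)`, with `projs = reducer.fit_transform(embeds)` unchanged. The imports are deferred so that only the selected library has to be installed.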

modules/Resemblyzer/voice_encoder.py

Lines changed: 0 additions & 183 deletions
This file was deleted.
