
Commit 5c179c0 (parent: 56eb527)

fix: doc


openkaito_next.md

Lines changed: 13 additions & 7 deletions
````diff
@@ -1,32 +1,38 @@
 # Bittensor Subnet 5: Text Embedding Model
 
+> This is a draft proposal for the next version of Bittensor Subnet 5: Text Embedding Model.
+
 ## Abstract
 
-Bittensor Subnet 5's primary focus is the advancement of text embedding models through collaborative efforts among miners.
+Bittensor Subnet 5's primary focus is the development of the world's best-performing and most generalizable text embedding model.
 
-Leveraging an extensive Large Language Model (LLM)-augmented corpus for evaluation, miners are empowered to develop and deploy text-embedding models that surpass current state-of-the-art (SOTA) performance.
+Leveraging an extensive Large Language Model (LLM)-augmented corpus for evaluation, miners are empowered to develop and deploy text-embedding models that surpass current state-of-the-art (SOTA) performance.
 
 These models will be accessible to users via the subnet's API.
 
 ## Objectives & Contributions
 
-The primary objective of Subnet 5 is to train and serve the best and most robust generic text-embedding models. Such text-embedding models can empower plenty of downstream applications such as semantic search, natural language understanding, and so on.
+The primary objective of Subnet 5 is to train and serve the best and most generalizable text-embedding models. Such models can power many downstream applications, such as semantic search and natural language understanding.
+
 
 Miners will be responsible for training models using an extensive corpus of textual data and serving the model in a low-latency and high-throughput way. These models will be utilized to generate high-quality embeddings for diverse text inputs.
 
+
 Validators will conduct rigorous evaluations of the models using multiple benchmarks. Performance comparisons will be made against existing SOTA text embedding models to ensure continuous improvement and competitiveness.
 
-Subnet users will gain access to cutting-edge text embedding models that exceed SOTA performance. These models will be made publicly available through the validator API of Bittensor Subnet 5, facilitating widespread adoption and integration into various applications.
+
+Subnet users will gain access to cutting-edge text embedding models that are highly generic and exceed SOTA performance. These models will be made publicly available through the validator API of Bittensor Subnet 5, facilitating widespread adoption and integration into various applications.
+
 
 ## Incentive Mechanism
 
 Miners will receive a batch of texts and embed them.
 
 For the text embeddings, validators have the pairwise relevance information to evaluate them via the contrastive learning loss:
 
-$$
-\mathcal{L}_\text{InfoNCE} = - \mathbb{E} \Big[\log \frac{f(\mathbf{x}, \mathbf{c})}{\sum_{\mathbf{x}' \in X} f(\mathbf{x}', \mathbf{c})} \Big]
-$$,
+```math
+\mathcal{L}_\text{InfoNCE} = - \mathbb{E} \left[\log \frac{f(\mathbf{x}, \mathbf{c})}{\sum_{\mathbf{x}' \in X} f(\mathbf{x}', \mathbf{c})} \right]
+```
 
 where $f(\mathbf{x}, \mathbf{c}) = \exp(\mathbf{x} \cdot \mathbf{c})$ is an estimate of $\frac{p(\mathbf{x} \mid \mathbf{c})}{p(\mathbf{x})}$, $\mathbf{c}$ is the target embedding, $\mathbf{x}$ is the positive sample, and $\mathbf{x}'$ are the negative samples.
````
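The InfoNCE loss in the diff above can be sketched numerically. The helper below is an illustrative sketch, not part of the subnet codebase: it assumes a single target embedding $\mathbf{c}$, one positive sample $\mathbf{x}$, and a set of negatives $\mathbf{x}'$, with $f(\mathbf{x}, \mathbf{c}) = \exp(\mathbf{x} \cdot \mathbf{c})$ as in the proposal.

```python
import numpy as np

def info_nce_loss(target, positive, negatives):
    """Illustrative InfoNCE loss for a single target embedding.

    target:    (d,)   target embedding c
    positive:  (d,)   embedding of the positive sample x
    negatives: (n, d) embeddings of the negative samples x'

    Uses f(x, c) = exp(x . c); the candidate set X is the positive plus
    all negatives, so the loss is -log softmax over the dot products.
    """
    candidates = np.vstack([positive[None, :], negatives])  # the set X
    logits = candidates @ target                            # x . c per candidate
    # -log( f(x,c) / sum_{x' in X} f(x',c) ), via a stable log-sum-exp
    shifted = logits - logits.max()
    log_denominator = logits.max() + np.log(np.exp(shifted).sum())
    return float(log_denominator - logits[0])

# A positive aligned with the target yields a lower loss than one
# pointing away from it (hypothetical toy vectors).
c = np.array([1.0, 0.0])
negs = np.array([[0.0, 1.0], [0.0, -1.0]])
loss_aligned = info_nce_loss(c, np.array([1.0, 0.0]), negs)
loss_opposed = info_nce_loss(c, np.array([-1.0, 0.0]), negs)
```

Under this reading, a validator could score a miner by averaging this quantity over a batch of (target, positive, negatives) triples drawn from its pairwise relevance data; lower loss means embeddings that better separate positives from negatives.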
