Commit 73c9d2c

Authored by isaac-chung, github-actions[bot], namespace-Pt, zhangpeitian, and Samoed

[MAEB] Sync with 1.38.33 (#2883)
* Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update Doubao-1.5-Embedding revision (#2613) * update seed-embedding * update seed models * fix linting and tiktoken problem * fix tiktoken bug * fix lint * update name * Update mteb/models/seed_models.py adopt suggestion Co-authored-by: Roman Solomatin <[email protected]> * update logging * update lint * update link * update revision --------- Co-authored-by: zhangpeitian <[email protected]> Co-authored-by: Roman Solomatin <[email protected]> * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & 
benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * Update tasks & benchmarks tables * CI: fix table (#2615) * Update tasks & benchmarks tables * Update gradio version (#2558) * Update gradio version Closes #2557 * bump gradio * fix: Removed missing dataset for MTEB(Multilingual) and bumped version We should probably just have done this earlier to ensure that the multilingual benchamrk is runable. * CI: fix infinitely committing issue (#2616) * fix token * try to trigger * add token * test ci * Update tasks & benchmarks tables * Update tasks & benchmarks tables * remove test lines --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Add ScandiSent dataset (#2620) * add scandisent dataset * add to init * typo * lint * 1.38.4 Automatically generated by python-semantic-release * Format all citations (#2614) * Fix errors in bibtex_citation * Format all bibtex_citation fields * format benchmarks * fix format * Fix tests * add formatting script * fix citations (#2628) * Add Talemaader pair classification task (#2621) Add talemaader pair classification task * add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633) * add Bilingual English-Danish parallel corpus from The Danish Medicines Agency * bump dataset revision * format bibtex * format bibtex * Remove irrelevant test (#2630) remove irrelevant test * Revert "CI: fix infinitely committing issue (#2616)" (#2636) This reverts commit 82dcb3d. 
* Update tasks & benchmarks tables * Remove `typer` dependency from citation script (#2629) remove typer dependency from citation script * CI format citations (#2649) * ci format citations * add files * remove from lint CI * test lint * test lint * fix names * fix: Update VisualSTS Aggregate task modalities (#2597) * Update STS17MultilingualVisualSTS.py * fix STSBenchmarkMultilingualVisualSTS --------- Co-authored-by: Isaac Chung <[email protected]> * 1.38.5 Automatically generated by python-semantic-release * Add tests for leaderboard build (#2631) * Add tests for leaderboard build * add new action * remove build tests from other actions * fix tests * correct exclusion of test * added timeout constant * fix: SIB200 machine translated > human translated (#2665) As correctly pointed out in: https://huggingface.co/datasets/mteb/sib200/discussions/1 * 1.38.6 Automatically generated by python-semantic-release * fix: Update datasets wich can't be loaded with `datasets>=3.0` (#2661) fix: Update datasets wich can't be loaded with `datasets>=3.0` (#1619) * reupload datasets * fix loader * remove commented code * lint * update pyproject dependencies * rename model RELLE to CHAIN19 (#2671) * Add relle * defined model metadata for relle * Add mteb/models/relle_models.py * Update mteb/models/relle_models.py Co-authored-by: Roman Solomatin <[email protected]> * lint after commit run after "make lint" * Add into model_modules Add model into model_modules and lint check * rename model change model name * rename model change model name --------- Co-authored-by: Roman Solomatin <[email protected]> * 1.38.7 Automatically generated by python-semantic-release * Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674) * update seed-embedding * update seed models * fix linting and tiktoken problem * fix tiktoken bug * fix lint * update name * Update mteb/models/seed_models.py adopt suggestion Co-authored-by: Roman Solomatin <[email protected]> * update logging * update lint * update link * update revision * update Doubao-1.5-Embedding revision 3 * rename Doubao-1.5-Embedding to Seed1.5-Embedding --------- Co-authored-by: zhangpeitian <[email protected]> Co-authored-by: Roman Solomatin <[email protected]> * fix: Allow empty string for openai models (#2676) * fix for empty string input to openai/text-embedding-3-large * fix: Allow empty string in openai models closes: #1650 * fix based on review * Updated docstring --------- Co-authored-by: ayush1298 <[email protected]> * 1.38.8 Automatically generated by python-semantic-release * Leaderboard: UI simplifications for menus (#2672) * Leaderboard: UI simplifications for menus Did a few things to improve the simplify the leaderboard UI. 
Changes: - Combined FAQ entries - Created dropdowns in the select benchmark menu sidebar - Removed reference to arena - Removed reference to old leaderboard - reduced size of select menu - reduced the size of acknowledgements - removed farsi from the selection (as it is a beta) refactors: - refactored to use a class for menu items - refactored texts segments out of app.py * fixed comment * fixes for sizes * fix modality for `OVENIT2TRetrieval` (#2678) fix modality * fix: `MTEB(Code, v1)` languages (#2679) fix code languages * 1.38.9 Automatically generated by python-semantic-release * Correction in docs (#2688) * Fix for Openai_Text-Embedding3-Small (#2702) * Fix for Openai_Text-Embedding3-Small * better syntax for readability * Fix for Openai_Text-Embedding3-Small (#2702) * Fix for Openai_Text-Embedding3-Small * better syntax for readability * fix: Ensure that optional dependencies are compatible and if not state it (#2706) Fixes mistakes introduced in #2424 It seems like many of these requirements doesn't exist (voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened? * fix: Only install mteb into site packages (#2618) * Restrict installation directory * fix * namespace false * add star * add pont * fix import * fix import * add init files * fix setuptools find * fix image init * add missing templates --------- Co-authored-by: Roman Solomatin <[email protected]> * 1.38.10 Automatically generated by python-semantic-release * docs: Updated the PR template and improved submission docs (#2704) * docs: Updated the PR template and improved submission docs 1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests. 2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable. 3) Required that you argue for a dataset before addition fixes #2568 * Apply suggestions from code review Co-authored-by: Isaac Chung <[email protected]> --------- Co-authored-by: Isaac Chung <[email protected]> * fix: Remove models from the leaderboard (#2705) * fix: Remove models from the leaderboard I remove both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to add a model that not currently public. 
* format * 1.38.11 Automatically generated by python-semantic-release * fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711) * Rename gemini-embedding-exp-03-07 to gemini-embedding-001 * update referenfe link to the vertexAI API doc * 1.38.12 Automatically generated by python-semantic-release * fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708) * fix: Integrate `lightonai/GTE-ModernColBERT-v1` Fixes #2673 * fixes based on corrections * 1.38.13 Automatically generated by python-semantic-release * docs: fix number of tasks for eng, v2 in docs (#2720) * fix: Added potion-multilingual-128M (#2717) * Added ModelMeta for potion-multilingual-128M * Fixed linting * Fixed linting * Updated date * 1.38.14 Automatically generated by python-semantic-release * Update the max tokens for gemini-embedding-001 (#2725) * fix: Ara and ben classification dataset cleaning (#2632) * Improve classification datasets quality for ara and ben langs * add missing AJGT * fix format * change ajgt description * Fix numbers in description, add link to pull request * Add too short filter * Link in markdown format * Update tasks & benchmarks tables * fix: Update Seed1.5-Embedding API (#2724) * update seed1.5-embedding api * update seed1.5-embedding api * update Seed1.5-Embedding API * update Seed1.5-Embedding resolve comments * update Seed1.5-Embedding lint * Update mteb/models/seed_models.py --------- Co-authored-by: Kenneth Enevoldsen <[email protected]> * 1.38.15 Automatically generated by python-semantic-release * fix: Add vidore v2 benchmarks (#2713) * adding vidore benchmarks * fix typo * clean vidore names + per lang eval * lint * vidore names * bibtex fix * fix revision * vidore v2 citation * update citation format and fix per-language mappings * lint: citations * typo citations * Update tasks & benchmarks tables * 1.38.16 Automatically generated by python-semantic-release * fix: `IndicQARetrieval` loader (#2729) * fix indic qa * add kwargs * 1.38.17 Automatically generated by python-semantic-release * fix: Promote Persian benchmark to v1 (#2707) * Switch versioning from beta to v1 and add v1 to benchmark selector * Update Farsi benchmark display name, task IDs, and metadata * Add Hakim Model * fix hakim version * update * make lint * fix: Promote Persian benchmark to v1 --------- Co-authored-by: mehran <[email protected]> Co-authored-by: Kenneth Enevoldsen <[email protected]> * Update tasks & benchmarks tables * 1.38.18 Automatically generated by python-semantic-release * Add ViDoRe combined benchmark and add to leaderboard side panel (#2732) * add ViDoRe combined benchmark and add to leaderboard side panel * Update benchmark_selector.py * Update tasks & benchmarks tables * fix: Rename display name of VDR (#2734) * Update tasks & benchmarks tables * 1.38.19 Automatically generated by python-semantic-release * fix: Add colpali models family (#2721) * add colpali models * add colpali as framework * add colpali as framework * update metadata and add colsmol * ix typos * account for revision * add training data info and lint * modify meta * correct colmodels meta and add colnomic 7b * fix typo in toml (colpali subdeps) * refine colmodel loading and metadata * 1.38.20 Automatically generated by python-semantic-release * fix: Correct embedding dimension for bge-m3 (#2738) Fixes #2735 * 1.38.21 Automatically generated by python-semantic-release * docs: Updated description of FEVER (#2745) * docs: Updated description of FEVER Update the description to state that the corpus is the same as fever as we 
have have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2) * minor * Backfill task metadata for metadata for BigPatentClustering and AllegroReviews (#2755) * big-patent * allegro-reviews * Update tasks & benchmarks tables * Update Seed1.5 training data (#2749) * update seed1.5 training data * update seed1.5 training data * fix: Update caltech101 (#2759) * docs: Updated description of FEVER Update the description to state that the corpus is the same as fever as we have have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2) * fix: Update Caltech101 to different source Run both versions of one of the task using `nomic-ai/nomic-embed-text-v1.5` and both scores match: ### Old ``` { "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b", "task_name": "Caltech101", "mteb_version": "1.38.4", "scores": { "test": [ { "accuracy": 0.897863, ``` ### New ``` { "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686", "task_name": "Caltech101", "mteb_version": "1.38.4", "scores": { "test": [ { "accuracy": 0.897929, ``` * 1.38.22 Automatically generated by python-semantic-release * Add missing PatchCamelyon_labels.txt (#2756) * ci: Delete cache in Model loading test only when model is loaded (#2761) * only delete cache when model loaded * testing it out * fix: Add `cadet-embed-base-v1` (#2727) * update * update overview.py for models * update * update * 1.38.23 Automatically generated by python-semantic-release * Fixing Google embedding task type for STS (#2767) The type `SIMILARITY` is invalid. Correct one: `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types * docs: Leaderboard simplifications (#2764) * docs: Leaderboard simplifications Simplified sidebar, notably: 1) Combined Language and Regional (since these are all languages) 2) Folded all (With Visual document retrieval then images start to take up a lot of space) 3) Removed legacy and instead added "Other" in language, where I moved "English Legacy" I also restructured the code so that nesting is easier. 
Is it also possible to create a seperate section (see dummy screenshot) * refactor to reduce nesting * format * fix: add xet support (#2603) * add xet version * add doc comment * change xet requirements * Update docs/usage/usage.md --------- Co-authored-by: Kenneth Enevoldsen <[email protected]> * 1.38.24 Automatically generated by python-semantic-release * fix: Update giga embeddings (#2774) * update giga embeddings * update giga embeddings --------- Co-authored-by: Kolodin Egor <[email protected]> * ci: add new prefixes to releases (#2766) add new prefixes * 1.38.25 Automatically generated by python-semantic-release * fix: Update Caltech101 datasets to latest revision [v1] (#2778) * fix: Update Caltech101 datasets to latest revision [v2] fixes: #2770 Fixes the issue, but only in v1 ``` # tested using: task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot") task.load_data() task.get_candidate_labels() ``` * fix rev * 1.38.26 Automatically generated by python-semantic-release * fix: CachedEmbeddingWrapper issues in both documentation and code (#2779) Fixes #2772 * 1.38.27 Automatically generated by python-semantic-release * dataset: Add miracl vision (#2736) * add miracl vision * add miracl vision * ruff * cast * image * image * add langs * add langs * add langs * add langs * descriptive stats * lint * lint * lint * remove com * Update tasks & benchmarks tables * model: Add Qwen3 Embedding model (#2769) * Init code * Remove extra config and lint code * use sentence transformer * add revisions * fix lint * Apply suggestions from code review Co-authored-by: Roman Solomatin <[email protected]> * fix lint * add framework --------- Co-authored-by: Roman Solomatin <[email protected]> * bump ruff (#2784) * Update issue and pr templates (#2782) * Update issue templates * Update bug_report.md * test yaml template * add templates * update templates * add emojis * fix typo * Apply suggestions from code review Co-authored-by: Kenneth Enevoldsen <[email protected]> * update issue titles * update PR template * remove PR templates --------- Co-authored-by: Kenneth Enevoldsen <[email protected]> * model: Add GeoGPT-Research-Project/GeoEmbedding (#2773) * add model: geogpt_models * update geogpt_models * use InstructSentenceTransformerWrapper * resolve pylint warning * format geogpt_models.py * Update mteb/models/geogpt_models.py Co-authored-by: Roman Solomatin <[email protected]> * Update mteb/models/geogpt_models.py --------- Co-authored-by: zhangzeqing <[email protected]> Co-authored-by: Roman Solomatin <[email protected]> Co-authored-by: Kenneth Enevoldsen <[email protected]> * model: add fangxq/XYZ-embedding (#2741) * add xyz model * add xyz model * add xyz model * update * update * update * update * update * update * update * lint --------- Co-authored-by: Roman Solomatin <[email protected]> Co-authored-by: Kenneth Enevoldsen <[email protected]> * ci: fix config error for semantic release (#2800) discussed in: #2796 * dataset: Add R2MED Benchmark (#2795) * Add files via upload * Add files via upload * Update benchmarks.py * Update __init__.py * Add files via upload * Update R2MEDRetrieval.py * Update run_mteb_r2med.py * Delete scripts/run_mteb_r2med.py * Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py Co-authored-by: Roman Solomatin <[email protected]> * Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py Co-authored-by: Roman Solomatin <[email protected]> * Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py Co-authored-by: Roman Solomatin <[email protected]> * Update 
mteb/tasks/Retrieval/eng/R2MEDRetrieval.py Co-authored-by: Roman Solomatin <[email protected]> * Add files via upload * Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json * Add files via upload * Add files via upload * Add files via upload * Update R2MEDRetrieval.py * Add files via upload * Add files via upload * Add files via upload * Add files via upload * format citations * Update R2MEDRetrieval.py * Add files via upload * Add files via upload --------- Co-authored-by: Li Lei <[email protected]> Co-authored-by: Roman Solomatin <[email protected]> * Update tasks & benchmarks tables * Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802) update training datasets Co-authored-by: zhangzeqing <[email protected]> * fix: Add adapted_from to Cmedqaretrieval (#2806) * fix: Add adapted_from to Cmedqaretrieval Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places. * format * 1.38.28 Automatically generated by python-semantic-release * fix: Adding client arg to init method of OpenAI models wrapper (#2803) * Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client) To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client. * Update mteb/models/openai_models.py Co-authored-by: Roman Solomatin <[email protected]> * Update mteb/models/openai_models.py * remove comment and format --------- Co-authored-by: Kenneth Enevoldsen <[email protected]> Co-authored-by: Roman Solomatin <[email protected]> * model: Add annamodels/LGAI-Embedding-Preview (#2810) Add LGAI-Embedding - Add mteb/models/lgai_embedding_models.py - defined model metadata * fix: Ensure bright uses the correct revision (#2812) fixes #2811 * 1.38.29 Automatically generated by python-semantic-release * add description to issue template (#2817) * add description to template * fix typo * model: Added 3 HIT-TMG's KaLM-embedding models (#2478) * Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper * Added KaLM_embedding_multilingual_mini_instruct_v1_5 * Added model to overview.py * Fix Task Count Per Language Table in tasks.md * resolve conflicts * remove tasks.md * Modified get_instruction funcion * Added support for prompt dict in get_instruction * fix lang code * Address comments * Delete mteb/models/check_models.py * added prompts_dict support in InstructSentenceTransformerWrapper * corrected instruction format * corrected prompts format * added correct instruction format * fix implementation * remove `if name main` * add comment --------- Co-authored-by: Roman Solomatin <[email protected]> * fix: Reuploaded previously unavailable SNL datasets (#2819) * fix: Reuploaded previously unavailable SNL datasets closes #2477 * removed exceptions from tests * temp fixes * added temporary fix * clean up commented out code * format * Update tasks & benchmarks tables * 1.38.30 Automatically generated by python-semantic-release * docs: Fix some typos in `docs/usage/usage.md` (#2835) * Update usage.md * Update usage.md * Update docs/usage/usage.md --------- Co-authored-by: Isaac Chung <[email protected]> * model: Add custom instructions for GigaEmbeddings (#2836) * add custom instructions * fixed * lint * fix last instruction --------- Co-authored-by: Kolodin Egor <[email protected]> Co-authored-by: Roman Solomatin <[email protected]> * model: add Seed-1.6-embedding model (#2841) * add Seed-1.6-embedding model * Update seed_1_6_embedding_models.py * update model meta info * 
support image encoder interface * error fix * fix: format seed_1_6_embedding_models.py with Ruff * fix: Update model selection for the leaderboard (#2855) * fix: Update model selection for the leaderboard fixes #2834 This removed the lower bound selection, but generally I don't think people should care about the models being too small. * fix 1M --> 1B * format * rename model_size -> max_model_size * 1.38.31 Automatically generated by python-semantic-release * fix: update training dataset info of Seed-1.6-embedding model (#2857) update seed1.6 model training data info * 1.38.32 Automatically generated by python-semantic-release * add jinav4 model meta (#2858) * add model meta * linting * fix: add check for code lora * fix: apply review comments * fix: prompt validation for tasks with `-` (#2846) * fix prompt validation * fix task name split correctly * add docstring for test * 1.38.33 Automatically generated by python-semantic-release * model: Adding Sailesh97/Hinvec (#2842) * Adding Hinvec Model's Meta data. * Adding hinvec_model.py * Update mteb/models/hinvec_models.py Co-authored-by: Kenneth Enevoldsen <[email protected]> * formated code with Black and lint with Ruff --------- Co-authored-by: Kenneth Enevoldsen <[email protected]> * Bump gradio to fix leaderboard sorting (#2866) Bump gradio * model: Adding nvidia/llama-nemoretriever-colembed models (#2861) * nvidia_llama_nemoretriever_colembed * correct 3b reference * lint fix * add training data and license for nvidia/llama_nemoretriever_colembed * lint --------- Co-authored-by: Isaac Chung <[email protected]> * rename seed-1.6-embedding to seed1.6-embedding (#2870) * fix tests to be compatible with `SentenceTransformers` `v5` (#2875) * fix sbert `v5` * add comment * model: add listconranker modelmeta (#2874) * add listconranker modelmeta * fix bugs * use linter * lint --------- Co-authored-by: Roman Solomatin <[email protected]> * model: add kalm_models ModelMeta (new PR) (#2853) * feat: add KaLM_Embedding_X_0605 in kalm_models * Update kalm_models.py for lint format --------- Co-authored-by: xinshuohu <[email protected]> * Comment kalm model (#2877) comment kalm model * Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872) * Add JaCWIR and JQaRA for reranking * Fix ANLP Journal datasets * Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval * tackle test cases * Remove _evaluate_subset usage * Separate v1 and v2 * Update info for NLP Journal datasets * Update tasks & benchmarks tables * model: add Hakim and TookaSBERTV2 models (#2826) * add tooka v2s * add mcinext models * update mcinext.py * Apply PR review suggestions * Update mteb/models/mcinext_models.py --------- Co-authored-by: mehran <[email protected]> Co-authored-by: Kenneth Enevoldsen <[email protected]> --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: namespace-Pt <[email protected]> Co-authored-by: zhangpeitian <[email protected]> Co-authored-by: Roman Solomatin <[email protected]> Co-authored-by: Kenneth Enevoldsen <[email protected]> Co-authored-by: github-actions <[email protected]> Co-authored-by: Alexey Vatolin <[email protected]> Co-authored-by: Imene Kerboua <[email protected]> Co-authored-by: Ömer Veysel Çağatan <[email protected]> Co-authored-by: Munot Ayush Sunil <[email protected]> Co-authored-by: 24September <[email protected]> Co-authored-by: wang.yuqi <[email protected]> Co-authored-by: Roman Solomatin <[email protected]> Co-authored-by: Feiyang <[email protected]> Co-authored-by: 
Thomas van Dongen <[email protected]> Co-authored-by: Paul Teiletche <[email protected]> Co-authored-by: Mehran Sarmadi <[email protected]> Co-authored-by: mehran <[email protected]> Co-authored-by: Dawid Koterwas <[email protected]> Co-authored-by: Wentao Wu <[email protected]> Co-authored-by: Manveer Tamber <[email protected]> Co-authored-by: malteos <[email protected]> Co-authored-by: Egor <[email protected]> Co-authored-by: Kolodin Egor <[email protected]> Co-authored-by: Manuel Faysse <[email protected]> Co-authored-by: Xin Zhang <[email protected]> Co-authored-by: Hypothesis-Z <[email protected]> Co-authored-by: zhangzeqing <[email protected]> Co-authored-by: fangxiaoquan <[email protected]> Co-authored-by: Li Lei <[email protected]> Co-authored-by: annamodels <[email protected]> Co-authored-by: Sadra Barikbin <[email protected]> Co-authored-by: Quan Yuhan <[email protected]> Co-authored-by: Quan Yuhan <[email protected]> Co-authored-by: Mohammad Kalim Akram <[email protected]> Co-authored-by: Sailesh Panda <[email protected]> Co-authored-by: bschifferer <[email protected]> Co-authored-by: tutuDoki <[email protected]> Co-authored-by: Xinshuo Hu <[email protected]> Co-authored-by: xinshuohu <[email protected]> Co-authored-by: lsz05 <[email protected]> Co-authored-by: Kenneth Enevoldsen <[email protected]>
1 parent: 1453ad6 · commit: 73c9d2c

24 files changed (+2739 lines, -58 lines)

docs/tasks.md

Lines changed: 22 additions & 8 deletions (large diff not rendered by default)

mteb/leaderboard/app.py

Lines changed: 30 additions & 22 deletions
@@ -13,7 +13,6 @@
 import cachetools
 import gradio as gr
 import pandas as pd
-from gradio_rangeslider import RangeSlider

 import mteb
 from mteb.abstasks.TaskMetadata import TASK_DOMAIN, TASK_TYPE
@@ -158,10 +157,10 @@ def filter_models(
     availability: bool | None,
     compatibility: list[str],
     instructions: bool | None,
-    model_size: tuple[int | None, int | None],
+    max_model_size: int,
     zero_shot_setting: Literal["only_zero_shot", "allow_all", "remove_unknown"],
 ):
-    lower, upper = model_size
+    lower, upper = 0, max_model_size
     # Setting to None, when the user doesn't specify anything
     if (lower == MIN_MODEL_SIZE) or (lower is None):
         lower = None
@@ -179,6 +178,7 @@ def filter_models(
         frameworks=compatibility,
         n_parameters_range=(lower, upper),
     )
+
     models_to_keep = set()
     for model_meta in model_metas:
         is_model_zero_shot = model_meta.is_zero_shot_on(task_select)
@@ -217,7 +217,7 @@ def get_leaderboard_app() -> gr.Blocks:
        availability=None,
        compatibility=[],
        instructions=None,
-       model_size=(MIN_MODEL_SIZE, MAX_MODEL_SIZE),
+       max_model_size=MAX_MODEL_SIZE,
        zero_shot_setting="allow_all",
    )

@@ -378,11 +378,19 @@
                label="Zero-shot",
                interactive=True,
            )
-           model_size = RangeSlider(
-               minimum=MIN_MODEL_SIZE,
-               maximum=MAX_MODEL_SIZE,
-               value=(MIN_MODEL_SIZE, MAX_MODEL_SIZE),
-               label="Model Size (#M Parameters)",
+
+           max_model_size = gr.Radio(
+               [
+                   ("<100M", 100),
+                   ("<500M", 500),
+                   ("<1B", 1000),
+                   ("<5B", 5000),
+                   ("<10B", 10000),
+                   (">10B", MAX_MODEL_SIZE),
+               ],
+               value=MAX_MODEL_SIZE,
+               label="Model Parameters",
+               interactive=True,
            )

    with gr.Tab("Summary"):
@@ -580,15 +588,15 @@ def update_task_list(
        availability,
        compatibility,
        instructions,
-       model_size,
+       max_model_size,
        zero_shot: hash(
            (
                id(scores),
                hash(tuple(tasks)),
                hash(availability),
                hash(tuple(compatibility)),
                hash(instructions),
-               hash(model_size),
+               hash(max_model_size),
                hash(zero_shot),
            )
        ),
@@ -599,7 +607,7 @@ def update_models(
        availability: bool | None,
        compatibility: list[str],
        instructions: bool | None,
-       model_size: tuple[int, int],
+       max_model_size: int,
        zero_shot: Literal["allow_all", "remove_unknown", "only_zero_shot"],
    ):
        start_time = time.time()
@@ -610,7 +618,7 @@ def update_models(
            availability,
            compatibility,
            instructions,
-           model_size,
+           max_model_size,
            zero_shot_setting=zero_shot,
        )
        elapsed = time.time() - start_time
@@ -628,7 +636,7 @@ def update_models(
            availability,
            compatibility,
            instructions,
-           model_size,
+           max_model_size,
            zero_shot,
        ],
        outputs=[models],
@@ -641,7 +649,7 @@ def update_models(
            availability,
            compatibility,
            instructions,
-           model_size,
+           max_model_size,
            zero_shot,
        ],
        outputs=[models],
@@ -654,7 +662,7 @@ def update_models(
            availability,
            compatibility,
            instructions,
-           model_size,
+           max_model_size,
            zero_shot,
        ],
        outputs=[models],
@@ -667,7 +675,7 @@ def update_models(
            availability,
            compatibility,
            instructions,
-           model_size,
+           max_model_size,
            zero_shot,
        ],
        outputs=[models],
@@ -680,20 +688,20 @@ def update_models(
            availability,
            compatibility,
            instructions,
-           model_size,
+           max_model_size,
            zero_shot,
        ],
        outputs=[models],
    )
-   model_size.change(
+   max_model_size.change(
        update_models,
        inputs=[
            scores,
            task_select,
            availability,
            compatibility,
            instructions,
-           model_size,
+           max_model_size,
            zero_shot,
        ],
        outputs=[models],
@@ -706,7 +714,7 @@ def update_models(
            availability,
            compatibility,
            instructions,
-           model_size,
+           max_model_size,
            zero_shot,
        ],
        outputs=[models],
@@ -784,7 +792,7 @@ def update_tables(
        availability=None,
        compatibility=[],
        instructions=None,
-       model_size=(MIN_MODEL_SIZE, MAX_MODEL_SIZE),
+       max_model_size=MAX_MODEL_SIZE,
        zero_shot="allow_all",
    )
    # We have to call this both on the filtered and unfiltered task because the callbacks
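The net effect of this diff is that the UI now passes a single upper bound (from the gr.Radio control) instead of a (lower, upper) tuple from the removed RangeSlider. Below is a minimal sketch, not part of the diff, of how that value is interpreted; the MIN_MODEL_SIZE/MAX_MODEL_SIZE values are illustrative stand-ins for the module's real sentinels.

# Minimal sketch; sentinel values are illustrative stand-ins, not the module's constants.
MIN_MODEL_SIZE = 0
MAX_MODEL_SIZE = 10_000  # interpreted as millions of parameters

def size_range(max_model_size: int) -> tuple[int | None, int | None]:
    # Mirrors the hunk above: a single upper bound replaces the (lower, upper) tuple.
    lower, upper = 0, max_model_size
    # None means "no bound", matching the sentinel handling in filter_models.
    if lower == MIN_MODEL_SIZE:
        lower = None
    if upper == MAX_MODEL_SIZE:
        upper = None
    return lower, upper

print(size_range(500))             # (None, 500): keep models with fewer than 500M parameters
print(size_range(MAX_MODEL_SIZE))  # (None, None): no size filter applied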

mteb/models/fa_models.py

Lines changed: 40 additions & 0 deletions
@@ -166,3 +166,43 @@
     # https://huggingface.co/datasets/sbunlp/hmblogs-v3
     },
 )
+
+tooka_sbert_v2_small = ModelMeta(
+    name="PartAI/Tooka-SBERT-V2-Small",
+    languages=["fas-Arab"],
+    open_weights=True,
+    revision="8bbed87e36669387f71437c061430ba56d1b496f",
+    release_date="2025-05-01",
+    n_parameters=122_905_344,
+    memory_usage_mb=496,
+    embed_dim=768,
+    license="not specified",
+    max_tokens=512,
+    reference="https://huggingface.co/PartAI/Tooka-SBERT-V2-Small",
+    similarity_fn_name="cosine",
+    framework=["Sentence Transformers", "PyTorch"],
+    use_instructions=False,
+    public_training_code=None,
+    public_training_data=None,
+    training_datasets=None,
+)
+
+tooka_sbert_v2_large = ModelMeta(
+    name="PartAI/Tooka-SBERT-V2-Large",
+    languages=["fas-Arab"],
+    open_weights=True,
+    revision="b59682efa961122cc0e4408296d5852870c82eae",
+    release_date="2025-05-01",
+    n_parameters=353_039_360,
+    memory_usage_mb=1347,
+    embed_dim=1024,
+    license="not specified",
+    max_tokens=512,
+    reference="https://huggingface.co/PartAI/Tooka-SBERT-V2-Large",
+    similarity_fn_name="cosine",
+    framework=["Sentence Transformers", "PyTorch"],
+    use_instructions=False,
+    public_training_code=None,
+    public_training_data=None,
+    training_datasets=None,
+)
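Once registered, a ModelMeta like the two added above can be resolved through mteb's model registry and evaluated like any other entry. A minimal usage sketch follows; it is not part of the diff, the task filter is purely illustrative, and running it assumes the sentence-transformers dependency plus network access to download the weights.

import mteb

# Resolve the ModelMeta added above and load the underlying model.
model = mteb.get_model("PartAI/Tooka-SBERT-V2-Small")

# Select a handful of Persian tasks; the filter here is only an example.
tasks = mteb.get_tasks(languages=["fas"], task_types=["Classification"])
results = mteb.MTEB(tasks=tasks).run(model, output_folder="results")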

mteb/models/hinvec_models.py

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+from __future__ import annotations
+
+import logging
+from functools import partial
+
+from mteb.encoder_interface import PromptType
+from mteb.model_meta import ModelMeta, sentence_transformers_loader
+
+logger = logging.getLogger(__name__)
+
+
+def instruction_template(
+    instruction: str, prompt_type: PromptType | None = None
+) -> str:
+    return f"Instruct: {instruction}\nQuery: " if instruction else ""
+
+
+hinvec_training_datasets = {
+    "MintakaRetrieval": ["train"],
+    "HindiDiscourseClassification": ["train"],
+    "SentimentAnalysisHindi": ["train"],
+    "MassiveScenarioClassification": ["train"],
+    "MTOPIntentClassification": ["train"],
+    "LinceMTBitextMining": ["train"],
+    "PhincBitextMining": ["train"],
+    "XNLI": ["train"],
+    "MLQARetrieval": ["validation"],
+    "FloresBitextMining": ["dev"],
+    "AmazonReviewsClassification": ["train"],
+}
+
+Hinvec_bidir = ModelMeta(
+    loader=partial(  # type: ignore
+        sentence_transformers_loader,
+        model_name="Sailesh97/Hinvec",
+        revision="d4fc678720cc1b8c5d18599ce2d9a4d6090c8b6b",
+        instruction_template=instruction_template,
+        trust_remote_code=True,
+        max_seq_length=2048,
+        padding_side="left",
+        add_eos_token=True,
+    ),
+    name="Sailesh97/Hinvec",
+    languages=["eng-Latn", "hin-Deva"],
+    open_weights=True,
+    revision="d4fc678720cc1b8c5d18599ce2d9a4d6090c8b6b",
+    release_date="2025-06-19",
+    n_parameters=939_591_680,
+    memory_usage_mb=3715,
+    embed_dim=2048,
+    license="cc-by-nc-4.0",
+    max_tokens=2048,
+    reference="https://huggingface.co/Sailesh97/Hinvec",
+    similarity_fn_name="cosine",
+    framework=["Sentence Transformers", "PyTorch"],
+    use_instructions=True,
+    training_datasets=hinvec_training_datasets,
+    public_training_code=None,
+    public_training_data=None,
+)
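Since Hinvec is instruction-tuned (use_instructions=True), the instruction_template defined above is what shapes each query prompt at encode time. A quick illustration of the resulting format; the example instruction is made up purely to show the output shape.

def instruction_template(instruction: str, prompt_type=None) -> str:
    # Same formatting as in the diff above.
    return f"Instruct: {instruction}\nQuery: " if instruction else ""

print(instruction_template("Given a question, retrieve passages that answer it"))
# Instruct: Given a question, retrieve passages that answer it
# Query:
print(repr(instruction_template("")))  # '' when no instruction is supplied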
