Releases: embeddings-benchmark/mteb

1.38.34

10 Jul 12:29

1.38.34 (2025-07-10)

Fix

  • fix: pin datasets version (#2892)

fix datasets version (00c95cf)

Unknown

  • Update tasks & benchmarks tables (5303fec)

  • dataset: Evalita dataset integration (#2859)

  • Added DadoEvalCoarseClassification

  • Removed unnecessary columns from DadoEvalCoarseClassification

  • Added EmitClassification task

  • added SardiStanceClassification task

  • Added GeoLingItClassification task

  • Added DisCoTexPairClassification tasks

  • Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification inside the inits

  • changed import in DisCoTexPairClassification

  • removed GeoLingItClassification dataset

  • fixed citation formatting, missing metadata parameters and lint formatting

  • Added XGlueWRPReranking task
  • Added missing __init__.py files
  • fixed metadata in XGlueWRPReranking

  • Added MKQARetrieval task

  • fixed type in XGlueWRPReranking

  • changed MKQARetrieval from cross-lingual to monolingual

  • formatted MKQARetrieval file

  • removed unused const


Co-authored-by: Mattia Sangermano (ee17a6e)

  • model: add Hakim and TookaSBERTV2 models (#2826)

  • add tooka v2s

  • add mcinext models

  • update mcinext.py

  • Apply PR review suggestions

  • Update mteb/models/mcinext_models.py


Co-authored-by: mehran
Co-authored-by: Kenneth Enevoldsen (04dc6d4)

  • Update tasks & benchmarks tables (5be02c1)

  • Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)

  • Add JaCWIR and JQaRA for reranking

  • Fix ANLP Journal datasets

  • Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval

  • tackle test cases

  • Remove _evaluate_subset usage

  • Separate v1 and v2

  • Update info for NLP Journal datasets (70768b5)

  • Comment kalm model (#2877)

comment kalm model (a3ca95c)

  • model: add kalm_models ModelMeta (new PR) (#2853)

  • feat: add KaLM_Embedding_X_0605 in kalm_models

  • Update kalm_models.py for lint format


Co-authored-by: xinshuohu (b67bd04)

  • model: add listconranker modelmeta (#2874)

  • add listconranker modelmeta

  • fix bugs

  • use linter

  • lint


Co-authored-by: Roman Solomatin (5846f56)

  • fix tests to be compatible with SentenceTransformers v5 (#2875)

  • fix sbert v5

  • add comment (f346a37)

  • rename seed-1.6-embedding to seed1.6-embedding (#2870) (f27648b)

  • model: Adding nvidia/llama-nemoretriever-colembed models (#2861)

  • nvidia_llama_nemoretriever_colembed

  • correct 3b reference

  • lint fix

  • add training data and license for nvidia/llama_nemoretriever_colembed

  • lint


Co-authored-by: Isaac Chung (4ff1413)

  • Bump gradio to fix leaderboard sorting (#2866)

Bump gradio (a4388c2)

  • model: Adding Sailesh97/Hinvec (#2842)

  • Adding Hinvec model's metadata.

  • Adding hinvec_model.py

  • Update mteb/models/hinvec_models.py

Co-authored-by: Kenneth Enevoldsen

  • formatted code with Black and lint with Ruff

Co-authored-by: Kenneth Enevoldsen (e3286d5)

1.38.33

27 Jun 21:09

1.38.33 (2025-06-27)

Fix

  • fix: prompt validation for tasks with - (#2846)

  • fix prompt validation

  • fix task name split correctly

  • add docstring for test (430357c)
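
The kind of problem this fix addresses can be illustrated with a small sketch. It assumes prompt keys follow a `<task name>-<prompt type>` pattern and that the prompt types are `query` and `passage`. The helper `split_prompt_key` is hypothetical and is not taken from the mteb codebase.

```python
from typing import Optional, Tuple

# Assumed set of prompt-type suffixes (an illustration, not mteb's definition).
PROMPT_TYPES = {"query", "passage"}


def split_prompt_key(key: str) -> Tuple[str, Optional[str]]:
    """Split a prompt key into (task_name, prompt_type).

    Only the last hyphen is considered, and the suffix counts as a prompt
    type only when it is a known one, so hyphens inside the task name are
    left untouched.
    """
    head, sep, tail = key.rpartition("-")
    if sep and tail in PROMPT_TYPES:
        return head, tail
    return key, None


print(split_prompt_key("NFCorpus-query"))
print(split_prompt_key("My-Task-passage"))
print(split_prompt_key("My-Task"))
```

A naive `key.split("-")` would break `My-Task-passage` into three parts. Splitting from the right and checking the suffix keeps the hyphenated task name whole.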

Unknown

  • add jinav4 model meta (#2858)

  • add model meta

  • linting

  • fix: add check for code lora

  • fix: apply review comments (f1d560a)

1.38.32

25 Jun 22:48

1.38.32 (2025-06-25)

Fix

  • fix: update training dataset info of Seed-1.6-embedding model (#2857)

update seed1.6 model training data info (a8214e2)

1.38.31

25 Jun 12:46

1.38.31 (2025-06-25)

Documentation

  • docs: Fix some typos in docs/usage/usage.md (#2835)

  • Update usage.md

  • Update usage.md

  • Update docs/usage/usage.md


Co-authored-by: Isaac Chung (774a942)

Fix

  • fix: Update model selection for the leaderboard (#2855)

  • fix: Update model selection for the leaderboard

fixes #2834

This removed the lower bound selection, but generally I don't think people should care about the models being too small.

  • fix 1M --> 1B

  • format

  • rename model_size -> max_model_size (9a800d3)
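
A filter with only an upper bound on model size might look like the sketch below. The function name `filter_by_max_size`, the `n_parameters` key, and the handling of models with unknown size are assumptions made for illustration. This is not the leaderboard's actual code.

```python
from typing import Dict, List, Optional


def filter_by_max_size(
    models: List[Dict], max_model_size: Optional[int] = None
) -> List[Dict]:
    """Keep models with at most `max_model_size` parameters.

    There is deliberately no lower bound. Passing None disables the filter.
    Models with an unknown size are dropped once a limit is set (assumption).
    """
    if max_model_size is None:
        return list(models)
    return [
        m
        for m in models
        if m.get("n_parameters") is not None
        and m["n_parameters"] <= max_model_size
    ]


models = [
    {"name": "tiny", "n_parameters": 22_000_000},
    {"name": "mid", "n_parameters": 560_000_000},
    {"name": "large", "n_parameters": 7_000_000_000},
]
# Limit expressed in parameters (1B), matching the "1M --> 1B" unit fix.
print([m["name"] for m in filter_by_max_size(models, 1_000_000_000)])
```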

Unknown

  • model: add Seed-1.6-embedding model (#2841)

  • add Seed-1.6-embedding model

  • Update seed_1_6_embedding_models.py

  • update model meta info

  • support image encoder interface

  • error fix

  • fix: format seed_1_6_embedding_models.py with Ruff (8851bf0)

  • model: Add custom instructions for GigaEmbeddings (#2836)

  • add custom instructions

  • fixed

  • lint

  • fix last instruction


Co-authored-by: Kolodin Egor
Co-authored-by: Roman Solomatin (d7ff1ab)

1.38.30

16 Jun 08:33

1.38.30 (2025-06-16)

Fix

  • fix: Reuploaded previously unavailable SNL datasets (#2819)

  • fix: Reuploaded previously unavailable SNL datasets

closes #2477

  • removed exceptions from tests

  • temp fixes

  • added temporary fix

  • clean up commented out code

  • format (c790269)

Unknown

  • Update tasks & benchmarks tables (74d17b2)

  • model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

  • Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

  • Added KaLM_embedding_multilingual_mini_instruct_v1_5

  • Added model to overview.py

  • Fix Task Count Per Language Table in tasks.md

  • resolve conflicts

  • remove tasks.md

  • Modified get_instruction function

  • Added support for prompt dict in get_instruction

  • fix lang code

  • Address comments

  • Delete mteb/models/check_models.py

  • added prompts_dict support in InstructSentenceTransformerWrapper

  • corrected instruction format

  • corrected prompts format

  • added correct instruction format

  • fix implementation

  • remove if name main

  • add comment


Co-authored-by: Roman Solomatin (03e084b)

  • add description to issue template (#2817)

  • add description to template

  • fix typo (04c9511)

1.38.29

11 Jun 19:20

1.38.29 (2025-06-11)

Fix

  • fix: Ensure bright uses the correct revision (#2812)

fixes #2811 (56dc620)

  • fix: Adding client arg to init method of OpenAI models wrapper (#2803)

  • Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.

  • Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin

  • Update mteb/models/openai_models.py

  • remove comment and format


Co-authored-by: Kenneth Enevoldsen
Co-authored-by: Roman Solomatin (873ee76)
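
The idea of passing an already initialized client into a model wrapper can be sketched as follows. The class `EmbeddingWrapper` and its parameters are hypothetical and do not reproduce the wrapper in `mteb/models/openai_models.py`. The only real API assumed is the `embeddings.create(input=..., model=...)` call of the `openai` package. A stub client stands in for a real one so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Any, List, Optional


class EmbeddingWrapper:
    """Hypothetical wrapper illustrating optional client injection."""

    def __init__(self, model_name: str, client: Optional[Any] = None) -> None:
        if client is None:
            # Default path: build a standard client only when none is supplied.
            from openai import OpenAI

            client = OpenAI()
        self._client = client
        self._model_name = model_name

    def encode(self, sentences: List[str]) -> List[List[float]]:
        response = self._client.embeddings.create(
            input=sentences, model=self._model_name
        )
        return [item.embedding for item in response.data]


def _fake_create(input: List[str], model: str) -> SimpleNamespace:
    # Stub that mimics the response shape: one "embedding" per input string.
    return SimpleNamespace(
        data=[SimpleNamespace(embedding=[float(len(s))]) for s in input]
    )


stub_client = SimpleNamespace(embeddings=SimpleNamespace(create=_fake_create))
wrapper = EmbeddingWrapper("text-embedding-3-small", client=stub_client)
print(wrapper.encode(["a", "abc"]))
```

In practice, a configured `AzureOpenAI` instance from the `openai` package could be passed in place of the stub, which is the Azure use case the commit message describes.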

Unknown

  • model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

  • Add mteb/models/lgai_embedding_models.py

  • defined model metadata (3e291f3)

1.38.28

10 Jun 21:44

1.38.28 (2025-06-10)

Ci

  • ci: fix config error for semantic release (#2800)

discussed in: #2796 (3d8dd9e)

Fix

  • fix: Add adapted_from to Cmedqaretrieval (#2806)

  • fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.

Unknown

  • Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing (36a3c67)

  • Update tasks & benchmarks tables (5e6aa9d)

  • dataset: Add R2MED Benchmark (#2795)

  • Add files via upload

  • Add files via upload

  • Update benchmarks.py

  • Update __init__.py

  • Add files via upload

  • Update R2MEDRetrieval.py

  • Update run_mteb_r2med.py

  • Delete scripts/run_mteb_r2med.py

  • Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin

  • Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin

  • Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin

  • Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin

  • Add files via upload

  • Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

  • Add files via upload

  • Add files via upload

  • Add files via upload

  • Update R2MEDRetrieval.py

  • Add files via upload

  • Add files via upload

  • Add files via upload

  • Add files via upload

  • format citations

  • Update R2MEDRetrieval.py

  • Add files via upload

  • Add files via upload


Co-authored-by: Li Lei
Co-authored-by: Roman Solomatin (b8e64e1)

  • model: add fangxq/XYZ-embedding (#2741)

  • add xyz model

  • add xyz model

  • add xyz model

  • update

  • update

  • update

  • update

  • update

  • update

  • update

  • lint


Co-authored-by: Roman Solomatin
Co-authored-by: Kenneth Enevoldsen (1c08974)

  • model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

  • add model: geogpt_models

  • update geogpt_models

  • use InstructSentenceTransformerWrapper

  • resolve pylint warning

  • format geogpt_models.py

  • Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin

  • Update mteb/models/geogpt_models.py

Co-authored-by: zhangzeqing
Co-authored-by: Roman Solomatin
Co-authored-by: Kenneth Enevoldsen (8817670)

  • Update issue and pr templates (#2782)

  • Update issue templates

  • Update bug_report.md

  • test yaml template

  • add templates

  • update templates

  • add emojis

  • fix typo

  • Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen

  • update issue titles

  • update PR template

  • remove PR templates


Co-authored-by: Kenneth Enevoldsen (af7adbf)

  • bump ruff (#2784) (9e2e972)

  • model: Add Qwen3 Embedding model (#2769)

  • Init code

  • Remove extra config and lint code

  • use sentence transformer

  • add revisions

  • fix lint

  • Apply suggestions from code review

Co-authored-by: Roman Solomatin

  • fix lint

  • add framework


Co-authored-by: Roman Solomatin (fe137d0)

  • Update tasks & benchmarks tables (360bf51)

  • dataset: Add miracl vision (#2736)

  • add miracl vision

  • add miracl vision

  • ruff

  • cast

  • image

  • image

  • add langs

  • add langs

  • add langs

  • add langs

  • descriptive stats

  • lint

  • lint

  • lint

  • remove com (61dc369)

1.38.27

05 Jun 16:38

1.38.27 (2025-06-05)

Fix

  • fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772 (f7656d5)

1.38.26

05 Jun 16:09

1.38.26 (2025-06-05)

Fix

  • fix: Update Caltech101 datasets to latest revision [v1] (#2778)

  • fix: Update Caltech101 datasets to latest revision [v2]

fixes: #2770
Fixes the issue, but only in v1

# tested using:

import mteb

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()

1.38.25

05 Jun 15:30

1.38.25 (2025-06-05)

Ci

  • ci: add new prefixes to releases (#2766)

add new prefixes (755a6eb)

Fix

  • fix: Update giga embeddings (#2774)

  • update giga embeddings

  • update giga embeddings


Co-authored-by: Kolodin Egor (5b71e34)