Releases: embeddings-benchmark/mteb
1.38.34 (2025-07-10)
Fix
- fix: pin datasets version (#2892)
  fix datasets version (00c95cf)
Unknown
- Update tasks & benchmarks tables (5303fec)
- dataset: Evalita dataset integration (#2859)
  - Added DadoEvalCoarseClassification
  - Removed unnecessary columns from DadoEvalCoarseClassification
  - Added EmitClassification task
  - added SardiStanceClassification task
  - Added GeoLingItClassification task
  - Added DisCoTexPairClassification tasks
  - Added EmitClassification, DadoEvalCoarseClassification, GeoLingItClassification, SardiStanceClassification inside the inits
  - changed import in DisCoTexPairClassification
  - removed GeoLingItClassification dataset
  - fixed citation formatting, missing metadata parameters and lint formatting
  - Added XGlueWRPReranking task
  - Added missing __init__.py files
  - fixed metadata in XGlueWRPReranking
  - Added MKQARetrieval task
  - fixed type in XGlueWRPReranking
  - changed MKQARetrieval from cross-lingual to monolingual
  - formatted MKQARetrieval file
  - removed unused const
  Co-authored-by: Mattia Sangermano <[email protected]> (ee17a6e)
- model: add Hakim and TookaSBERTV2 models (#2826)
  - add tooka v2s
  - add mcinext models
  - update mcinext.py
  - Apply PR review suggestions
  - Update mteb/models/mcinext_models.py
  Co-authored-by: mehran <[email protected]>
  Co-authored-by: Kenneth Enevoldsen <[email protected]> (04dc6d4)
- Update tasks & benchmarks tables (5be02c1)
- Add and fix some Japanese datasets: ANLP datasets, JaCWIR, JQaRA (#2872)
  - Add JaCWIR and JQaRA for reranking
  - Fix ANLP Journal datasets
  - Add NLPJournalAbsArticleRetrieval and JaCWIRRetrieval
  - tackle test cases
  - Remove _evaluate_subset usage
  - Separate v1 and v2
  - Update info for NLP Journal datasets (70768b5)
- Comment kalm model (#2877)
  comment kalm model (a3ca95c)
- model: add kalm_models ModelMeta (new PR) (#2853)
  - feat: add KaLM_Embedding_X_0605 in kalm_models
  - Update kalm_models.py for lint format
  Co-authored-by: xinshuohu <[email protected]> (b67bd04)
- model: add listconranker modelmeta (#2874)
  - add listconranker modelmeta
  - fix bugs
  - use linter
  - lint
  Co-authored-by: Roman Solomatin <[email protected]> (5846f56)
- fix tests to be compatible with SentenceTransformers v5 (#2875)
  - fix sbert v5
  - add comment (f346a37)
- rename seed-1.6-embedding to seed1.6-embedding (#2870) (f27648b)
- model: Adding nvidia/llama-nemoretriever-colembed models (#2861)
  - nvidia_llama_nemoretriever_colembed
  - correct 3b reference
  - lint fix
  - add training data and license for nvidia/llama_nemoretriever_colembed
  - lint
  Co-authored-by: Isaac Chung <[email protected]> (4ff1413)
- Bump gradio to fix leaderboard sorting (#2866)
  Bump gradio (a4388c2)
- model: Adding Sailesh97/Hinvec (#2842)
  - Adding Hinvec Model's Meta data.
  - Adding hinvec_model.py
  - Update mteb/models/hinvec_models.py
    Co-authored-by: Kenneth Enevoldsen <[email protected]>
  - formatted code with Black and lint with Ruff
  Co-authored-by: Kenneth Enevoldsen <[email protected]> (e3286d5)
1.38.33
1.38.32
1.38.31 (2025-06-25)
Documentation
- docs: Fix some typos in docs/usage/usage.md (#2835)
  - Update usage.md
  - Update usage.md
  - Update docs/usage/usage.md
  Co-authored-by: Isaac Chung <[email protected]> (774a942)
Fix
- fix: Update model selection for the leaderboard (#2855)
  - fix: Update model selection for the leaderboard
    fixes #2834
    This removed the lower bound selection, but generally I don't think people should care about the models being too small.
  - fix 1M --> 1B
  - format
  - rename model_size -> max_model_size (9a800d3)
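The leaderboard change above drops the lower size bound and keeps a single upper bound, max_model_size. A minimal sketch of that selection rule follows, assuming each model records a parameter count. ModelEntry, n_parameters and select_models are hypothetical names, not the leaderboard's actual code, and keeping models whose size is unknown is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelEntry:
    """Hypothetical leaderboard row: a model name and its parameter count."""

    name: str
    n_parameters: Optional[int]  # None when the size is not reported


def select_models(models: List[ModelEntry], max_model_size: int) -> List[ModelEntry]:
    """Keep models at or below max_model_size parameters.

    There is no lower bound, so arbitrarily small models are always kept.
    Assumption: models with an unknown size are kept rather than dropped.
    """
    return [
        m
        for m in models
        if m.n_parameters is None or m.n_parameters <= max_model_size
    ]
```

For example, select_models(entries, max_model_size=1_000_000_000) keeps every model with at most 1B parameters.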
Unknown
- model: add Seed-1.6-embedding model (#2841)
  - add Seed-1.6-embedding model
  - Update seed_1_6_embedding_models.py
  - update model meta info
  - support image encoder interface
  - error fix
  - fix: format seed_1_6_embedding_models.py with Ruff (8851bf0)
- model: Add custom instructions for GigaEmbeddings (#2836)
  - add custom instructions
  - fixed
  - lint
  - fix last instruction
  Co-authored-by: Kolodin Egor <[email protected]>
  Co-authored-by: Roman Solomatin <[email protected]> (d7ff1ab)
1.38.30 (2025-06-16)
Fix
- fix: Reuploaded previously unavailable SNL datasets (#2819)
  - fix: Reuploaded previously unavailable SNL datasets
    closes #2477
  - removed exceptions from tests
  - temp fixes
  - added temporary fix
  - clean up commented out code
  - format (c790269)
Unknown
- Update tasks & benchmarks tables (74d17b2)
- model: Added 3 HIT-TMG's KaLM-embedding models (#2478)
  - Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper
  - Added KaLM_embedding_multilingual_mini_instruct_v1_5
  - Added model to overview.py
  - Fix Task Count Per Language Table in tasks.md
  - resolve conflicts
  - remove tasks.md
  - Modified get_instruction function
  - Added support for prompt dict in get_instruction
  - fix lang code
  - Address comments
  - Delete mteb/models/check_models.py
  - added prompts_dict support in InstructSentenceTransformerWrapper
  - corrected instruction format
  - corrected prompts format
  - added correct instruction format
  - fix implementation
  - remove if __name__ == "__main__"
  - add comment
  Co-authored-by: Roman Solomatin <[email protected]> (03e084b)
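Several commits in this entry add prompts-dict support to get_instruction and InstructSentenceTransformerWrapper. The sketch below shows one plausible lookup order, assuming prompts are keyed either by task name or by task type, with the task name taking precedence. resolve_instruction and this key scheme are hypothetical and do not reproduce mteb's implementation.

```python
from typing import Mapping, Optional


def resolve_instruction(
    task_name: str,
    task_type: str,
    prompts_dict: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick an instruction for a task from a user-supplied prompts dict.

    Hypothetical lookup order: exact task name first, then task type,
    otherwise None (a caller would then fall back to a default instruction).
    """
    if not prompts_dict:
        return None
    if task_name in prompts_dict:
        return prompts_dict[task_name]
    return prompts_dict.get(task_type)
```

Checking the more specific key first lets a single task override an instruction shared by its whole task type.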
1.38.29 (2025-06-11)
Fix
- fix: Ensure bright uses the correct revision (#2812)
- fix: Adding client arg to init method of OpenAI models wrapper (#2803)
  - Adding OpenAI client arg to init method (e.g., for already initialized AzureOpenAI client)
    To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.
  - Update mteb/models/openai_models.py
    Co-authored-by: Roman Solomatin <[email protected]>
  - Update mteb/models/openai_models.py
  - remove comment and format
  Co-authored-by: Kenneth Enevoldsen <[email protected]>
  Co-authored-by: Roman Solomatin <[email protected]> (873ee76)
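The entry above lets callers pass an already-initialized client (such as AzureOpenAI) into the OpenAI wrapper instead of having the wrapper build its own. A minimal sketch of that client-injection pattern follows. EmbeddingClientWrapper is a hypothetical class, not mteb's wrapper. Only client.embeddings.create(model=..., input=...) is taken from the openai (>=1.0) Python package.

```python
from typing import Any, List, Optional


class EmbeddingClientWrapper:
    """Hypothetical wrapper showing client injection (not mteb's actual class).

    If a client is passed, it is used as-is (e.g. a pre-configured AzureOpenAI
    client); otherwise a default OpenAI client is created from the environment.
    """

    def __init__(self, model_name: str, client: Optional[Any] = None) -> None:
        self.model_name = model_name
        if client is None:
            # Imported lazily: the openai package is only needed when no client is injected.
            from openai import OpenAI

            client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self._client = client

    def encode(self, sentences: List[str]) -> List[List[float]]:
        # openai>=1.0 API: one embedding per input string, in input order.
        response = self._client.embeddings.create(model=self.model_name, input=sentences)
        return [item.embedding for item in response.data]
```

For Azure, one would construct AzureOpenAI(azure_endpoint=..., api_key=..., api_version=...) from the openai package and pass it as client. On Azure, the model argument is typically the deployment name.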
Unknown
- model: Add annamodels/LGAI-Embedding-Preview (#2810)
  - Add LGAI-Embedding
  - Add mteb/models/lgai_embedding_models.py
  - defined model metadata (3e291f3)
1.38.28 (2025-06-10)
Ci
- ci: fix config error for semantic release (#2800)
Fix
- fix: Add adapted_from to Cmedqaretrieval (#2806)
  - fix: Add adapted_from to Cmedqaretrieval
    Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.
  - format (fef1837)
Unknown
- Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)
  update training datasets
  Co-authored-by: zhangzeqing <[email protected]> (36a3c67)
- Update tasks & benchmarks tables (5e6aa9d)
- dataset: Add R2MED Benchmark (#2795)
  - Add files via upload
  - Add files via upload
  - Update benchmarks.py
  - Update __init__.py
  - Add files via upload
  - Update R2MEDRetrieval.py
  - Update run_mteb_r2med.py
  - Delete scripts/run_mteb_r2med.py
  - Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py
    Co-authored-by: Roman Solomatin <[email protected]>
  - Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py
    Co-authored-by: Roman Solomatin <[email protected]>
  - Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py
    Co-authored-by: Roman Solomatin <[email protected]>
  - Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py
    Co-authored-by: Roman Solomatin <[email protected]>
  - Add files via upload
  - Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json
  - Add files via upload
  - Add files via upload
  - Add files via upload
  - Update R2MEDRetrieval.py
  - Add files via upload
  - Add files via upload
  - Add files via upload
  - Add files via upload
  - format citations
  - Update R2MEDRetrieval.py
  - Add files via upload
  - Add files via upload
  Co-authored-by: Li Lei <[email protected]>
  Co-authored-by: Roman Solomatin <[email protected]> (b8e64e1)
- model: add fangxq/XYZ-embedding (#2741)
  - add xyz model
  - add xyz model
  - add xyz model
  - update
  - update
  - update
  - update
  - update
  - update
  - update
  - lint
  Co-authored-by: Roman Solomatin <[email protected]>
  Co-authored-by: Kenneth Enevoldsen <[email protected]> (1c08974)
- model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)
  - add model: geogpt_models
  - update geogpt_models
  - use InstructSentenceTransformerWrapper
  - resolve pylint warning
  - format geogpt_models.py
  - Update mteb/models/geogpt_models.py
    Co-authored-by: Roman Solomatin <[email protected]>
  - Update mteb/models/geogpt_models.py
  Co-authored-by: zhangzeqing <[email protected]>
  Co-authored-by: Roman Solomatin <[email protected]>
  Co-authored-by: Kenneth Enevoldsen <[email protected]> (8817670)
- Update issue and pr templates (#2782)
  - Update issue templates
  - Update bug_report.md
  - test yaml template
  - add templates
  - update templates
  - add emojis
  - fix typo
  - Apply suggestions from code review
    Co-authored-by: Kenneth Enevoldsen <[email protected]>
  - update issue titles
  - update PR template
  - remove PR templates
  Co-authored-by: Kenneth Enevoldsen <[email protected]> (af7adbf)
- model: Add Qwen3 Embedding model (#2769)
  - Init code
  - Remove extra config and lint code
  - use sentence transformer
  - add revisions
  - fix lint
  - Apply suggestions from code review
    Co-authored-by: Roman Solomatin <[email protected]>
  - fix lint
  - add framework
  Co-authored-by: Roman Solomatin <[email protected]> (fe137d0)
1.38.27
1.38.26 (2025-06-05)
Fix
- fix: Update Caltech101 datasets to latest revision [v1] (#2778)
  - fix: Update Caltech101 datasets to latest revision [v2]
    fixes: #2770
    Fixes the issue, but only in v1
    # tested using:
    import mteb
    task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
    task.load_data()
    task.get_candidate_labels()
  - fix rev (40f0841)