Conversation

@Samoed Samoed commented Apr 19, 2025

Encodechka is an older Russian benchmark that served as the main evaluation suite before the introduction of MTEB(rus)

CC @avidale

MTEB results:

| Model | RUParaPhraserSTS | RuSTSBenchmarkSTS | XNLI | InappropriatenessClassificationv2 | RuNLUIntentClassification | RuToxicOKMLCUPClassification | SentiRuEval2016 |
|---|---|---|---|---|---|---|---|
| deepvk/USER-bge-m3 | 76.36 | 88.11 | 0.583967 | 0.66502 | 0.8271 | 0.700367 | — |
| sergeyzh/rubert-tiny-turbo | 72.15 | 78.47 | 0.5511 | 0.56762 | 0.7484 | 0.597567 | — |
| cointegrated/LaBSE-en-ru | 65.87 | 66.94 | 0.572433 | 0.6027 | 0.74645 | 0.5599 | — |

Encodechka results:

| Model | PI (RUParaPhraserSTS) | STS (RuSTSBenchmarkSTS) | NLI (XNLI) | IA (InappropriatenessClassificationv2) | IC (RuNLUIntentClassification, rus) | ICX (RuNLUIntentClassification, eng-rus) | TI (RuToxicOKMLCUPClassification) | SA (SentiRuEval2016) |
|---|---|---|---|---|---|---|---|---|
| deepvk/USER-bge-m3 | 0.76 | 0.87 | 0.58 | 0.79 | 0.81 | 0.78 | 0.97 | 0.82 |
| sergeyzh/rubert-tiny-turbo | 0.72 | 0.83 | 0.48 | 0.76 | 0.78 | 0.68 | 0.95 | 0.79 |
| cointegrated/LaBSE-en-ru | 0.66 | 0.79 | 0.43 | 0.77 | 0.79 | 0.77 | 0.95 | 0.76 |

Full results embeddings-benchmark/results#182

We see differences across most tasks due to differing classification settings. Encodechka uses cross-validation on the full training split with LogisticRegression(max_iter=10_000), whereas MTEB uses a fixed number of samples per label and LogisticRegression(max_iter=100). That said, I believe the overall ranking should remain similar.

As for the InappropriatenessClassification dataset, it appears to differ from the one Encodechka used; I wasn't able to match it to any version in the original repository.

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

  • I have run the following models on the task (adding the results to the PR). These can be run using the `mteb -m {model_name} -t {task_name}` command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using `self.stratified_subsampling()` under `dataset_transform()`
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

@Samoed Samoed added the new benchmark Issues related to adding a new benchmark label Apr 19, 2025

@KennethEnevoldsen KennethEnevoldsen left a comment


Looks good, not much to add here

@KennethEnevoldsen KennethEnevoldsen changed the title add Encodechka benchmark fix: Add Encodechka benchmark Apr 26, 2025
# Conflicts:
#	mteb/benchmarks/benchmarks.py
#	mteb/leaderboard/benchmark_selector.py
@Samoed Samoed enabled auto-merge (squash) April 27, 2025 17:28
@Samoed Samoed disabled auto-merge April 27, 2025 17:44
@Samoed Samoed enabled auto-merge (squash) April 27, 2025 17:47
@Samoed Samoed merged commit 0737e78 into main Apr 27, 2025
8 checks passed
@Samoed Samoed deleted the encodechka branch April 27, 2025 17:54
@@ -1599,14 +1602,14 @@
document understanding, visual STS, and CV-centric tasks.""",
reference="",
contacts=["gowitheflow-1998", "isaac-chung"],
citation="""@article{xiao2025mieb,
Collaborator

@Samoed @KennethEnevoldsen this seems unrelated to this PR. I had previously updated this to match the MTEB paper's bibtex style. Would appreciate it if you could revert this.

Member Author

I will fix it in a separate PR then. I don't know why it was changed.

Samoed added a commit that referenced this pull request May 3, 2025
* SpeedTask add deprecated warning (#2493)

* Docs: Update README.md (#2494)

Update README.md

* fix transformers version for now (#2504)

* Fix typos (#2509)

* ci: refactor TaskMetadata eval langs test (#2501)

* refactor eval langs test

* function returns None

* add hard negatives tasks in _HISTORIC_DATASETS

* rename to ImageClustering folder (#2516)

rename folder

* Clean up trailing spaces citation (#2518)

* rename folder

* trailing spaces

* missed one

* [mieb] Memotion preprocessing code made more robust and readable (#2519)

* fix: validate lang code in ModelMeta (#2499)

* Update pyproject.toml (#2522)

* 1.36.38

Automatically generated by python-semantic-release

* Fix leaderboard version (#2524)

* fix gradio leaderboard run

* update docs

* Fix gte-multilingual-base embed_dim (#2526)

* [MIEB] Specify only the multilingual AggTask for MIEB-lite (#2539)

specify only the multilingual AggTask

* [mieb] fix hatefulmemes (#2531)

* fix hatefulmeme

* add to description and use polars instead

---------

Co-authored-by: Isaac Chung <[email protected]>

* Model conan (#2534)

* conan_models

* conan_models

* refactor code

* refactor code

---------

Co-authored-by: shyuli <[email protected]>

* fix: Update mteb.get_tasks with an exclude_aggregate parameter to exclude aggregate tasks (#2536)

* Implement task.is_aggregate check

* Add `mteb.get_tasks` parameter `include_aggregate` to exclude aggregate tasks if needed

* Update mteb.run with the new `task.is_aggregate` parameter

* Add tests

* Ran linter

* Changed logic to `exclude_aggregate`

* Updated from review comments

* Exclude aggregate by default false in get_tasks

* 1.36.39

Automatically generated by python-semantic-release

* docs: Add MIEB citation in benchmarks (#2544)

Add MIEB citation in benchmarks

* Add 2 new Vietnamese Retrieval Datasets (#2393)

* [ADD] 2 new Datasets

* [UPDATE] Change bibtext_citation for GreenNodeTableMarkdownRetrieval as TODO

* [UPDATE] Change bibtext_citation for ZacLegalTextRetrieval as TODO

* Update tasks table

* fix: CacheWrapper per task (#2467)

* feat: CacheWrapper per task

* refactor logic

* update documentation

---------

Co-authored-by: Florian Rottach <[email protected]>

* 1.36.40

Automatically generated by python-semantic-release

* misc: move MMTEB scripts and notebooks to separate repo (#2546)

move mmteb scripts and notebooks to separate repo

* fix: Update requirements in JinaWrapper (#2548)

fix: Update package requirements in JinaWrapper for einops and flash_attn

* 1.36.41

Automatically generated by python-semantic-release

* Docs: Add MIEB to README (#2550)

Add MIEB to README

* Add xlm_roberta_ua_distilled (#2547)

* defined model metadata for xlm_roberta_ua_distilled

* Update mteb/models/ua_sentence_models.py

Co-authored-by: Roman Solomatin <[email protected]>

* included ua_sentence_models.py in overview.py

* applied linting, added missing fields in ModelMeta

* applied linting

---------

Co-authored-by: Roman Solomatin <[email protected]>

* fix me5 training data config to include xquad dataset (#2552)

* fix: me5 training data config to include xquad dataset

* Update mteb/models/e5_models.py

update: xquad key name

Co-authored-by: Roman Solomatin <[email protected]>

* fix: ME5_TRAINING_DATA format

---------

Co-authored-by: Roman Solomatin <[email protected]>

* feat: Added dataframe utilities to BenchmarkResults (#2542)

* fix: Added dataframe utilities to BenchmarkResults

- Added `get_results_table`. I was considering renaming it to `to_dataframe` to align with `tasks.to_dataframe`. WDYT?
- Added tests for ModelResults and BenchmarksResults
- Added a few utility functions where needed
- Added docstring throughout ModelResults and BenchmarksResults
- Added todo comments for missing aspects - mostly v2 - but join_revisions seems like it could use an update before then.

Prerequisite for #2454:

@ayush1298 can I ask you to review this PR as well? I hope this give an idea of what I was hinting at. Sorry that it took a while. I wanted to make sure to get it right.

* refactor to to_dataframe and combine common dependencies

* ibid

* fix revision joining after discussion with @x-tabdeveloping

* remove strict=True for zip() as it is a >3.9 feature

* updated mock cache

* 1.37.0

Automatically generated by python-semantic-release

* fix e5_R_mistral_7b (#2490)

* fix e5_R_mistral_7b

* change wrapper

* address comments

* Added kwargs for pad_token

* correct lang format

* address comments

* add revision

---------

Co-authored-by: Roman Solomatin <[email protected]>

* fix unintentional working of filters on leaderboard (#2535)

* fix unintentional working of filters on leaderboard

* address comments

* make lint

* address comments

* rollback unnecessary changes

* feat: UI Overhaul (#2549)

* Bumped gradio version to latest

* Added new Gradio table functionality to leaderboard

* Removed search bar

* Changed color scheme in plot to match the table

* Added new benchmark selector in sidebar

* Changed not activated button type to secondary

* Short-circuited callbacks that are based on language selection

* Re-added column width calculation since it got messed up

* Commented out gradient for per-task table as it slowed things down substantially

* Styling and layout updates

* Adjusted comments according to reviews

* Converted all print statements to logger.debug

* Removed pydantic version fix

* Ran linting

* Remove commented out code

Co-authored-by: Kenneth Enevoldsen <[email protected]>

* Moved English,v1 to Legacy section

* Closed the benchmark sharing accordion by default

* Adjusted markdown blocks according to suggestions

* Ran linter

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>

* 1.38.0

Automatically generated by python-semantic-release

* add USER2 (#2560)

* add user2

* add training code

* update prompts

* Fix leaderboard entry for BuiltBench (#2563)

Fix leaderboard entry for BuiltBench (#2562)

Co-authored-by: Mehrzad Shahin-Moghadam <[email protected]>

* fix: jasper models embeddings having nan values (#2481)

* 1.38.1

Automatically generated by python-semantic-release

* fix frida datasets (#2565)

* Add relle (#2564)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <[email protected]>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

---------

Co-authored-by: Roman Solomatin <[email protected]>

* Backfill task metadata for metadata for GermanDPR and GermanQuAD (#2566)

* Add metadata for GermanDPR and GermanQuAD

* PR improvements

* Update tasks table

* Add  ModelMeta for CodeSearch-ModernBERT-Crow-Plus (#2570)

* Add files via upload

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update overview.py

* Update shuu_model.py

* Update shuu_model.py

* Update shuu_model.py

* Update mteb/models/shuu_model.py

Co-authored-by: Roman Solomatin <[email protected]>

---------

Co-authored-by: Roman Solomatin <[email protected]>

* Docs: Improve MIEB docs (#2569)

* Add missing annotations (#2498)

* Update tasks table

* move icon & name to benchmark dataclass (#2573)

* Remove the comments from ImageEncoder (#2579)

* fix: Add Encodechka benchmark (#2561)

* add tasks

* add benchmark

* fix imports

* update stsb split

* Update tasks table

* 1.38.2

Automatically generated by python-semantic-release

* fix FlagEmbedding package name (#2588)

* fix codecarbon version (#2587)

* Add MIEB image only benchmark (#2590)

* add vision only bench

* add description

* correct zs task modalities

* specify tasks param

* Add image only MIEB benchmark to LB left panel (#2596)

* Update benchmarks.py

* make lint

* add to left side bar

* update Doubao-1.5-Embedding (#2575)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <[email protected]>

* update logging

* update lint

---------

Co-authored-by: zhangpeitian <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* fix: Add WebSSL models (#2604)

* add 2 web SSL dino models

* add models from collection and revisions

* update memory_usage_mb and embed dim

* use automodel instead

* fix mieb citation (#2606)

* 1.38.3

Automatically generated by python-semantic-release

* Update Doubao-1.5-Embedding (#2611)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <[email protected]>

* update logging

* update lint

* update link

---------

Co-authored-by: zhangpeitian <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* CI: update benchmark table (#2609)

* update benchmark table

* fix table

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <[email protected]>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* Update tasks & benchmarks tables

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* fixes

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* fix retrieval loader

* add descriptive stats

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations

* update imports

* fix citations

* fix citations

* format citation

---------

Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Niklas Muennighoff <[email protected]>
Co-authored-by: chenghao xiao <[email protected]>
Co-authored-by: Munot Ayush Sunil <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: E. Tolga Ayan <[email protected]>
Co-authored-by: lllsy12138 <[email protected]>
Co-authored-by: shyuli <[email protected]>
Co-authored-by: Siddharth M. Bhatia <[email protected]>
Co-authored-by: Bao Loc Pham <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Flo <[email protected]>
Co-authored-by: Florian Rottach <[email protected]>
Co-authored-by: Alexey Vatolin <[email protected]>
Co-authored-by: Olesksii Horchynskyi <[email protected]>
Co-authored-by: Pandaswag <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
Co-authored-by: Mehrzad Shahin-Moghadam <[email protected]>
Co-authored-by: Mehrzad Shahin-Moghadam <[email protected]>
Co-authored-by: Youngjoon Jang <[email protected]>
Co-authored-by: 24September <[email protected]>
Co-authored-by: Jan Karaś <[email protected]>
Co-authored-by: Shuu <[email protected]>
Co-authored-by: namespace-Pt <[email protected]>
Co-authored-by: zhangpeitian <[email protected]>
isaac-chung added a commit that referenced this pull request May 3, 2025
* Update tasks table

* 1.36.26

Automatically generated by python-semantic-release

* Pass task name to all evaluators (#2389)

* pass task name to all tasks

* add test

* fix loader

* fix: renaming Zeroshot -> ZeroShot (#2395)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* rename 1

* rename 2

* format

* fixed error

* 1.36.27

Automatically generated by python-semantic-release

* fix: Update AmazonPolarityClassification license (#2402)

Update AmazonPolarityClassification.py

* fix b1ade name (#2403)

* 1.36.28

Automatically generated by python-semantic-release

* Minor style changes (#2396)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* fix: minor style changes

Addresses #2078

* rename 1

* rename 2

* format

* fixed error

---------

Co-authored-by: Isaac Chung <[email protected]>

* Added new dataset and tasks - ClusTREC-covid, clustering of thematic covid-related scientific papers (#2302)

* Clustrec covid new dataset and task

* fix

* fix

* fix

* fix

* fix

* descriptive stats

* change all mentions of clustrec-covidp2p to clustrec-covid

* change ' to "

* Update tasks table

* fix: Major updates to docs + make mieb dep optional (#2397)

* fix: renaming Zeroshot -> ZeroShot

Addresses #2078

* fix: minor style changes

Addresses #2078

* fix: Major updates to documentation

This PR does the following:
- This introduced other modalities more clearly in the documentation as well as make it easier to transition to a full on documentation site later.
- added minor code updates due to discovered inconsistencies in docs and code.
- Added the MMTEB citation where applicable
- makes the docs ready to move torchvision to an optional dependency

* Moved VISTA example

* rename 1

* rename 2

* format

* fixed error

* fix: make torchvision optional (#2399)

* fix: make torchvision optional

* format

* add docs

* minor fix

* remove transform from Any2TextMultipleChoiceEvaluator

---------

Co-authored-by: Isaac Chung <[email protected]>

* move Running SentenceTransformer model with prompts to usage

---------

Co-authored-by: Isaac Chung <[email protected]>

* 1.36.29

Automatically generated by python-semantic-release

* remove Arabic_Triplet_Matryoshka_V2.py (#2405)

* Min torchvision>0.2.1 (#2410)

matching torch>1.0.0

* fix: Add validation to model_name in `ModelMeta` (#2404)

* add test for name validation

* upd docs

* upd cohere name

* fix tests

* fix name for average_word_embeddings_komninos

* fix name for average_word_embeddings_komninos

* fix reranker test

* fix reranker test

* 1.36.30

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [1/3]: reimplement CV-Bench (#2414)

* refactor CV-Bench

* reimplement CV Bench

* remove abstask/evaluator/tests for Any2TextMultipleChoice

* rerun descriptive stats

* Update tasks table

* fix: Add option to remove benchmark from leaderboard (#2417)

fix: Add option to remove benchmark from leaderboard

fixes #2413

This only removes the benchmark from the leaderboard but keeps it in MTEB.

* 1.36.31

Automatically generated by python-semantic-release

* fix: Add VDR Multilingual Dataset (#2408)

* Added VDR Multilingual Dataset

* address comments

* make lint

* Formated Dataset for retrieval

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <[email protected]>

* Update mteb/tasks/Retrieval/multilingual/VdrMultilingualRetrieval.py

Co-authored-by: Roman Solomatin <[email protected]>

* make lint

* corrected date

* fix dataset building

* move to image folder

---------

Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>

* Update tasks table

* 1.36.32

Automatically generated by python-semantic-release

* HOTFIX: pin setuptools (#2423)

* pin setuptools

* pin setuptools

* pin setuptools in makefile

* try ci

* fix ci

* remove speed from installs

* add __init__.py Clustering > kor folder,  And   edit __init__.py in Clustering folder (#2422)

* add PatentFnBClustering.py

* do make lint and revise

* rollback Makefile

* Update mteb/tasks/Clustering/kor/PatentFnBClustering.py

Co-authored-by: Roman Solomatin <[email protected]>

* klue_mrc_domain

* make lint

* klue_modified_clustering_dataset

* clustering & kor folder add __init.py

* clustering & kor folder add __init__.py

* task.py roll-back

* correct text_creation to sample_creation & delete form in MetaData

* correct task_subtype in TaskMetaData

* delete space

* edit metadata

* edit task_subtypes

---------

Co-authored-by: Roman Solomatin <[email protected]>

* Update tasks table

* Update speed dependencies with new setuptools release (#2429)

* add richinfoai models (#2427)

* add richinfoai models

add richinfoai models

* format codes by linter

format codes by linter

* Added Memory Usage column on leaderboard (#2428)

* docs: typos; Standardize spacing; Chronological order (#2436)

* Fix typos; add chrono order

* Fix spacing

* fix: Add model specific dependencies in pyproject.toml (#2424)

* Add model specific dependencies in pyproject.toml

* Update documentation

* 1.36.33

Automatically generated by python-semantic-release

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [2/3]: reimplement r-Oxford and r-Paris (#2442)

* MutipleChoiceEvaluationMixin; reimplement r-Oxford and r-Paris; rerun stats

* modify benchmark list

* fix citation

* Update tasks table

* Error while evaluating MIRACLRetrievalHardNegatives: 'trust_remote_code' (#2445)

Fixes #2444

* Feat/searchmap preview (#2420)

* Added meta information about SearchMap_Preview model to the model_dir

* Added meta information about SearchMap_Preview model to the model_dir

* updated revision name

* Device loading and cuda cache cleaning step left out

* removed task instructions since it's not necessary

* changed sentence transformer loader to mteb default loader and passed instructions s model prompts

* Included searchmap to the models overview page

* Included searchmap to the models overview page

* added meta data information about where model was adpated from

* Update mteb/models/searchmap_models.py

* fix lint

* lint

---------

Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* Add Background Gradients in Summary and Task Table (#2392)

* Add Background Gradients in Summary and Task Table

* Remove warnings and add light green cmap

* Address comments

* Separate styling function

* address comments

* added comments

* add ops_moa_models (#2439)

* add ops_moa_models

* add custom implementations

* Simplify custom implementation and format the code

* support SentenceTransformers

* add training datasets

* Update mteb/models/ops_moa_models.py

Co-authored-by: Roman Solomatin <[email protected]>

* update training_datasets

---------

Co-authored-by: kunka.xgw <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* leaderboard fix (#2456)

* ci: cache `~/.cache/huggingface` (#2464)

ci: cache ~/.cache/huggingface

Co-authored-by: sam021313 <[email protected]>

* [MIEB] "capability measured"-Abstask 1-1 matching refactor [3/3]: reimplement ImageCoDe (#2468)

* reimplement ImageCoDe with ImageTextPairClassification

* add missing stats file

* Update tasks table

* fix: Adds family of NeuML/pubmedbert-base-embedding models (#2443)

* feat: added pubmedbert model2vec models

* fix: attribute model_name

* fix: fixed commit hash for pubmed_bert model2vec models

* fix: changes requested in PR 2443

* fix: add nb_sbert model (#2339)

* add_nb_sbert_model

* Update nb_sbert.py

added n_parameters and release_date

* Update mteb/models/nb_sbert.py

Co-authored-by: Roman Solomatin <[email protected]>

* Update nb_sbert.py

fix: make lint

* added nb_sbert to overview.py + ran make lint

* Update nb_sbert.py

Fix error: Input should be a valid date or datetime, month value is outside expected range of 1-12

---------

Co-authored-by: Roman Solomatin <[email protected]>

* 1.36.34

Automatically generated by python-semantic-release

* suppress logging warnings on leaderboard (#2406)

* suppress logging warnings

* remove loggers

* return blocks

* rename function

* fix gme models

* add server name

* update after merge

* fix ruff

* fix: E5 instruct now listed as sbert compatible (#2475)

Fixes #1442

* 1.36.35

Automatically generated by python-semantic-release

* [MIEB] rename VisionCentric to VisionCentricQA (#2479)

rename VisionCentric to VisionCentricQA

* ci: Run dataset loading only when pushing to main (#2480)

Update dataset_loading.yml

* fix table in tasks.md (#2483)

* Update tasks table

* fix: add prompt to NanoDBPedia (#2486)

* 1.36.36

Automatically generated by python-semantic-release

* Fix Task Lang Table (#2487)

* Fix Task Lang Table

* added tasks.md

* fix

* fix: Ignore datasets not available in tests (#2484)

* 1.36.37

Automatically generated by python-semantic-release

* [MIEB] align main metrics with leaderboard (#2489)

align main metrics with leaderboard

* typo in model name (#2491)

* SpeedTask add deprecated warning (#2493)

* Docs: Update README.md (#2494)

Update README.md

* fix transformers version for now (#2504)

* Fix typos (#2509)

* ci: refactor TaskMetadata eval langs test (#2501)

* refactor eval langs test

* function returns None

* add hard negatives tasks in _HISTORIC_DATASETS

* rename to ImageClustering folder (#2516)

rename folder

* Clean up trailing spaces citation (#2518)

* rename folder

* trailing spaces

* missed one

* [mieb] Memotion preprocessing code made more robust and readable (#2519)

* fix: validate lang code in ModelMeta (#2499)

* Update pyproject.toml (#2522)

* 1.36.38

Automatically generated by python-semantic-release

* Fix leaderboard version (#2524)

* fix gradio leaderboard run

* update docs

* Fix gte-multilingual-base embed_dim (#2526)

* [MIEB] Specify only the multilingual AggTask for MIEB-lite (#2539)

specify only the multilingual AggTask

* [mieb] fix hatefulmemes (#2531)

* fix hatefulmeme

* add to description and use polars instead

---------

Co-authored-by: Isaac Chung <[email protected]>

* Model conan (#2534)

* conan_models

* conan_models

* refactor code

* refactor code

---------

Co-authored-by: shyuli <[email protected]>

* fix: Update mteb.get_tasks with an exclude_aggregate parameter to exclude aggregate tasks (#2536)

* Implement task.is_aggregate check

* Add `mteb.get_tasks` parameter `include_aggregate` to exclude aggregate tasks if needed

* Update mteb.run with the new `task.is_aggregate` parameter

* Add tests

* Ran linter

* Changed logic to `exclude_aggregate`

* Updated from review comments

* Exclude aggregate by default false in get_tasks

* 1.36.39

Automatically generated by python-semantic-release

* docs: Add MIEB citation in benchmarks (#2544)

Add MIEB citation in benchmarks

* Add 2 new Vietnamese Retrieval Datasets (#2393)

* [ADD] 2 new Datasets

* [UPDATE] Change bibtex_citation for GreenNodeTableMarkdownRetrieval as TODO

* [UPDATE] Change bibtex_citation for ZacLegalTextRetrieval as TODO

* Update tasks table

* fix: CacheWrapper per task (#2467)

* feat: CacheWrapper per task

* refactor logic

* update documentation

---------

Co-authored-by: Florian Rottach <[email protected]>

* 1.36.40

Automatically generated by python-semantic-release

* misc: move MMTEB scripts and notebooks to separate repo (#2546)

move mmteb scripts and notebooks to separate repo

* fix: Update requirements in JinaWrapper (#2548)

fix: Update package requirements in JinaWrapper for einops and flash_attn

* 1.36.41

Automatically generated by python-semantic-release

* Docs: Add MIEB to README (#2550)

Add MIEB to README

* Add xlm_roberta_ua_distilled (#2547)

* defined model metadata for xlm_roberta_ua_distilled

* Update mteb/models/ua_sentence_models.py

Co-authored-by: Roman Solomatin <[email protected]>

* included ua_sentence_models.py in overview.py

* applied linting, added missing fields in ModelMeta

* applied linting

---------

Co-authored-by: Roman Solomatin <[email protected]>

* fix me5 training data config to include xquad dataset (#2552)

* fix: me5 training data config to include xquad dataset

* Update mteb/models/e5_models.py

update: xquad key name

Co-authored-by: Roman Solomatin <[email protected]>

* fix: ME5_TRAINING_DATA format

---------

Co-authored-by: Roman Solomatin <[email protected]>

* feat: Added dataframe utilities to BenchmarkResults (#2542)

* fix: Added dataframe utilities to BenchmarkResults

- Added `get_results_table`. I was considering renaming it to `to_dataframe` to align with `tasks.to_dataframe`. WDYT?
- Added tests for ModelResults and BenchmarkResults
- Added a few utility functions where needed
- Added docstrings throughout ModelResults and BenchmarkResults
- Added todo comments for missing aspects - mostly v2 - but `join_revisions` seems like it could use an update before then.

Prerequisite for #2454:

@ayush1298 can I ask you to review this PR as well? I hope this gives an idea of what I was hinting at. Sorry that it took a while. I wanted to make sure to get it right.

* refactor to to_dataframe and combine common dependencies

* ibid

* fix revision joining after discussion with @x-tabdeveloping

* remove strict=True for zip() as it requires Python 3.10+

* updated mock cache
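
The `to_dataframe` idea discussed above is essentially a long-to-wide pivot over per-task scores; a hedged sketch using plain dicts rather than mteb's actual `BenchmarkResults` objects (the row layout here is an assumption):

```python
# Stand-in long-format rows: one entry per (model, task) score.
results = [
    {"model": "deepvk/USER-bge-m3", "task": "RuSTSBenchmarkSTS", "score": 88.11},
    {"model": "sergeyzh/rubert-tiny-turbo", "task": "RuSTSBenchmarkSTS", "score": 78.47},
]


def to_table(rows):
    """Pivot long-format rows into a nested {model: {task: score}} mapping."""
    table = {}
    for r in rows:
        table.setdefault(r["model"], {})[r["task"]] = r["score"]
    return table


table = to_table(results)
print(table["deepvk/USER-bge-m3"]["RuSTSBenchmarkSTS"])  # 88.11
```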

* 1.37.0

Automatically generated by python-semantic-release

* fix e5_R_mistral_7b (#2490)

* fix e5_R_mistral_7b

* change wrapper

* address comments

* Added kwargs for pad_token

* correct lang format

* address comments

* add revision

---------

Co-authored-by: Roman Solomatin <[email protected]>

* fix unintentional working of filters on leaderboard (#2535)

* fix unintentional working of filters on leaderboard

* address comments

* make lint

* address comments

* rollback unnecessary changes

* feat: UI Overhaul (#2549)

* Bumped gradio version to latest

* Added new Gradio table functionality to leaderboard

* Removed search bar

* Changed color scheme in plot to match the table

* Added new benchmark selector in sidebar

* Changed not activated button type to secondary

* Short-circuited callbacks that are based on language selection

* Re-added column width calculation since it got messed up

* Commented out gradient for per-task table as it slowed things down substantially

* Styling and layout updates

* Adjusted comments according to reviews

* Converted all print statements to logger.debug

* Removed pydantic version fix

* Ran linting

* Remove commented out code

Co-authored-by: Kenneth Enevoldsen <[email protected]>

* Moved English,v1 to Legacy section

* Closed the benchmark sharing accordion by default

* Adjusted markdown blocks according to suggestions

* Ran linter

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>

* 1.38.0

Automatically generated by python-semantic-release

* add USER2 (#2560)

* add user2

* add training code

* update prompts

* Fix leaderboard entry for BuiltBench (#2563)

Fix leaderboard entry for BuiltBench (#2562)

Co-authored-by: Mehrzad Shahin-Moghadam <[email protected]>

* fix: jasper models embeddings having nan values (#2481)

* 1.38.1

Automatically generated by python-semantic-release

* fix frida datasets (#2565)

* Add relle (#2564)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <[email protected]>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

---------

Co-authored-by: Roman Solomatin <[email protected]>

* Backfill task metadata for GermanDPR and GermanQuAD (#2566)

* Add metadata for GermanDPR and GermanQuAD

* PR improvements

* Update tasks table

* Add  ModelMeta for CodeSearch-ModernBERT-Crow-Plus (#2570)

* Add files via upload

* Update shuu_model.py

* Update overview.py

* Update mteb/models/shuu_model.py

Co-authored-by: Roman Solomatin <[email protected]>

---------

Co-authored-by: Roman Solomatin <[email protected]>

* Docs: Improve MIEB docs (#2569)

* Add missing annotations (#2498)

* Update tasks table

* move icon & name to benchmark dataclass (#2573)

* Remove the comments from ImageEncoder (#2579)

* fix: Add Encodechka benchmark (#2561)

* add tasks

* add benchmark

* fix imports

* update stsb split

* Update tasks table

* 1.38.2

Automatically generated by python-semantic-release

* fix FlagEmbedding package name (#2588)

* fix codecarbon version (#2587)

* Add MIEB image only benchmark (#2590)

* add vision only bench

* add description

* correct zs task modalities

* specify tasks param

* Add image only MIEB benchmark to LB left panel (#2596)

* Update benchmarks.py

* make lint

* add to left side bar

* update Doubao-1.5-Embedding (#2575)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <[email protected]>

* update logging

* update lint

---------

Co-authored-by: zhangpeitian <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* fix: Add WebSSL models (#2604)

* add 2 web SSL dino models

* add models from collection and revisions

* update memory_usage_mb and embed dim

* use automodel instead

* fix mieb citation (#2606)

* 1.38.3

Automatically generated by python-semantic-release

* Update Doubao-1.5-Embedding (#2611)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <[email protected]>

* update logging

* update lint

* update link

---------

Co-authored-by: zhangpeitian <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* CI: update benchmark table (#2609)

* update benchmark table

* fix table

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* Update Doubao-1.5-Embedding revision (#2613)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <[email protected]>

* update logging

* update lint

* update link

* update revision

---------

Co-authored-by: zhangpeitian <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* Update tasks & benchmarks tables

* CI: fix table  (#2615)

* Update tasks & benchmarks tables

* Update gradio version (#2558)

* Update gradio version

Closes #2557

* bump gradio

* fix: Removed missing dataset for MTEB(Multilingual) and bumped version

We should probably just have done this earlier to ensure that the multilingual benchmark is runnable.

* CI: fix infinitely committing issue (#2616)

* fix token

* try to trigger

* add token

* test ci

* Update tasks & benchmarks tables

* Update tasks & benchmarks tables

* remove test lines

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ScandiSent dataset (#2620)

* add scandisent dataset

* add to init

* typo

* lint

* 1.38.4

Automatically generated by python-semantic-release

* Format all citations (#2614)

* Fix errors in bibtex_citation

* Format all bibtex_citation fields

* format benchmarks

* fix format

* Fix tests

* add formatting script

* fix citations (#2628)

* Add Talemaader pair classification task (#2621)

Add talemaader pair classification task

* fix citations

* fix citations

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Uri K <[email protected]>
Co-authored-by: chenghao xiao <[email protected]>
Co-authored-by: Munot Ayush Sunil <[email protected]>
Co-authored-by: OnandOn <[email protected]>
Co-authored-by: richinfo-ai <[email protected]>
Co-authored-by: Niklas Muennighoff <[email protected]>
Co-authored-by: Adewole Babatunde <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: ahxgw <[email protected]>
Co-authored-by: kunka.xgw <[email protected]>
Co-authored-by: Sam Heymann <[email protected]>
Co-authored-by: sam021313 <[email protected]>
Co-authored-by: Nadia Sheikh <[email protected]>
Co-authored-by: theatollersrud <[email protected]>
Co-authored-by: hongst <[email protected]>
Co-authored-by: E. Tolga Ayan <[email protected]>
Co-authored-by: lllsy12138 <[email protected]>
Co-authored-by: shyuli <[email protected]>
Co-authored-by: Siddharth M. Bhatia <[email protected]>
Co-authored-by: Bao Loc Pham <[email protected]>
Co-authored-by: Flo <[email protected]>
Co-authored-by: Florian Rottach <[email protected]>
Co-authored-by: Alexey Vatolin <[email protected]>
Co-authored-by: Olesksii Horchynskyi <[email protected]>
Co-authored-by: Pandaswag <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
Co-authored-by: Mehrzad Shahin-Moghadam <[email protected]>
Co-authored-by: Mehrzad Shahin-Moghadam <[email protected]>
Co-authored-by: Youngjoon Jang <[email protected]>
Co-authored-by: 24September <[email protected]>
Co-authored-by: Jan Karaś <[email protected]>
Co-authored-by: Shuu <[email protected]>
Co-authored-by: namespace-Pt <[email protected]>
Co-authored-by: zhangpeitian <[email protected]>
Co-authored-by: Imene Kerboua <[email protected]>
isaac-chung added a commit that referenced this pull request Jun 22, 2025

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency (#2633)

* add Bilingual English-Danish parallel corpus from The Danish Medicines Agency

* bump dataset revision

* format bibtex

* format bibtex

* Remove irrelevant test (#2630)

remove irrelevant test

* Revert "CI: fix infinitely committing issue (#2616)" (#2636)

This reverts commit 82dcb3d.

* Update tasks & benchmarks tables

* Remove `typer` dependency from citation script (#2629)

remove typer dependency from citation script

* CI format citations (#2649)

* ci format citations

* add files

* remove from lint CI

* test lint

* test lint

* fix names

* fix: Update VisualSTS Aggregate task modalities (#2597)

* Update STS17MultilingualVisualSTS.py

* fix STSBenchmarkMultilingualVisualSTS

---------

Co-authored-by: Isaac Chung <[email protected]>

* 1.38.5

Automatically generated by python-semantic-release

* Add tests for leaderboard build (#2631)

* Add tests for leaderboard build

* add new action

* remove build tests from other actions

* fix tests

* correct exclusion of test

* added timeout constant

* fix: SIB200 machine translated > human translated (#2665)

As correctly pointed out in:

https://huggingface.co/datasets/mteb/sib200/discussions/1

* 1.38.6

Automatically generated by python-semantic-release

* fix: Update datasets which can't be loaded with `datasets>=3.0` (#2661)

fix: Update datasets which can't be loaded with `datasets>=3.0` (#1619)

* reupload datasets

* fix loader

* remove commented code

* lint

* update pyproject dependencies

* rename model RELLE to CHAIN19 (#2671)

* Add relle
* defined model metadata for relle

* Add mteb/models/relle_models.py

* Update mteb/models/relle_models.py

Co-authored-by: Roman Solomatin <[email protected]>

* lint after commit

run after "make lint"

* Add into model_modules

Add model into model_modules and lint check

* rename model
change model name

* rename model
change model name

---------

Co-authored-by: Roman Solomatin <[email protected]>

* 1.38.7

Automatically generated by python-semantic-release

* Update final version of Doubao-1.5-Embedding (Rename to Seed1.5-Embedding) (#2674)

* update seed-embedding

* update seed models

* fix linting and tiktoken problem

* fix tiktoken bug

* fix lint

* update name

* Update mteb/models/seed_models.py

adopt suggestion

Co-authored-by: Roman Solomatin <[email protected]>

* update logging

* update lint

* update link

* update revision

* update Doubao-1.5-Embedding revision 3

* rename Doubao-1.5-Embedding to Seed1.5-Embedding

---------

Co-authored-by: zhangpeitian <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* fix: Allow empty string for openai models (#2676)

* fix for empty string input to openai/text-embedding-3-large

* fix: Allow empty string in openai models

closes: #1650

* fix based on review

* Updated docstring

---------

Co-authored-by: ayush1298 <[email protected]>
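
A fix like this plausibly sanitizes inputs before the API call; a hedged sketch where the replace-with-a-space strategy and the `sanitize_inputs` helper are assumptions, not mteb's actual code:

```python
def sanitize_inputs(texts):
    # OpenAI embedding endpoints reject empty strings, so substitute a
    # single space for any empty or whitespace-only input (assumed strategy).
    return [t if t.strip() else " " for t in texts]


print(sanitize_inputs(["hello", "", "  "]))  # ['hello', ' ', ' ']
```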

* 1.38.8

Automatically generated by python-semantic-release

* Leaderboard: UI simplifications for menus (#2672)

* Leaderboard: UI simplifications for menus

Did a few things to simplify the leaderboard UI.

Changes:
- Combined FAQ entries
- Created dropdowns in the select benchmark menu sidebar
- Removed reference to arena
- Removed reference to old leaderboard
- reduced size of select menu
- reduced the size of acknowledgements
- removed farsi from the selection (as it is a beta)

refactors:
- refactored to use a class for menu items
- refactored text segments out of app.py

* fixed comment

* fixes for sizes

* fix modality for `OVENIT2TRetrieval` (#2678)

fix modality

* fix: `MTEB(Code, v1)`  languages (#2679)

fix code languages

* 1.38.9

Automatically generated by python-semantic-release

* Correction in docs (#2688)

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* Fix for Openai_Text-Embedding3-Small (#2702)

* Fix for Openai_Text-Embedding3-Small

* better syntax for readability

* fix: Ensure that optional dependencies are compatible and if not state it (#2706)

Fixes mistakes introduced in #2424

It seems like many of these requirements don't exist (voyageai>=1.0.0). @ayush1298 I am hoping you could clear up how this happened?

* fix: Only install mteb into site packages (#2618)

* Restrict installation directory

* fix

* namespace false

* add star

* add pont

* fix import

* fix import

* add init files

* fix setuptools find

* fix image init

* add missing templates

---------

Co-authored-by: Roman Solomatin <[email protected]>

* 1.38.10

Automatically generated by python-semantic-release

* docs: Updated the PR template and improved submission docs (#2704)

* docs: Updated the PR template and improved submission docs

1) Updated PR template to only include checklist for datasets and models. The other checklists were essentially just tests.
2) I have updated the documentation for adding models. Notably I have split out the implementation segment, which I think makes it more readable.
3) Required that you argue for a dataset before addition

fixes #2568

* Apply suggestions from code review

Co-authored-by: Isaac Chung <[email protected]>

---------

Co-authored-by: Isaac Chung <[email protected]>

* fix: Remove models from the leaderboard (#2705)

* fix: Remove models from the leaderboard

I removed both models from the leaderboard by unlinking them from the import tree. I think this is the easiest way to handle a model that is not currently public.

* format

* 1.38.11

Automatically generated by python-semantic-release

* fix: Rename gemini-embedding-exp-03-07 to gemini-embedding-001 (#2711)

* Rename gemini-embedding-exp-03-07 to gemini-embedding-001

* update reference link to the Vertex AI API doc

* 1.38.12

Automatically generated by python-semantic-release

* fix: Integrate `lightonai/GTE-ModernColBERT-v1` (#2708)

* fix: Integrate `lightonai/GTE-ModernColBERT-v1`

Fixes #2673

* fixes based on corrections

* 1.38.13

Automatically generated by python-semantic-release

* docs: fix number of tasks for eng, v2 in docs (#2720)

* fix: Added potion-multilingual-128M (#2717)

* Added ModelMeta for potion-multilingual-128M

* Fixed linting

* Fixed linting

* Updated date

* 1.38.14

Automatically generated by python-semantic-release

* Update the max tokens for gemini-embedding-001 (#2725)

* fix: Ara and ben classification dataset cleaning (#2632)

* Improve classification datasets quality for ara and ben langs

* add missing AJGT

* fix format

* change ajgt description

* Fix numbers in description, add link to pull request

* Add too short filter

* Link in markdown format

* Update tasks & benchmarks tables

* fix: Update Seed1.5-Embedding API (#2724)

* update seed1.5-embedding api

* update seed1.5-embedding api

* update Seed1.5-Embedding API

* update Seed1.5-Embedding resolve comments

* update Seed1.5-Embedding lint

* Update mteb/models/seed_models.py

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>

* 1.38.15

Automatically generated by python-semantic-release

* fix: Add vidore v2 benchmarks (#2713)

* adding vidore benchmarks

* fix typo

* clean vidore names + per lang eval

* lint

* vidore names

* bibtex fix

* fix revision

* vidore v2 citation

* update citation format and fix per-language mappings

* lint: citations

* typo citations

* Update tasks & benchmarks tables

* 1.38.16

Automatically generated by python-semantic-release

* fix: `IndicQARetrieval` loader (#2729)

* fix indic qa

* add kwargs

* 1.38.17

Automatically generated by python-semantic-release

* fix: Promote Persian benchmark to v1 (#2707)

* Switch versioning from beta to v1 and add v1 to benchmark selector

* Update Farsi benchmark display name, task IDs, and metadata

* Add Hakim Model

* fix hakim version

* update

* make lint

* fix: Promote Persian benchmark to v1

---------

Co-authored-by: mehran <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>

* Update tasks & benchmarks tables

* 1.38.18

Automatically generated by python-semantic-release

* Add ViDoRe combined benchmark and add to leaderboard side panel (#2732)

* add ViDoRe combined benchmark and add to leaderboard side panel

* Update benchmark_selector.py

* Update tasks & benchmarks tables

* fix: Rename display name of VDR (#2734)

* Update tasks & benchmarks tables

* 1.38.19

Automatically generated by python-semantic-release

* fix: Add colpali models family (#2721)

* add colpali models

* add colpali as framework

* add colpali as framework

* update metadata and add colsmol

* fix typos

* account for revision

* add training data info and lint

* modify meta

* correct colmodels meta and add colnomic 7b

* fix typo in toml (colpali subdeps)

* refine colmodel loading and metadata

* 1.38.20

Automatically generated by python-semantic-release

* fix: Correct embedding dimension for bge-m3 (#2738)

Fixes #2735

* 1.38.21

Automatically generated by python-semantic-release

* docs: Updated description of FEVER (#2745)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER's, since we have had [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* minor

* Backfill task metadata for BigPatentClustering and AllegroReviews (#2755)

* big-patent

* allegro-reviews

* Update tasks & benchmarks tables

* Update Seed1.5 training data (#2749)

* update seed1.5 training data

* update seed1.5 training data

* fix: Update caltech101 (#2759)

* docs: Updated description of FEVER

Update the description to state that the corpus is the same as FEVER, as we have [multiple questions on it](https://huggingface.co/datasets/mteb/climate-fever/discussions/2)

* fix: Update Caltech101 to different source

Ran both versions of one of the tasks using `nomic-ai/nomic-embed-text-v1.5`; both scores match:

### Old

```
{
  "dataset_revision": "851374102055782c84f89b1b4e9d128a6568847b",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897863,
```

### New
```
{
  "dataset_revision": "52439cf6d4f6ebf563d8cdc7f2c5371d9efd2686",
  "task_name": "Caltech101",
  "mteb_version": "1.38.4",
  "scores": {
    "test": [
      {
        "accuracy": 0.897929,
```

* 1.38.22

Automatically generated by python-semantic-release

* Add missing PatchCamelyon_labels.txt (#2756)

* ci: Delete cache in Model loading test only when model is loaded (#2761)

* only delete cache when model loaded

* testing it out

* fix: Add `cadet-embed-base-v1` (#2727)

* update

* update overview.py for models

* update

* update

* 1.38.23

Automatically generated by python-semantic-release

* Fixing Google embedding task type for STS (#2767)

The type `SIMILARITY` is invalid; the correct one is `SEMANTIC_SIMILARITY`. See https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#supported_task_types
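A small sketch of the fix's intent: validating the task type up front instead of letting the request fail server-side. The set of supported values below is copied from the Google Cloud docs linked above at the time of writing; it is an assumption that the list is complete and current, and `resolve_task_type` is an illustrative helper, not part of mteb.

```python
# Task types documented for Vertex AI text embeddings (assumed current as of
# the linked docs page; Google may add or remove values over time).
SUPPORTED_TASK_TYPES = {
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
    "QUESTION_ANSWERING",
    "FACT_VERIFICATION",
    "CODE_RETRIEVAL_QUERY",
}


def resolve_task_type(task_type: str) -> str:
    """Return the task type unchanged if valid, else raise before any API call."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(
            f"Invalid Vertex AI task type {task_type!r}; "
            f"expected one of {sorted(SUPPORTED_TASK_TYPES)}"
        )
    return task_type
```

With this check, the old value `"SIMILARITY"` raises a `ValueError` locally, while `"SEMANTIC_SIMILARITY"` passes through.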

* docs: Leaderboard simplifications (#2764)

* docs: Leaderboard simplifications

Simplified the sidebar, notably:

1) Combined Language and Regional (since these are all languages)
2) Folded all sections (with visual document retrieval, images start to take up a lot of space)
3) Removed Legacy and instead added "Other" under Language, where I moved "English Legacy"

I also restructured the code so that nesting is easier.

Is it also possible to create a separate section? (see dummy screenshot)

* refactor to reduce nesting

* format

* fix: add xet support (#2603)

* add xet version

* add doc comment

* change xet requirements

* Update docs/usage/usage.md

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>

* 1.38.24

Automatically generated by python-semantic-release

* fix: Update giga embeddings (#2774)

* update giga embeddings

* update giga embeddings

---------

Co-authored-by: Kolodin Egor <[email protected]>

* ci: add new prefixes to releases (#2766)

add new prefixes

* 1.38.25

Automatically generated by python-semantic-release

* fix: Update Caltech101 datasets to latest revision [v1] (#2778)

* fix: Update Caltech101 datasets to latest revision [v2]

Fixes #2770, but only in v1.

```
# tested using:

task: mteb.AbsTask = mteb.get_task("Caltech101ZeroShot")
task.load_data()
task.get_candidate_labels()
```

* fix rev

* 1.38.26

Automatically generated by python-semantic-release

* fix: CachedEmbeddingWrapper issues in both documentation and code (#2779)

Fixes #2772

* 1.38.27

Automatically generated by python-semantic-release

* dataset: Add miracl vision (#2736)

* add miracl vision

* add miracl vision

* ruff

* cast

* image

* image

* add langs

* add langs

* add langs

* add langs

* descriptive stats

* lint

* lint

* lint

* remove com

* Update tasks & benchmarks tables

* model: Add Qwen3 Embedding model (#2769)

* Init code

* Remove extra config and lint code

* use sentence transformer

* add revisions

* fix lint

* Apply suggestions from code review

Co-authored-by: Roman Solomatin <[email protected]>

* fix lint

* add framework

---------

Co-authored-by: Roman Solomatin <[email protected]>

* bump ruff (#2784)

* Update issue and pr templates (#2782)

* Update issue templates

* Update bug_report.md

* test yaml template

* add templates

* update templates

* add emojis

* fix typo

* Apply suggestions from code review

Co-authored-by: Kenneth Enevoldsen <[email protected]>

* update issue titles

* update PR template

* remove PR templates

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>

* model: Add GeoGPT-Research-Project/GeoEmbedding (#2773)

* add model: geogpt_models

* update geogpt_models

* use InstructSentenceTransformerWrapper

* resolve pylint warning

* format geogpt_models.py

* Update mteb/models/geogpt_models.py

Co-authored-by: Roman Solomatin <[email protected]>

* Update mteb/models/geogpt_models.py

---------

Co-authored-by: zhangzeqing <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>

* model: add fangxq/XYZ-embedding (#2741)

* add xyz model

* add xyz model

* add xyz model

* update

* update

* update

* update

* update

* update

* update

* lint

---------

Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>

* ci: fix config error for semantic release (#2800)

discussed in: #2796

* dataset: Add R2MED Benchmark (#2795)

* Add files via upload

* Add files via upload

* Update benchmarks.py

* Update __init__.py

* Add files via upload

* Update R2MEDRetrieval.py

* Update run_mteb_r2med.py

* Delete scripts/run_mteb_r2med.py

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <[email protected]>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <[email protected]>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <[email protected]>

* Update mteb/tasks/Retrieval/eng/R2MEDRetrieval.py

Co-authored-by: Roman Solomatin <[email protected]>

* Add files via upload

* Delete mteb/descriptive_stats/Retrieval/R2MEDRetrieval.json

* Add files via upload

* Add files via upload

* Add files via upload

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* format citations

* Update R2MEDRetrieval.py

* Add files via upload

* Add files via upload

---------

Co-authored-by: Li Lei <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* Update tasks & benchmarks tables

* Update training datasets of GeoGPT-Research-Project/GeoEmbedding (#2802)

update training datasets

Co-authored-by: zhangzeqing <[email protected]>

* fix: Add adapted_from to Cmedqaretrieval (#2806)

* fix: Add adapted_from to Cmedqaretrieval

Also snuck in a fix with form=None, which is no longer valid, but was still used in a few places.

* format

* 1.38.28

Automatically generated by python-semantic-release

* fix: Adding client arg to init method of OpenAI models wrapper (#2803)

* Adding an OpenAI client arg to the init method (e.g., for an already-initialized AzureOpenAI client)

To use OpenAI embedding models via Azure, the model wrapper needs to be initialized with a different client.
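The pattern the PR describes can be sketched as plain dependency injection: accept an optional pre-built client so a caller can supply an `AzureOpenAI` instance in place of the default `OpenAI` one. The class and method names below are illustrative, not the actual mteb wrapper, and the default-client factory is a stand-in for `openai.OpenAI()` kept import-free so the sketch runs anywhere.

```python
class EmbeddingWrapper:
    """Illustrative wrapper showing optional client injection (not mteb's real class)."""

    def __init__(self, model_name: str, client=None):
        self.model_name = model_name
        # Use the injected client when given; otherwise build a default one.
        self._client = client if client is not None else self._default_client()

    @staticmethod
    def _default_client():
        # Placeholder for `openai.OpenAI()`, avoided here to keep the
        # sketch self-contained and runnable without credentials.
        return object()


class FakeAzureClient:
    """Stand-in for an already-configured openai.AzureOpenAI client."""
```

Usage: `EmbeddingWrapper("text-embedding-3-small", client=FakeAzureClient())` stores the injected client untouched, while omitting `client` falls back to the default factory.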

* Update mteb/models/openai_models.py

Co-authored-by: Roman Solomatin <[email protected]>

* Update mteb/models/openai_models.py

* remove comment and format

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* model: Add annamodels/LGAI-Embedding-Preview (#2810)

Add LGAI-Embedding

- Add mteb/models/lgai_embedding_models.py

- Defined model metadata

* fix: Ensure bright uses the correct revision (#2812)

fixes #2811

* 1.38.29

Automatically generated by python-semantic-release

* add description to issue template (#2817)

* add description to template

* fix typo

* model: Added 3 HIT-TMG's KaLM-embedding models (#2478)

* Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper

* Added KaLM_embedding_multilingual_mini_instruct_v1_5

* Added model to overview.py

* Fix Task Count Per Language Table in tasks.md

* resolve conflicts

* remove tasks.md

* Modified get_instruction function

* Added support for prompt dict in get_instruction

* fix lang code

* Address comments

* Delete mteb/models/check_models.py

* added prompts_dict support in InstructSentenceTransformerWrapper

* corrected instruction format

* corrected prompts format

* added correct instruction format

* fix implementation

* remove `if name main`

* add comment

---------

Co-authored-by: Roman Solomatin <[email protected]>

* fix: Reuploaded previously unavailable SNL datasets (#2819)

* fix: Reuploaded previously unavailable SNL datasets

closes #2477

* removed exceptions from tests

* temp fixes

* added temporary fix

* clean up commented out code

* format

* Update tasks & benchmarks tables

* 1.38.30

Automatically generated by python-semantic-release

* docs: Fix some typos in `docs/usage/usage.md` (#2835)

* Update usage.md

* Update usage.md

* Update docs/usage/usage.md

---------

Co-authored-by: Isaac Chung <[email protected]>

* model: Add custom instructions for GigaEmbeddings (#2836)

* add custom instructions

* fixed

* lint

* fix last instruction

---------

Co-authored-by: Kolodin Egor <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>

* try adding init

* add init in audio pc task eng

* all audio tasks init

* remove script test

---------

Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: namespace-Pt <[email protected]>
Co-authored-by: zhangpeitian <[email protected]>
Co-authored-by: Alexey Vatolin <[email protected]>
Co-authored-by: Imene Kerboua <[email protected]>
Co-authored-by: Ömer Veysel Çağatan <[email protected]>
Co-authored-by: Munot Ayush Sunil <[email protected]>
Co-authored-by: 24September <[email protected]>
Co-authored-by: wang.yuqi <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Feiyang <[email protected]>
Co-authored-by: Thomas van Dongen <[email protected]>
Co-authored-by: Paul Teiletche <[email protected]>
Co-authored-by: Mehran Sarmadi <[email protected]>
Co-authored-by: mehran <[email protected]>
Co-authored-by: Dawid Koterwas <[email protected]>
Co-authored-by: Wentao Wu <[email protected]>
Co-authored-by: Manveer Tamber <[email protected]>
Co-authored-by: malteos <[email protected]>
Co-authored-by: Egor <[email protected]>
Co-authored-by: Kolodin Egor <[email protected]>
Co-authored-by: Manuel Faysse <[email protected]>
Co-authored-by: Xin Zhang <[email protected]>
Co-authored-by: Hypothesis-Z <[email protected]>
Co-authored-by: zhangzeqing <[email protected]>
Co-authored-by: fangxiaoquan <[email protected]>
Co-authored-by: Li Lei <[email protected]>
Co-authored-by: annamodels <[email protected]>
Co-authored-by: Sadra Barikbin <[email protected]>
Labels: new benchmark (issues related to adding a new benchmark)