Pull requests: EleutherAI/lm-evaluation-harness
Pull requests list
Add long-context evaluation benchmarks (LongBench v2, Babilong, InfiniteBench, Phonebook)
#3256
opened Aug 21, 2025 by
Mariani-code
Trim thinking content from model output in IFEval
#3240
opened Aug 14, 2025 by
davideguidobene
Adding support for evaluating with Mistral and Pixtral models
#3235
opened Aug 13, 2025 by
LearnerSXH
Adding support for Structured Generation with XGrammar
#3232
opened Aug 12, 2025 by
ceferisbarov
5 tasks
Fix: respect target_delimiter when using a gen_prefix on multiple-choice tasks
#3220
opened Aug 7, 2025 by
karanikolopoulos
feat: COT trace response handling in evaluator and model classes
#3204
opened Aug 3, 2025 by
hhh2210
Fixed #2552: Improve answer extraction for hendrycks_math
#3192
opened Jul 30, 2025 by
JoonYong-Park
Leverage vllm's tokenizer_info endpoint to avoid manual duplication
#3185
opened Jul 25, 2025 by
m-misiura
Remove generate_until (multiple_target and doc_to_choice indexing) logic in ConfigurableTask.process_results
#3169
opened Jul 21, 2025 by
baberabb