Open
Description
While reproducing the results, I evaluated both Qwen-VL and the trained checkpoints and found that the MM-Vet scores were all very low, below 5. How were the results in the paper measured? Was gpt_eval_score used? Also, on MathVista testmini, the Qwen-VL base model I tested did not reach the reported performance. Could that be because I ran it with vLLM?