Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
Updated Jul 22, 2025 - Python
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)