Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Implementation of 🦩 Flamingo, the state-of-the-art few-shot visual question answering attention network from DeepMind, in PyTorch
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Bilinear attention networks for visual question answering
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Deep Modular Co-Attention Networks for Visual Question Answering
A lightweight, scalable, and general framework for visual question answering research
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Strong baseline for visual question answering
[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
PyTorch implementation of the winning entry from the VQA Challenge Workshop at CVPR 2017
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Document Visual Question Answering
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
Official repository for the A-OKVQA dataset
(NeurIPS 2024) Official PyTorch implementation of LOVA3
A PyTorch implementation of "A simple neural network module for relational reasoning", working on the CLEVR dataset
CNN+LSTM, attention-based, and MUTAN-based models for Visual Question Answering