PlotQA

Reasoning over Scientific Plots

What is PlotQA?

PlotQA is a VQA dataset with 28.9 million question-answer pairs grounded over 224,377 plots, with data from real-world sources and questions based on crowd-sourced question templates.

Why PlotQA?

Existing synthetic datasets (FigureQA, DVQA) for reasoning over plots do not contain variability in data labels, real-valued data, or complex reasoning questions. Consequently, models proposed for these datasets do not fully address the challenge of reasoning over plots. In particular, they assume that the answer comes either from a small fixed-size vocabulary or from a bounding box within the image. In practice this is an unrealistic assumption: many questions require reasoning and thus have real-valued answers that appear neither in a small fixed-size vocabulary nor in the image. In this work, we aim to bridge this gap between existing datasets and real-world plots by introducing PlotQA. Indeed, 80.76% of the questions in PlotQA have out-of-vocabulary (OOV) answers, i.e., answers that do not come from a fixed vocabulary. For more details, see the PlotQA paper (WACV 2020).


Getting Started

Download the dataset of 28.9 million question-answer pairs grounded over 224,377 plots.
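
To get a feel for the data, the snippet below loads a QA-pairs file and prints a few examples. This is a minimal sketch: the file name qa_pairs_V1.json and the field names (qa_pairs, image_index, question_string, answer) are assumptions about the released JSON layout, so adjust them to match the files you actually download.

```python
import json

# Assumed file name for the downloaded QA pairs; adjust to the actual release.
QA_FILE = "qa_pairs_V1.json"

with open(QA_FILE) as f:
    data = json.load(f)

# Assumed layout: a top-level "qa_pairs" list in which each entry links a
# question and its answer to the index of the plot image it is grounded on.
for qa in data["qa_pairs"][:5]:
    print("Plot image:", qa["image_index"])
    print("Question  :", qa["question_string"])
    print("Answer    :", qa["answer"])
    print("-" * 40)
```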


PlotQA Pipeline

Our proposed pipeline (VOES) consists of four subtasks: (i) detect all the elements in the plot (bars, legend names, tick labels, etc.), (ii) read the values of these elements, (iii) establish relationships between the plot elements, and (iv) reason over this structured data.
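
The sketch below shows the overall shape of such a four-stage pipeline. It is illustrative only, not the actual VOES implementation: every function name and the intermediate element/table representation are placeholders.

```python
from dataclasses import dataclass

@dataclass
class PlotElement:
    """One detected plot element (bar, legend name, tick label, ...)."""
    kind: str           # e.g. "bar", "legend_name", "tick_label"
    box: tuple          # bounding box (x0, y0, x1, y1) in image coordinates
    text: str = ""      # OCR'd text, if the element carries any
    value: float = 0.0  # numeric value read off the plot, if applicable

def detect_elements(image) -> list[PlotElement]:
    """(i) Detect all plot elements, e.g. with an object detector."""
    raise NotImplementedError

def read_values(elements: list[PlotElement]) -> list[PlotElement]:
    """(ii) Read element values via OCR and interpolation against the axes."""
    raise NotImplementedError

def build_table(elements: list[PlotElement]) -> dict:
    """(iii) Relate elements (bar -> series -> tick label) into a table."""
    raise NotImplementedError

def reason(table: dict, question: str) -> str:
    """(iv) Answer the question over the structured table."""
    raise NotImplementedError

def answer(image, question: str) -> str:
    """End-to-end: image + question -> answer, chaining the four subtasks."""
    elements = detect_elements(image)
    elements = read_values(elements)
    table = build_table(elements)
    return reason(table, question)
```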

Have Questions?

Ask us questions at nmethani@cse.iitm.ac.in and prithag@cse.iitm.ac.in.

Acknowledgements

We thank the SQuAD team for allowing us to use their code to create this website.

Results (trained and tested on PlotQA)

To assess the difficulty of the PlotQA dataset, we report human accuracy on a small subset of the Test split of the dataset. We also evaluate three state-of-the-art models on PlotQA and observe that our proposed hybrid model significantly outperforms the existing models, with an aggregate accuracy of 22.52% on the PlotQA dataset. This is still significantly lower than human performance, which establishes that the dataset is challenging and raises open questions about models for visual reasoning.
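
Since many PlotQA answers are real-valued, exact string match is too strict a criterion. Below is a minimal sketch of a per-question correctness check, assuming the convention from the PlotQA paper that a numeric answer counts as correct when it is within 5% of the gold value; the function name is illustrative.

```python
def is_correct(predicted, gold, tolerance=0.05):
    """Assumed PlotQA-style correctness check.

    Numeric answers count as correct when within `tolerance` (here 5%)
    of the gold value, per the criterion described in the PlotQA paper;
    non-numeric answers fall back to case-insensitive exact match.
    """
    try:
        pred_val, gold_val = float(predicted), float(gold)
    except (TypeError, ValueError):
        return str(predicted).strip().lower() == str(gold).strip().lower()
    if gold_val == 0:
        return pred_val == 0.0
    return abs(pred_val - gold_val) <= tolerance * abs(gold_val)

# Aggregate accuracy is the fraction of questions answered correctly:
# accuracy = sum(is_correct(p, g) for p, g in predictions) / len(predictions)
```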

| Rank | Date | Model | Accuracy (%) |
| --- | --- | --- | --- |
| | | Human Baseline (IIT Madras) | 80.47 |
| 1 | March 2020 | Hybrid Model, IIT Madras (Methani et al., 2020) | 22.52 |
| 2 | March 2020 | VOES, IIT Madras (Methani et al., 2020) | 18.46 |
| 3 | March 2020 | SAN, Carnegie Mellon University (Yang et al., 2016) | 7.76 |

Results (trained and tested on DVQA)

We evaluate our model on the test set of DVQA. Our proposed hybrid model performs better than the existing models (SAN and SANDY-OCR), establishing a new state-of-the-art result on DVQA.

| Rank | Date | Model | Accuracy (%) |
| --- | --- | --- | --- |
| 1 | March 2020 | Hybrid Model, IIT Madras (Methani et al., 2020) | 57.99 |
| 2 | March 2020 | SANDY-OCR, Rochester Institute of Technology (Kafle et al., 2018) | 45.77 |
| 3 | March 2020 | SAN, Carnegie Mellon University (Yang et al., 2016) | 32.1 |