PlotQA is a VQA dataset with 28.9 million question-answer pairs grounded over 224,377 plots, with data drawn from real-world sources and questions based on crowd-sourced question templates.
Existing synthetic datasets (FigureQA, DVQA) for reasoning over plots do not contain variability in data labels, real-valued data, or complex reasoning questions. Consequently, models proposed for these datasets do not fully address the challenge of reasoning over plots. In particular, they assume that the answer comes either from a small fixed-size vocabulary or from a bounding box within the image. In practice this is an unrealistic assumption, because many questions require reasoning and thus have real-valued answers that appear neither in a small fixed-size vocabulary nor in the image. In this work, we aim to bridge this gap between existing datasets and real-world plots by introducing PlotQA: 80.76% of the questions in PlotQA have out-of-vocabulary (OOV) answers, i.e., answers that do not come from a fixed vocabulary. For details, see the PlotQA paper (WACV 2020).
Download the dataset of 28.9 million question-answer pairs grounded over 224,377 plots.
Our proposed pipeline (VOES) consists of four subtasks: (i) detect all the elements in the plot (bars, legend names, tick labels, etc.), (ii) read the values of these elements, (iii) establish relationships between the plot elements, and (iv) reason over this structured data.
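The sketch below illustrates these four stages in Python. All function names and the toy data structures are hypothetical placeholders used for illustration under the description above; this is not the released VOES implementation.

```python
# A minimal, illustrative sketch of the four VOES-style stages described above.
# The stage functions are hypothetical placeholders (not the released code):
# a real system would use an object detector for (i) and OCR for (ii).

from typing import Dict, List


def detect_elements(plot_image) -> List[Dict]:
    """(i) Detect plot elements (bars, legend names, tick labels, ...) with bounding boxes."""
    # Placeholder: a real implementation would run an object detector here.
    return [{"role": "bar", "bbox": (10, 40, 30, 120), "series": "Brazil", "x": "2010"}]


def read_values(plot_image, elements: List[Dict]) -> List[Dict]:
    """(ii) Read the textual / numeric value of each detected element,
    e.g. via OCR or by interpolating bar height against the y-axis ticks."""
    for el in elements:
        el["value"] = 4.1  # placeholder value
    return elements


def build_table(elements: List[Dict]) -> Dict[str, Dict[str, float]]:
    """(iii) Relate elements to one another to form a structured semi-table."""
    table: Dict[str, Dict[str, float]] = {}
    for el in elements:
        table.setdefault(el["series"], {})[el["x"]] = el["value"]
    return table


def answer_question(table: Dict[str, Dict[str, float]], question: str) -> float:
    """(iv) Reason over the structured table; answers may be real values that
    appear neither in a fixed vocabulary nor in the plot image."""
    # Placeholder reasoning: average of the first series.
    series = next(iter(table.values()))
    return sum(series.values()) / len(series)


def answer_plot_question(plot_image, question: str) -> float:
    elements = detect_elements(plot_image)
    elements = read_values(plot_image, elements)
    table = build_table(elements)
    return answer_question(table, question)
```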
Ask us questions at nmethani@cse.iitm.ac.in and prithag@cse.iitm.ac.in.
We thank SQuAD for allowing us to use their code to create this website.
To assess the difficulty of the PlotQA dataset, we report human accuracy on a small subset of the Test split of the dataset. We also evaluate three state-of-the-art models on PlotQA and observe that our proposed hybrid model significantly outperforms the existing models, achieving an aggregate accuracy of 22.52% on the PlotQA dataset. This accuracy is nonetheless far below human performance, which establishes that the dataset is challenging and raises open questions on models for visual reasoning.
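As a rough illustration of how aggregate accuracy can be computed on such a split, here is a minimal sketch. The exact-match rule for textual answers and the 5% relative-error tolerance for numeric answers are assumptions made for this sketch; please refer to the PlotQA paper for the exact evaluation protocol.

```python
# Sketch only: assumes exact match for textual answers and a 5% relative-error
# tolerance for numeric answers; not the official evaluation script.


def is_correct(predicted: str, gold: str, tol: float = 0.05) -> bool:
    try:
        p, g = float(predicted), float(gold)
        # Numeric answers: allow a small relative error around the gold value.
        return abs(p - g) <= tol * abs(g) if g != 0 else p == g
    except ValueError:
        # Textual answers (e.g. legend names): require an exact string match.
        return predicted.strip().lower() == gold.strip().lower()


def aggregate_accuracy(predictions, gold_answers) -> float:
    correct = sum(is_correct(p, g) for p, g in zip(predictions, gold_answers))
    return 100.0 * correct / len(gold_answers)
```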
Rank | Model | Accuracy (%) |
---|---|---|
– | Human Baseline, IIT Madras | 80.47 |
1 (March, 2020) | Hybrid Model, IIT Madras (Methani et al., 2020) | 22.52 |
2 (March, 2020) | VOES, IIT Madras (Methani et al., 2020) | 18.46 |
3 (March, 2020) | SAN, Carnegie Mellon University (Yang et al., 2016) | 7.76 |
We evaluate our model on the test set of DVQA. Our proposed hybrid model performs better than the existing models (SAN and SANDY-OCR), establishing a new state-of-the-art result on DVQA.
Rank | Model | Accuracy (%) |
---|---|---|
1 (March, 2020) | Hybrid Model, IIT Madras (Methani et al., 2020) | 57.99 |
2 (March, 2020) | SANDY-OCR, Rochester Institute of Technology (Kafle et al., 2018) | 45.77 |
3 (March, 2020) | SAN, Carnegie Mellon University (Yang et al., 2016) | 32.10 |