EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
by Junjue Wang, Zhuo Zheng, Zihang Chen, Ailong Ma, and Yanfei Zhong
[Paper], [Video], [Dataset], [Leaderboard-SEG], [Leaderboard-VQA]
- 2024/05/12: Code and pre-trained weights have been updated.
- 2024/05/11: The EarthVQA dataset has been released.
- pytorch >= 1.1.0
- python >= 3.6
pip install ever-beta
pip install git+https://github.com/qubvel/segmentation_models.pytorch
- Download EarthVQA dataset and pre-trained weights
- Organize the data as follows:
EarthVQA
├── Train
│ ├── images_png
│ ├── masks_png
├── Val
│ ├── images_png
│ ├── masks_png
├── Test
│ ├── images_png
├── Train_QA.json
├── Val_QA.json
├── Test_QA.json
├── log
│ ├── sfpnr50.pth
│ ├── soba.pth
Note that the images are the same as those in the LoveDA dataset, so you can split the urban and rural areas on your own.
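Before launching the scripts, it can help to verify that the directory tree matches the layout above. The following is a minimal sketch (the helper name and the `./EarthVQA` root path are placeholders, not part of the released code):

```python
import os

def check_earthvqa_layout(root: str) -> list:
    """Return the expected paths that are missing under the dataset root."""
    expected = [
        "Train/images_png", "Train/masks_png",
        "Val/images_png", "Val/masks_png",
        "Test/images_png",
        "Train_QA.json", "Val_QA.json", "Test_QA.json",
    ]
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]

missing = check_earthvqa_layout("./EarthVQA")
if missing:
    print("Missing:", missing)
```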
# 1. Generate semantic masks using the pre-trained SFPN weights
sh ./scripts/generate_segfeats.sh
# 2. Generate answers using the pre-trained SOBA weights
sh ./scripts/predict_soba.sh
# 1. Train a segmentation model
sh ./scripts/train_sfpnr50.sh
# 2. Generate segmentation features and pseudo-masks
sh ./scripts/generate_segfeats.sh
# 3. Train SOBA
sh ./scripts/train_soba.sh
If you use EarthVQA in your research, please cite the following papers.
@article{wang2024earthvqa,
title={EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
url={https://ojs.aaai.org/index.php/AAAI/article/view/28357},
doi={10.1609/aaai.v38i6.28357},
author={Junjue Wang and Zhuo Zheng and Zihang Chen and Ailong Ma and Yanfei Zhong},
year={2024},
month={Mar.},
volume={38},
number={6},
pages={5481-5489}}
@article{earthvqanet,
title = {EarthVQANet: Multi-task visual question answering for remote sensing image understanding},
journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
volume = {212},
pages = {422-439},
year = {2024},
issn = {0924-2716},
doi = {10.1016/j.isprsjprs.2024.05.001},
url = {https://www.sciencedirect.com/science/article/pii/S0924271624001990},
author = {Junjue Wang and Ailong Ma and Zihang Chen and Zhuo Zheng and Yuting Wan and Liangpei Zhang and Yanfei Zhong},
}
The EarthVQA dataset is released on Google Drive and Baidu Drive.
You can develop your models on Train and Validation sets.
Semantic category labels: background – 1, building – 2, road – 3, water – 4, barren – 5, forest – 6, agriculture – 7, playground – 8. No-data regions are assigned 0 and should be ignored. The provided data loader will help you construct your pipeline.
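The label convention above can be sketched as follows. This is a minimal example (the class-ID mapping follows the list above; the helper function is hypothetical, not part of the provided data loader):

```python
import numpy as np

# Class IDs as defined above; 0 marks no-data pixels and must be ignored.
CLASSES = {
    1: "background",
    2: "building",
    3: "road",
    4: "water",
    5: "barren",
    6: "forest",
    7: "agriculture",
    8: "playground",
}
IGNORE_INDEX = 0

def class_histogram(mask: np.ndarray) -> dict:
    """Count valid (non-ignored) pixels per semantic class."""
    valid = mask[mask != IGNORE_INDEX]
    ids, counts = np.unique(valid, return_counts=True)
    return {CLASSES[int(i)]: int(c) for i, c in zip(ids, counts)}

# Toy 2x3 mask; a real mask would be loaded from masks_png.
toy = np.array([[0, 2, 2], [3, 3, 3]], dtype=np.uint8)
print(class_histogram(toy))  # {'building': 2, 'road': 3}
```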
Submit your test results to the EarthVQA Semantic Segmentation Challenge and the EarthVQA Visual Question Answering Challenge, and you will receive your Test scores.
Feel free to design your own models, and we are looking forward to your exciting results!
The data and its copyright are owned by RSIDEA, Wuhan University. Use of the Google Earth images must respect the "Google Earth" terms of use. All images and their associated annotations in EarthVQA may be used for academic purposes only; any commercial use is prohibited.