
[AAAI 2024] EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering


EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

by Junjue Wang, Zhuo Zheng, Zihang Chen, Ailong Ma, and Yanfei Zhong

[Paper], [Video], [Dataset], [Leaderboard-SEG], [Leaderboard-VQA]

News

  • 2024/05/12, Code and Pre-trained weights have been updated.

  • 2024/05/11, EarthVQA dataset has been released.

Requirements:

  • pytorch >= 1.1.0
  • python >= 3.6
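
The two minimum versions above can be checked before installing the remaining packages. The following is a minimal sketch, not part of the repository: `version_tuple` and `meets_minimum` are hypothetical helper names, and the parsing only assumes that a version string starts with dot-separated integers (PyTorch builds often append a suffix such as `+cu117`).

```python
import re
import sys


def version_tuple(version):
    """Turn '1.13.1+cu117' into (1, 13, 1), keeping only the leading numeric parts."""
    numeric = re.match(r"\d+(?:\.\d+)*", version)
    if numeric is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in numeric.group().split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (missing parts count as 0)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Minimums taken from the requirements list above.
    print("python ok:", sys.version_info >= (3, 6))
    try:
        import torch
        print("pytorch ok:", meets_minimum(torch.__version__, "1.1.0"))
    except ImportError:
        print("pytorch is not installed")
```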

Install Ever + Segmentation Models PyTorch

pip install ever-beta
pip install git+https://github.com/qubvel/segmentation_models.pytorch

Data preparation

  • Download EarthVQA dataset and pre-trained weights
  • Organize the data as follows:
EarthVQA
├── Train
│   ├── images_png
│   ├── masks_png
├── Val
│   ├── images_png
│   ├── masks_png
├── Test
│   ├── images_png
├── Train_QA.json
├── Val_QA.json
├── Test_QA.json
├── log
│   ├── sfpnr50.pth
│   ├── soba.pth
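
Before running the scripts, it can help to confirm that the folder matches the tree above. The sketch below is illustrative only and not part of the repository: `missing_entries` is a hypothetical helper, and the root folder name `EarthVQA` in the usage lines is assumed from the tree.

```python
from pathlib import Path

# Expected entries, copied from the directory tree above
# (paths are relative to the dataset root).
EXPECTED_DIRS = [
    "Train/images_png",
    "Train/masks_png",
    "Val/images_png",
    "Val/masks_png",
    "Test/images_png",
]
EXPECTED_FILES = [
    "Train_QA.json",
    "Val_QA.json",
    "Test_QA.json",
    "log/sfpnr50.pth",
    "log/soba.pth",
]


def missing_entries(root):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Assumed root folder name; adjust to where the data was unpacked.
    for path in missing_entries("EarthVQA"):
        print("missing:", path)
```

An empty result means every folder and file from the tree is in place; the check does not inspect the contents of the images or weights.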

Note that the images are the same as those in the LoveDA dataset, so you can split them into urban and rural areas yourself.

Test

# 1. Generate semantic masks using the pre-trained SFPN weights
sh ./scripts/generate_segfeats.sh
# 2. Generate answers using the pre-trained SOBA weights
sh ./scripts/predict_soba.sh

Train

# 1. Train a segmentation model
sh ./scripts/train_sfpnr50.sh
# 2. Generate segmentation features and pseudo-masks
sh ./scripts/generate_segfeats.sh
# 3. Train SOBA
sh ./scripts/train_soba.sh

Citation

If you use EarthVQA in your research, please cite the following papers.

    @article{wang2024earthvqa, 
        title={EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering},
        url={https://ojs.aaai.org/index.php/AAAI/article/view/28357}, 
        DOI={10.1609/aaai.v38i6.28357}, 
        author={Junjue Wang and Zhuo Zheng and Zihang Chen and Ailong Ma and Yanfei Zhong}, 
        year={2024}, 
        month={Mar.},
        volume={38},
        pages={5481-5489}}
    @article{earthvqanet,
        title = {EarthVQANet: Multi-task visual question answering for remote sensing image understanding},
        journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
        volume = {212},
        pages = {422-439},
        year = {2024},
        issn = {0924-2716},
        doi = {https://doi.org/10.1016/j.isprsjprs.2024.05.001},
        url = {https://www.sciencedirect.com/science/article/pii/S0924271624001990},
        author = {Junjue Wang and Ailong Ma and Zihang Chen and Zhuo Zheng and Yuting Wan and Liangpei Zhang and Yanfei Zhong},
    }

Dataset and Contest

The EarthVQA dataset is available on Google Drive and Baidu Drive.

You can develop your models on the Train and Val sets.

Semantic category labels: background – 1, building – 2, road – 3, water – 4, barren – 5, forest – 6, agriculture – 7, playground – 8. No-data regions are assigned 0 and should be ignored. The provided data loader will help you construct your pipeline.
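
As an illustration of the label scheme above, the sketch below counts pixels per category in a mask while skipping the no-data value 0. It is not the provided data loader: `class_pixel_counts` is a hypothetical helper, and the mask is assumed to be a 2D sequence of integer label ids (for example, a mask image converted to nested lists).

```python
from collections import Counter

# Label ids as listed above; 0 marks no-data pixels that should be ignored.
CLASS_NAMES = {
    1: "background",
    2: "building",
    3: "road",
    4: "water",
    5: "barren",
    6: "forest",
    7: "agriculture",
    8: "playground",
}
NO_DATA = 0


def class_pixel_counts(mask):
    """Count pixels per category name in a 2D mask, skipping no-data pixels."""
    counts = Counter()
    for row in mask:
        for label in row:
            label = int(label)
            if label == NO_DATA:
                continue
            # An id outside 1..8 raises KeyError, which flags a malformed mask.
            counts[CLASS_NAMES[label]] += 1
    return dict(counts)


if __name__ == "__main__":
    toy_mask = [[0, 2, 2], [4, 4, 0]]
    print(class_pixel_counts(toy_mask))  # {'building': 2, 'water': 2}
```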

Submit your test results to the EarthVQA Semantic Segmentation Challenge and the EarthVQA Visual Question Answering Challenge to receive your Test scores.

Feel free to design your own models, and we are looking forward to your exciting results!

License

The data and its copyright are owned by RSIDEA, Wuhan University. Use of the Google Earth images must respect the "Google Earth" terms of use. All images and their associated annotations in EarthVQA may be used for academic purposes only; any commercial use is prohibited.

Creative Commons License