/LwM_SIGSPATIAL2020_ToponymMatching

Repository for code underlying the paper 'A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching'.


A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching


This repository provides underlying code and materials for the paper A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching.


Installation

Please follow the instructions in the installation section of the DeezyMatch documentation to set up a Python environment and install all the packages required to run DeezyMatch.

Once working Python and DeezyMatch environments are available, the following additional libraries need to be installed:

pip install spacy
pip install geopy
pip install pandarallel
pip install python-Levenshtein
pip install pyxDamerauLevenshtein
pip install haversine
pip install mysql-connector-python
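
After installing, it can be useful to confirm that every package is visible to the active Python environment. The following is a minimal sketch, not part of the repository: the helper names (`is_importable`, `missing_packages`) are hypothetical, and the mapping from pip package names to import names is an assumption based on how these packages are usually imported.

```python
import importlib.util

# Assumed mapping: pip package name -> module name used in `import` statements.
REQUIRED_MODULES = {
    "spacy": "spacy",
    "geopy": "geopy",
    "pandarallel": "pandarallel",
    "python-Levenshtein": "Levenshtein",
    "pyxDamerauLevenshtein": "pyxdameraulevenshtein",
    "haversine": "haversine",
    "mysql-connector-python": "mysql.connector",
}


def is_importable(module_name):
    """Return True if Python can locate the given module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. "mysql") is missing.
        return False


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose module cannot be located."""
    return [pip_name for pip_name, module in required.items()
            if not is_importable(module)]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All additional packages were found.")
```

Any package reported as missing can be installed with the corresponding `pip install` command above.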

Data directory and structure

In our code, we assume the following directory structure:

LwM_SIGSPATIAL2020_ToponymMatching/
├── datasets
│   ├── candidate_mentions_sets
│   ├── candidate_ranking_datasets
│   ├── gazetteers
│   ├── query_mentions_sets
│   └── toponym_matching_datasets
├── experiments
│   ├── inputs
│   │   ├── characters_v001.vocab
│   │   └── dataset-string-similarity_test.txt
│   ├── levdam_results
│   ├── mapped_results
│   ├── models
│   └── ranker_results
└── processing
    ├── candidate_ranking_datasets
    ├── candselection
    ├── gazetteers
    ├── toponym_matching_datasets
    └── resources

Description of main directories:

  • processing/: contains the code for preparing or generating the different datasets.
  • datasets/: contains the datasets used in the experiments, produced by running the code in processing/.
  • experiments/: contains the experiment code and the files it generates.
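
Because the code assumes this layout, a quick check before running anything can save time. The sketch below is illustrative only and is not shipped with the repository: `EXPECTED_DIRS` simply restates the tree shown above, and `missing_dirs` is a hypothetical helper that takes the path to your local clone as its argument.

```python
from pathlib import Path

# Sub-directories taken from the directory tree shown above.
EXPECTED_DIRS = [
    "datasets/candidate_mentions_sets",
    "datasets/candidate_ranking_datasets",
    "datasets/gazetteers",
    "datasets/query_mentions_sets",
    "datasets/toponym_matching_datasets",
    "experiments/inputs",
    "experiments/levdam_results",
    "experiments/mapped_results",
    "experiments/models",
    "experiments/ranker_results",
    "processing/candidate_ranking_datasets",
    "processing/candselection",
    "processing/gazetteers",
    "processing/toponym_matching_datasets",
    "processing/resources",
]


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the root of the cloned repository.
    for rel in missing_dirs("."):
        print("Missing directory:", rel)
```

Some of these directories may only appear after the processing steps or experiments have been run, so a missing entry is not necessarily an error.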

The experiments/ folder contains two notebooks with the experiments reported in the paper.

The results presented in the paper were generated with DeezyMatch v1.2.0 (released 15 September 2020).

⚠️ Make sure you have gone through the required processing steps (described here) and that you have all the data needed before you run the experiments.

Citation

If you use or adapt this code in your paper, please cite:

Mariona Coll Ardanuy, Kasra Hosseini, Katherine McDonough, Amrey Krause, Daniel van Strien, and Federico Nanni. "A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching." In Proceedings of the 28th International Conference on Advances in Geographic Information Systems (SIGSPATIAL): Poster papers, pp. 385-388. 2020.
@inproceedings{collardanuy2020sigspatial,
  title={A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching},
  author={Coll Ardanuy, Mariona and Hosseini, Kasra and McDonough, Katherine and Krause, Amrey and van Strien, Daniel and Nanni, Federico},
  booktitle={Proceedings of the 28th International Conference on Advances in Geographic Information Systems (SIGSPATIAL): Poster papers},
  pages={385--388},
  year={2020}
}

A longer version of the article is available on arXiv:

Mariona Coll Ardanuy, Kasra Hosseini, Katherine McDonough, Amrey Krause, Daniel van Strien, and Federico Nanni. "A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching." arXiv:2009.08114. 2020.
@article{collardanuy2020geocandidateArxiv,
  title={A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching},
  author={Coll Ardanuy, Mariona and Hosseini, Kasra and McDonough, Katherine and Krause, Amrey and van Strien, Daniel and Nanni, Federico},
  journal={arXiv e-prints},
  pages={arxiv:2009.08114},
  year={2020}
}

Future work and contributing

The authors of the paper plan to develop the code further and extend the experiments. We welcome pull requests with improvements; if you encounter any errors, please open an issue.

Get in touch

Contact details for the corresponding authors:

  • Mariona Coll Ardanuy, mcollardanuy[at]turing.ac.uk
  • Kasra Hosseini, khosseini[at]turing.ac.uk
  • Federico Nanni, fnanni[at]turing.ac.uk

Acknowledgements

Work for this paper was produced as part of Living with Machines. This project, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London. This work was also supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1. Newspaper data was kindly shared by Findmypast.

License

  • The source code is licensed under the MIT License.
  • Copyright (c) 2020 The Alan Turing Institute, British Library Board, Queen Mary University of London, University of Exeter, University of East Anglia and University of Cambridge.
  • The datasets hosted on Zenodo are licensed under CC-BY-4.0.