NLP pipeline for extracting structured data from the Colonial Architecture collection (by Gossa Lo) made in the context of the ArchiMediaL project
European colonialism has left its mark on many countries around the world. Traces of this heritage can still be found today in the infrastructure, planning and architecture of former colonies. Documents and images relating to this heritage have been collected and are stored in the online Colonial Architecture repository (http://ColonialArchitecture.eu). This paper investigates how computational linguistics and Linked Data techniques, implemented as a Python pipeline, can contribute to the annotation and formalization of these documents. We validate the pipeline's usefulness by testing it on a subset of the Colonial Architecture corpus.