
Package (Python) for Eurostat online glossaries' web scraping and semantic classification

Primary language: Python. License: European Union Public License 1.1 (EUPL-1.1)

estatnet

Module for Eurostat online glossaries' web scraping and semantic classification

About

This module lets you automatically scrape Eurostat's online "Statistics Explained" pages and index their contents into a knowledge graph. It builds a graph of the relationships between the pages while extracting their existing semantic content (documentation, concepts, glossary, ...).
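The idea of turning page links into a graph can be illustrated with a minimal sketch. It is not estatNet's own code or API: it uses only the Python standard library in place of Scrapy and NetworkX, and the page paths, HTML snippets, and the names `PAGES`, `LinkCollector`, and `build_link_graph` are all hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser

# Hypothetical in-memory pages (path -> HTML). A real run would download
# the pages, for example with a Scrapy spider, instead of hard-coding them.
PAGES = {
    "/wiki/GDP": (
        '<p>See <a href="/wiki/National_accounts">national accounts</a> '
        'and <a href="/wiki/Value_added">value added</a>.</p>'
    ),
    "/wiki/National_accounts": '<p>Related: <a href="/wiki/GDP">GDP</a>.</p>',
    "/wiki/Value_added": "<p>No outgoing links here.</p>",
}


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags found in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def build_link_graph(pages):
    """Return {page: set of pages it links to}, keeping only known pages."""
    graph = {page: set() for page in pages}
    for page, html in pages.items():
        parser = LinkCollector()
        parser.feed(html)
        for target in parser.links:
            # Ignore self-links and links that leave the scraped set.
            if target in graph and target != page:
                graph[page].add(target)
    return graph


def incoming_counts(graph):
    """Count, for every page, how many other pages link to it."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


if __name__ == "__main__":
    link_graph = build_link_graph(PAGES)
    for page, targets in sorted(link_graph.items()):
        print(page, "->", sorted(targets))
```

The adjacency mapping returned here plays the role that a directed graph object would play in a graph library. Pages with many incoming links, as counted by `incoming_counts`, are natural candidates for central concepts.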

documentation
status: since 2018 – in construction
contributors
license: EUPL

Description

Notes

Resources

  • Framework Scrapy for extracting data from online websites.
  • Natural language toolkit nltk to work with human language data.
  • Package NetworkX for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
  • Module py2neo for the Neo4j graph database, though the Bolt driver neo4j-python-driver also does the job.

References