Code repository for analysing indications of dataset reuse across different communication channels, including mailing lists and publications.
dataset_dictionary.csv
contains detailed metadata on datasets from datahub.com.
dataset_metadata
contains metadata on datasets extracted from datahub.com, split into separate files: basic info, extras, resources, tags and organization.
mailing_lists
contains JSON dumps of mailing lists, e.g., semantic-web@w3.org (sem-w3.json) and public-lod@w3.org (lod-w3.json).
publication_txt
contains plain-text (.txt) versions of papers from three well-known conferences: ISWC, ESWC and WWW. The publication texts should be placed in a folder named Conf+Year (e.g., ESWC2013 for papers from the ESWC 2013 conference).
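The Conf+Year folder convention can be checked with a short script. The following is a minimal sketch, not code from this repository: the helper names parse_conference_folder and list_publication_texts are hypothetical, and the sketch assumes the conference folders sit directly under publication_txt.

```python
import re
from pathlib import Path

# Assumption: only the three conferences named in this README are used.
FOLDER_PATTERN = re.compile(r"^(ISWC|ESWC|WWW)(\d{4})$")


def parse_conference_folder(name):
    """Split a folder name such as 'ESWC2013' into ('ESWC', 2013).

    Returns None when the name does not follow the Conf+Year convention.
    """
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def list_publication_texts(root):
    """Yield (conference, year, path) for each .txt file in a Conf+Year folder."""
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        parsed = parse_conference_folder(folder.name)
        if parsed is None:
            continue  # skip folders that do not match the convention
        conference, year = parsed
        for txt_path in sorted(folder.glob("*.txt")):
            yield conference, year, txt_path
```

For example, parse_conference_folder("ESWC2013") returns ("ESWC", 2013), while a folder name that does not match the convention yields None and is skipped.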
reuse_counter.py
analyzes dataset reuse (mentions) in different parts of articles.
mailinig_list_counter.py
analyzes dataset reuse (mentions) in mailing lists.
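Both counter scripts come down to counting dataset mentions in text. The matching logic they actually use is not described here, so the following is only an illustrative sketch under stated assumptions: count_mentions is a hypothetical helper, matching is case-insensitive, and a mention counts only when the dataset name is not embedded in a longer word.

```python
import re
from collections import Counter


def count_mentions(text, dataset_names):
    """Count case-insensitive, whole-word mentions of each dataset name.

    Assumption: a "mention" is a literal occurrence of the name that is not
    directly preceded or followed by a word character.
    """
    counts = Counter()
    for name in dataset_names:
        pattern = re.compile(
            r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE
        )
        counts[name] = len(pattern.findall(text))
    return counts
```

The same function could be applied to the text of a paper section or to the body of a mailing-list message once it has been read from the JSON dump; the structure of those dumps is not documented here, so loading them is left out.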