SoMeSci: Software Mentions in Scientific Articles

SoMeSci is a manually labelled corpus of software mentions in scientific articles. In total, it contains 3756 software annotations in 1367 articles. Additional information associated with the software (version, extension, release, developer, URL, license, citation, abbreviation, alternative name) is also annotated and linked to the software by relations. The exact annotation configuration can be found in the conf folder. Annotated texts are provided in BRAT stand-off format in the folders PLoS_methods, PLoS_sentences, Pubmed_fulltext, and Creation_sentences. All annotations of software, citations, developers, and licenses are additionally linked through unique identifiers provided in the Linking folder.
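
Because the annotations use BRAT stand-off format, a small reader is enough to get started. The following is a minimal sketch, not the official loader (see SoMeSci_Code and SoMeNLP for that). It assumes the standard BRAT layout of tab-separated text-bound annotations (IDs starting with `T`) and binary relations (IDs starting with `R`). The label names `Application_Usage`, `Version`, and `Version_of`, as well as the sample sentence, are hypothetical placeholders; the labels actually used in the corpus are defined in the conf folder.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    label: str
    spans: list  # list of (start, end) character offsets into the text file
    text: str


@dataclass
class Relation:
    id: str
    label: str
    arg1: str  # ID of the first argument entity
    arg2: str  # ID of the second argument entity


def parse_brat(ann_text):
    """Parse text-bound annotations (T*) and relations (R*) from .ann content."""
    entities, relations = {}, {}
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # "Label start end" or, for discontinuous spans, "Label s1 e1;s2 e2"
            label, _, offsets = fields[1].partition(" ")
            spans = [tuple(int(n) for n in part.split()) for part in offsets.split(";")]
            covered = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = Entity(ann_id, label, spans, covered)
        elif ann_id.startswith("R"):
            # "Label Arg1:T2 Arg2:T1"
            label, arg1, arg2 = fields[1].split()
            relations[ann_id] = Relation(
                ann_id, label, arg1.split(":", 1)[1], arg2.split(":", 1)[1]
            )
        # Other BRAT annotation types (events, attributes, notes) are skipped.
    return entities, relations


# Hypothetical sentence and annotation content, for illustration only.
sentence = "Data were analysed with SPSS version 22."
sample = (
    "T1\tApplication_Usage 24 28\tSPSS\n"
    "T2\tVersion 37 39\t22\n"
    "R1\tVersion_of Arg1:T2 Arg2:T1\n"
)

entities, relations = parse_brat(sample)
for rel in relations.values():
    print(entities[rel.arg1].text, rel.label, entities[rel.arg2].text)
```

In practice, each `.ann` file would be read from one of the annotation folders together with its matching `.txt` file, and the character offsets in `spans` index into that text.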

For further information on working with the data, see SoMeSci_Code and SoMeNLP, or visit the website, which includes an interactive SPARQL endpoint: https://data.gesis.org/somesci


This work is licensed under a Creative Commons Attribution 4.0 International License.
