RDF with Onto.PT and SentiLex-PT data

About

This repository provides an RDF file describing relations between Portuguese words, such as synonymy and hyperonymy, together with other attributes obtained from the Onto.PT website. Each triple also carries a confidence level: the number of sources that support that relation between the words, with a maximum value of 10.

Some words in the RDF file also carry information about their polarity (their sentiment value), taken from the SentiLex-PT dataset. The file "knowledge_database.txt" contains the data from SentiLex-lem-PT, and the file "knowledge_database_lem.txt" contains the data from the SentiLex-flex-PT version, which lists various inflected forms of each lemma.

Onto.PT

The data comes from the "Triplos relacionais 10 recursos" triples and covers relations such as synonymy and hyperonymy, drawn from 10 different sources: PAPEL, Dicionário Aberto, Wikcionário.PT, TeP, OpenThesaurus.PT, OpenWordNet-PT, PULO, Port4Nooj, Wordnet.Br and ConceptNet. Each line of the source document describes a relationship between two lemmas.
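As a rough illustration of how such lines and their confidence levels could be handled, here is a minimal Python sketch. The column layout of the source file is not described here, so the tab-separated format "lemma, relation, lemma, confidence", the relation names, the sample lines, and the helper names parse_line and filter_by_confidence are all assumptions made for this example, not the actual Onto.PT format.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    subject: str      # first lemma
    relation: str     # relation name (illustrative, not the real vocabulary)
    obj: str          # second lemma
    confidence: int   # number of sources supporting the relation (max 10)


def parse_line(line: str) -> Triple:
    """Parse one line in the ASSUMED layout: lemma<TAB>relation<TAB>lemma<TAB>confidence."""
    subject, relation, obj, confidence = line.rstrip("\n").split("\t")
    value = int(confidence)
    if not 0 <= value <= 10:
        raise ValueError(f"confidence out of range: {value}")
    return Triple(subject, relation, obj, value)


def filter_by_confidence(triples: Iterable[Triple], minimum: int) -> List[Triple]:
    """Keep only the triples supported by at least `minimum` sources."""
    return [t for t in triples if t.confidence >= minimum]


# Made-up sample lines, for illustration only.
sample_lines = [
    "carro\tSINONIMO_DE\tautomóvel\t7",
    "animal\tHIPERONIMO_DE\tcão\t2",
]

triples = [parse_line(line) for line in sample_lines]
reliable = filter_by_confidence(triples, minimum=5)

for t in reliable:
    print(f"{t.subject} {t.relation} {t.obj} (confidence {t.confidence})")
```

Filtering on the confidence value is one simple way to trade coverage for reliability: a higher threshold keeps fewer relations, but each one is backed by more sources.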

SentiLex-PT

SentiLex-PT is a sentiment lexicon designed for the extraction of sentiment and opinion about human entities in Portuguese texts. We use this dataset to complement the information about each word with its polarity. For more information, see the SentiLex-PT dataset documentation.
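The idea of complementing words with polarity can be sketched as a simple lookup. This is only an illustration: the in-memory dictionary, the sample entries, the integer encoding (-1 negative, 0 neutral, 1 positive), and the helper names polarity_of and annotate are assumptions made for this example, not the format of the files in this repository.

```python
from typing import Dict, Iterable, Optional

# Hypothetical lexicon: lemma -> polarity (-1 negative, 0 neutral, 1 positive).
sample_lexicon: Dict[str, int] = {
    "bonito": 1,
    "feio": -1,
    "normal": 0,
}


def polarity_of(lemma: str, lexicon: Dict[str, int]) -> Optional[int]:
    """Return the polarity of a lemma, or None when the lexicon has no entry."""
    return lexicon.get(lemma.lower())


def annotate(lemmas: Iterable[str], lexicon: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Attach a polarity (or None) to each lemma."""
    return {lemma: polarity_of(lemma, lexicon) for lemma in lemmas}


annotated = annotate(["bonito", "feio", "carro"], sample_lexicon)
for lemma, polarity in annotated.items():
    print(lemma, polarity)
```

Returning None for missing entries reflects the fact that only some words have polarity information; a missing entry is kept distinct from a neutral polarity of 0.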

Contacts

For any question about this repository, or if you need any of the source files to manipulate the connections, feel free to contact us:

jfalmeida@student.dei.uc.pt

jessicac@student.dei.uc.pt