
The First Ever Albanian Fake News Corpus

Albanian Fake News Corpus

Hello! Thank you for using our corpus!

In this repository you will find the following:

  • full_texts folder, which contains the full texts as collected from their websites. Inside this folder are the following subfolders:

    • fake folder: it contains the collected fake news;
    • true folder: it contains the collected true news;
    • fake-meta-information folder: it contains the metadata for each fake news article;
    • true-meta-information folder: it contains the metadata for each true news article;
    • fake-pos folder: it contains the POS-tagged fake news;
    • true-pos folder: it contains the POS-tagged true news.

    Each file in the fake-meta-information and true-meta-information folders has the following layout, one value per line:

       date and time of publication
       facebook link
       web page link
       number of tokens
       number of words without punctuation
       number of words in upper case
       average word length
       number of characters
       number of letters in upper case
       number of verbs
       number of subjunctive verbs
       number of imperative verbs
       number of nouns
       number of adjectives
       number of adverbs
       number of pronouns
       number of modal verbs (mainly auxiliary verbs)
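
As a minimal sketch, a metadata file could be read into a dictionary as shown below. The field names in `META_FIELDS` and the function names are hypothetical (they are not part of the corpus), and the code assumes each file is UTF-8 plain text with exactly the 17 lines listed above, in that order. Values are kept as strings because the exact date and number formats are not documented here.

```python
from pathlib import Path

# Hypothetical field names, one per line of a metadata file,
# in the order given in the list above.
META_FIELDS = [
    "published_at",
    "facebook_link",
    "web_page_link",
    "num_tokens",
    "num_words_no_punct",
    "num_upper_words",
    "avg_word_length",
    "num_chars",
    "num_upper_letters",
    "num_verbs",
    "num_subjunctive_verbs",
    "num_imperative_verbs",
    "num_nouns",
    "num_adjectives",
    "num_adverbs",
    "num_pronouns",
    "num_modal_verbs",
]


def parse_meta_text(text):
    """Map the lines of one metadata file to the field names above."""
    lines = [line.strip() for line in text.splitlines()]
    # Ignore blank lines at the end of the file.
    while lines and not lines[-1]:
        lines.pop()
    if len(lines) != len(META_FIELDS):
        raise ValueError(
            f"expected {len(META_FIELDS)} lines, found {len(lines)}"
        )
    return dict(zip(META_FIELDS, lines))


def read_meta_file(path):
    """Read one metadata file (assumed to be UTF-8 text) from disk."""
    return parse_meta_text(Path(path).read_text(encoding="utf-8"))
```

If the assumption about the layout does not hold for a given file, `parse_meta_text` raises a `ValueError` rather than silently misaligning fields.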
    

    Finding the aligned true and fake news pairs is simple: the two files of a pair have the same number/name in their respective folders.
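
The pairing by name can be sketched as follows. This is an illustration only: the function name `pair_by_name` is hypothetical, and the code assumes that the two files of a pair share the same file name stem (the part before the extension) and that the fake and true folders sit directly inside full_texts.

```python
from pathlib import Path


def pair_by_name(fake_dir, true_dir):
    """Return (fake_path, true_path) tuples for files with the same stem.

    Assumes paired files share a file name stem; files that have no
    counterpart in the other folder are skipped.
    """
    fake = {p.stem: p for p in Path(fake_dir).iterdir() if p.is_file()}
    true = {p.stem: p for p in Path(true_dir).iterdir() if p.is_file()}
    shared = sorted(fake.keys() & true.keys())
    return [(fake[name], true[name]) for name in shared]


# Example usage (paths assumed from the folder layout described above):
# pairs = pair_by_name("full_texts/fake", "full_texts/true")
```

Note that the stems are sorted as strings, so "10" comes before "2"; the pairing itself does not depend on the order.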

Finally, this corpus was used in our work, Albanian Fake News Detection, published in the ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). You may find it here.