This repository contains Python code for reproducing the experiments with news headlines in Albanian presented in this paper. AlbNews is a topic modeling corpus of news headlines in Albanian, consisting of 600 topically labeled records and 2600 unlabeled records. Each labeled record consists of a headline text and one of four labels: 'pol' for politics, 'cul' for culture, 'eco' for economy, or 'spo' for sport. More details about the creation and the contents of AlbNews can be found here.
Please download the AlbNews corpus and place its files inside the data/ folder. Afterwards, you can run the code of this repository using the following command:
$ python -c <classifier>
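Before running a classifier, it can help to sanity-check the labeled records. The following is a minimal sketch, not part of this repository's code: it assumes a two-column CSV layout (headline, label), which may differ from the actual files in data/, and the helper name `read_labeled_records` is hypothetical. The rows are placeholders, not real AlbNews records; only the four label codes come from the description above.

```python
import csv
import io
from collections import Counter

# Label codes and topic names as described in this README.
LABELS = {"pol": "politics", "cul": "culture", "eco": "economy", "spo": "sport"}


def read_labeled_records(handle):
    """Yield (headline, label) pairs from a CSV stream with two columns.

    The two-column layout is an assumption made for this sketch; adapt the
    parsing to the actual files placed in data/.
    """
    for row in csv.reader(handle):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        headline, label = row[0].strip(), row[1].strip()
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        yield headline, label


# Placeholder rows standing in for real records (not taken from AlbNews).
sample = io.StringIO(
    "placeholder headline A,pol\n"
    "placeholder headline B,spo\n"
    "placeholder headline C,pol\n"
)
records = list(read_labeled_records(sample))

# Count how many records carry each label and print them by topic name.
counts = Counter(label for _, label in records)
for code, n in sorted(counts.items()):
    print(f"{LABELS[code]}: {n}")
```

To use it on the real corpus, replace the `io.StringIO` sample with an open file handle (for example `open(path, encoding="utf-8", newline="")`), where `path` points to a labeled file in data/ that matches the assumed layout.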
If you use the AlbNews data or the code of this repository, please cite the following paper:
Erion Çano, Dario Lamaj. AlbNews: A Corpus of Headlines for Topic Modeling in Albanian. CoRR, abs/2402.04028, February 2024. URL:
author = {Erion {\c{C}}ano and Dario Lamaj},
title = {AlbNews: A Corpus of Headlines for Topic Modeling in Albanian},
journal = {CoRR},
volume = {abs/2402.04028},
year = {2024},
url = { },
archivePrefix = {arXiv},
eprint = {2402.04028},