Ten Thousand German News Articles Dataset
For more information visit the detailed project page.
- Install the required Python packages:
    pip install -r requirements.txt
- Download the corpus.sqlite3 file into the project root from here (compressed) or directly from here.
- Extract the articles:
    python code/extract_dataset_from_sqlite.py corpus.sqlite3 articles.csv
- Split the dataset:
    python code/split_articles_into_train_test.py
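For readers who want a rough picture of what the extraction and split steps involve, here is a minimal Python sketch. It is not the code from this repository: the table name `articles`, the function names, the 10% test fraction, and the seed are all assumptions made for illustration, so inspect corpus.sqlite3 and the scripts under code/ for the actual schema and split logic.

```python
import csv
import random
import sqlite3


def export_table_to_csv(db_path, csv_path, table="articles"):
    """Dump every row of `table` to a CSV file with a header row.

    `table` is a hypothetical name; the real schema may differ.
    Returns the number of data rows written.
    """
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


def split_rows(rows, test_fraction=0.1, seed=42):
    """Shuffle rows reproducibly and return (train, test).

    The fraction and seed are illustrative defaults, not the
    values used by the repository's split script.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

A fixed seed keeps the split reproducible across runs, which matters when results on the test set are compared between experiments.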
License
All code in this repository is licensed under the MIT License.
The dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.