Finnish version of the databricks-dolly-15k instruction dataset (https://github.com/databrickslabs/dolly/tree/master/data), machine translated using DeepL (https://www.deepl.com/).
Convert original data from JSONL to DOCX files
python3 jsonl2doc.py original-data/databricks-dolly-15k.jsonl
Translate DOCX files from dolly-doc-in/ using DeepL (https://www.deepl.com/) and save outputs in dolly-doc-out/.
Convert back to JSONL
python3 doc2jsonl.py \
original-data/databricks-dolly-15k.jsonl \
dolly-doc-out/dolly-000*.docx \
> dolly-15k-fi.jsonl
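After converting back, it may be worth checking that the translated file still lines up with the original record by record. The following is a minimal sketch, not part of this repository: the helper names `load_jsonl` and `check_alignment` are hypothetical, and it assumes that dolly-15k-fi.jsonl keeps the same four fields as databricks-dolly-15k (instruction, context, response, category) with one record per line in the original order.

```python
import json
from pathlib import Path

# Field names used by databricks-dolly-15k records.
# Assumption: the translated file keeps the same field names.
FIELDS = ("instruction", "context", "response", "category")
TEXT_FIELDS = ("instruction", "context", "response")


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def check_alignment(original, translated):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    if len(original) != len(translated):
        problems.append(
            f"record count differs: {len(original)} vs {len(translated)}"
        )
    # zip() stops at the shorter list, so only paired records are compared.
    for i, (src, dst) in enumerate(zip(original, translated)):
        missing = [k for k in FIELDS if k not in dst]
        if missing:
            problems.append(f"record {i}: missing fields {missing}")
            continue
        for k in TEXT_FIELDS:
            # A field that had text in the original should not come back empty.
            if src.get(k, "").strip() and not dst[k].strip():
                problems.append(f"record {i}: empty translation for '{k}'")
    return problems
```

To use it, load both files with `load_jsonl("original-data/databricks-dolly-15k.jsonl")` and `load_jsonl("dolly-15k-fi.jsonl")`, pass the two lists to `check_alignment`, and print whatever it returns. The check only covers structure; it says nothing about translation quality.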
This dataset is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC BY-SA).
Note that under the DeepL terms and conditions, this data may not be used to develop, market or train a machine translation algorithm.