This repository contains scripts to download spam emails from the http://untroubled.org/spam/ archive and convert them into JSONL files suitable for training ML models.
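The conversion can be pictured as turning each raw message into one JSON object on a single line. The sketch below is a hypothetical illustration, not this repository's code: the `email_to_json_line` helper and the `subject`/`body`/`label` fields are assumptions. It only splits the headers from the body at the first blank line and does no MIME or RFC 2047 decoding.

```ruby
require "json"

# Hypothetical helper: converts one raw RFC 822-style message into a single
# JSONL line. The field names are illustrative assumptions, not the real schema.
def email_to_json_line(raw, label: "spam")
  # Normalize CRLF line endings, then split headers from body at the first blank line.
  text = raw.gsub("\r\n", "\n")
  headers, body = text.split("\n\n", 2)
  body ||= ""

  # Take the first Subject header, case-insensitively (folded headers are not unfolded).
  subject = headers.to_s[/^subject:[ \t]*(.*)$/i, 1].to_s.strip

  # JSON.generate escapes newlines, so the whole record stays on one line.
  JSON.generate({ "subject" => subject, "body" => body, "label" => label })
end

raw = "From: someone@example.com\nSubject: You won!\n\nClaim your prize now.\n"
puts email_to_json_line(raw)
```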
- Install Ruby
- Run `bundle install`
- Check if you have `7z` installed. Otherwise, install it
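In a POSIX shell, `command -v 7z` tells you whether `7z` is on your `PATH`. If you prefer to fail early from Ruby, a check along these lines is one option. This is a minimal sketch assuming a POSIX-style `PATH`; `executable_in_path?` is a hypothetical helper, not part of this repository.

```ruby
# Hypothetical helper: returns true if an executable named `name` exists in one
# of the PATH directories. POSIX-oriented: on Windows you would also need to
# try the extensions listed in PATHEXT (e.g. "7z.exe").
def executable_in_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    dir = "." if dir.empty? # an empty PATH entry means the current directory
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

if executable_in_path?("7z")
  puts "7z found"
else
  warn "7z not found: install it before running the actions"
end
```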

There are two actions:

- `ruby actions/fetch_data.rb` to download the latest data. Data is cached in the `data/` folder.
- `ruby actions/export_jsonl.rb` to export the downloaded data as JSONL files in the `output/` folder.
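Because JSONL stores one JSON object per line, the exported files can be streamed record by record instead of being loaded whole into memory. A minimal sketch: the `each_jsonl_record` helper is hypothetical, and the `*.jsonl` file pattern is an assumption, since the exact file names written to `output/` are not documented here.

```ruby
require "json"

# Hypothetical helper: yields every JSON object from the JSONL files in `dir`.
# The "*.jsonl" pattern is an assumption about how the output files are named.
def each_jsonl_record(dir, pattern = "*.jsonl")
  return enum_for(__method__, dir, pattern) unless block_given?

  Dir.glob(File.join(dir, pattern)).sort.each do |path|
    File.foreach(path) do |line|
      line = line.strip
      next if line.empty? # tolerate blank lines, e.g. a trailing newline
      yield JSON.parse(line)
    end
  end
end

# Example: count the records exported to output/ (0 if nothing matches).
puts each_jsonl_record("output").count
```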