Parallel GDELT data downloader with date filter

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

gdelt-downloader

This Docker image downloads GDELT data. You can set the number of download jobs that run in parallel, as well as the start and end dates of the download range, through environment variables.
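
As a rough illustration of how a date filter can be combined with parallel jobs, the sketch below filters a list of dated file names and hands the result to a thread pool. It is not the image's actual code: `filter_by_date`, `run_parallel`, the file-name pattern, and the stub `fetch` function are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def filter_by_date(names, start_date=None, end_date=None):
    """Keep names whose leading YYYYMMDD stamp lies in [start_date, end_date]."""
    fmt = "%Y%m%d"
    start = datetime.strptime(start_date, fmt) if start_date else None
    end = datetime.strptime(end_date, fmt) if end_date else None
    kept = []
    for name in names:
        # Assumption: every name starts with an 8-digit YYYYMMDD stamp.
        stamp = datetime.strptime(name[:8], fmt)
        if start and stamp < start:
            continue
        if end and stamp > end:
            continue
        kept.append(name)
    return kept


def run_parallel(names, fetch, njobs=1):
    """Call fetch(name) for every name using njobs worker threads."""
    with ThreadPoolExecutor(max_workers=njobs) as pool:
        # pool.map returns results in the same order as the input names.
        return list(pool.map(fetch, names))


# Hypothetical file names; a real run would obtain them from the GDELT server.
names = ["20240101.zip", "20240102.zip", "20240103.zip"]
selected = filter_by_date(names, start_date="20240102")
# The lambda stands in for a real download function.
results = run_parallel(selected, fetch=lambda n: f"fetched {n}", njobs=2)
```

Leaving either date unset keeps that side of the range open, which mirrors the optional `start_date` and `end_date` variables.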

Usage

Download historical data

docker run -i -e njobs=N -e start_date=YYYYMMDD -e end_date=YYYYMMDD -v $(pwd)/data:/app/data yfiua/gdelt-downloader
  • -e njobs=N: Number of parallel download jobs. Default is 1.
  • -e start_date=YYYYMMDD: Optional. Start date of the download range, in YYYYMMDD format.
  • -e end_date=YYYYMMDD: Optional. End date of the download range, in YYYYMMDD format.
  • -v $(pwd)/data:/app/data: Mounts the local data directory at /app/data inside the container, so the downloaded files are stored on the host.
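
If you launch the container from a script, it can help to validate the dates before calling Docker. The following is a minimal sketch under that assumption: `build_docker_args` is a hypothetical helper, not part of this project, and it only assembles the flags documented above.

```python
from datetime import datetime


def build_docker_args(data_dir, njobs=1, start_date=None, end_date=None):
    """Return the `docker run` argument list for the historical downloader."""
    if njobs < 1:
        raise ValueError("njobs must be at least 1")
    args = ["docker", "run", "-i", "-e", f"njobs={njobs}"]
    for key, value in (("start_date", start_date), ("end_date", end_date)):
        if value is None:
            continue  # both dates are optional
        if len(value) != 8 or not value.isdigit():
            raise ValueError(f"{key} must be in YYYYMMDD format")
        datetime.strptime(value, "%Y%m%d")  # raises ValueError for impossible dates
        args += ["-e", f"{key}={value}"]
    # YYYYMMDD strings sort chronologically, so a plain comparison is enough.
    if start_date and end_date and start_date > end_date:
        raise ValueError("start_date must not be after end_date")
    args += ["-v", f"{data_dir}:/app/data", "yfiua/gdelt-downloader"]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)`. Here `data_dir` should be an absolute path, playing the role of `$(pwd)/data` in the command above.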

Streaming data

docker run -d -v $(pwd)/data:/app/data yfiua/gdelt-downloader-streaming

Build yourself

docker build -t gdelt-downloader .

cd streaming
docker build -t gdelt-downloader-streaming .

Changelog

  • 0.2.2
    • Bugfix
  • 0.2.1
    • Do not download a file again after it has been moved
  • 0.2
    • Add support for streaming data
  • 0.1.1
    • Less verbose output
    • Use Python 3.12
  • 0.1.0
    • Initial release

Author

yfiua