Create production-ready Dataflow projects in a zap! :zap:

dataflow-cookiecutter

Tired of copy-pasting your ad-hoc Dataflow modules? Then you can use this cookiecutter command-line tool to easily generate standardized Dataflow templates!

Installation

You can install dataflow-cookiecutter from PyPI:

pip install dataflow-cookiecutter

Alternatively, you can clone this repository and install it locally:

git clone https://github.com/ljvmiranda921/dataflow-cookiecutter.git
cd dataflow-cookiecutter
python3 setup.py install

Usage

You can create a Dataflow project by executing the command:

$ dataflow-cookiecutter new

Choose from a variety of our premade templates. See all available templates by running dataflow-cookiecutter ls. For example, you can create a Google Cloud Storage (GCS) to BigQuery (BQ) pipeline via:

$ dataflow-cookiecutter new -t GCSToBQ

Lastly, our templates are fully compatible with your trusty old cookiecutter command-line tool (be sure to use cookiecutter>=1.7.1!):

$ cookiecutter https://github.com/ljvmiranda921/dataflow-cookiecutter \
   --directory <directory-to-desired-template> 
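
If you prefer to script this step, the same command can be assembled from Python. This is only a minimal sketch: the helper name build_cookiecutter_command and the "path/to/template" placeholder are assumptions made for illustration, not part of this project, and the actual call is skipped unless you pass run=True.

```python
import shlex
import subprocess

REPO_URL = "https://github.com/ljvmiranda921/dataflow-cookiecutter"


def build_cookiecutter_command(template_dir, repo_url=REPO_URL):
    """Return the argv list for running cookiecutter on one template directory.

    Hypothetical helper: it only mirrors the shell command shown above.
    """
    if not template_dir:
        raise ValueError("template_dir must be a non-empty path inside the repository")
    return ["cookiecutter", repo_url, "--directory", template_dir]


def generate(template_dir, run=False):
    """Print the command; execute it only when run=True.

    Executing requires cookiecutter>=1.7.1 on your PATH and network access.
    """
    command = build_cookiecutter_command(template_dir)
    print(shlex.join(command))
    if run:
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # "path/to/template" is a placeholder, not a real directory in the repository.
    generate("path/to/template")
```

Passing the arguments as a list avoids shell quoting problems if a template directory ever contains spaces.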

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given. For more information, please proceed to this link

FAQ

  • Why are you still wrapping cookiecutter? This started as my learning project to see how cookiecutter's internals work. While building the alpha version, I realized that I could add functionality beyond templating to this CLI, so wrapping cookiecutter seemed to be a good approach.
  • I already have cookiecutter, can I use it with your templates? Yes, of course! Look at the Usage section above! However, ensure that your cookiecutter version is >=1.7.1 so that you can use the --directory flag!
  • Why are you using Python 3 for Dataflow templates? It's 2020, and we shouldn't be supporting legacy Python anymore. Besides, Dataflow now has streaming support in Python 3. You can follow further developments on Beam's Python 3 support in their issue tracker.
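
To check the cookiecutter>=1.7.1 requirement mentioned above before relying on the --directory flag, something like the following could work. It is a rough sketch using only the standard library (importlib.metadata needs Python 3.8+); the function names parse_release and supports_directory_flag are made up for this example, and pre-release suffixes such as "rc1" are simply ignored.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum cookiecutter release stated in this README for the --directory flag.
MINIMUM = (1, 7, 1)


def parse_release(version_string):
    """Turn '1.7.2' into (1, 7, 2), stopping at the first non-numeric part.

    Simplification: '1.7.1rc1' is read as (1, 7, 1).
    """
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break
    return tuple(parts)


def supports_directory_flag(version_string, minimum=MINIMUM):
    """Return True if the given version is at least the minimum release."""
    return parse_release(version_string) >= minimum


if __name__ == "__main__":
    try:
        installed = version("cookiecutter")
    except PackageNotFoundError:
        print("cookiecutter is not installed")
    else:
        verdict = "supports" if supports_directory_flag(installed) else "is too old for"
        print(f"cookiecutter {installed} {verdict} --directory")
```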