- Apache Airflow
- MongoDB -> stores raw data -> ETL system
- ETL system: Apache Spark <-> MongoDB => pipeline for large datasets => controlled by Airflow
- Apache Druid -> online analytical database (OLAP system)
  => used by the front-end stack: PHP backend (CMS) <-> React/Angular/API <-> Apache Druid
- Apache Druid is an open-source, distributed database designed for real-time analytics. It is optimized for OLAP workloads and is designed to query large datasets with low latency. One way to model data in Druid is the data cube model, which aggregates and queries data along several dimensions. A data cube is a multidimensional array that can store data from multiple sources and supports fast, efficient querying. Druid also provides rollup, which summarizes data by pre-aggregating rows at ingestion time, and windowing, which analyzes data over a given period of time.
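To make the rollup idea concrete, here is a minimal pure-Python sketch of the concept, not Druid's own implementation. It truncates each event timestamp to an hourly bucket and keeps one aggregated row per bucket and combination of dimension values. The `rollup` helper, the field names (`country`, `page`, `views`) and the sample events are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def rollup(events, dimensions, metric, granularity_seconds=3600):
    """Pre-aggregate raw events into one row per (time bucket, dimension values)."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        # Truncate the timestamp (epoch seconds) to the start of its bucket.
        bucket = event["timestamp"] - event["timestamp"] % granularity_seconds
        key = (bucket, *(event[d] for d in dimensions))
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]

    rows = []
    for key in sorted(totals):
        bucket, *dim_values = key
        row = {"timestamp": bucket}
        row.update(zip(dimensions, dim_values))
        row["count"] = totals[key]["count"]
        row[f"{metric}_sum"] = totals[key]["sum"]
        rows.append(row)
    return rows


# Hypothetical page-view events; timestamps are epoch seconds.
raw_events = [
    {"timestamp": 3600, "country": "VN", "page": "/home", "views": 1},
    {"timestamp": 3700, "country": "VN", "page": "/home", "views": 2},
    {"timestamp": 3800, "country": "US", "page": "/home", "views": 1},
    {"timestamp": 7300, "country": "VN", "page": "/home", "views": 5},
]

rolled_up = rollup(raw_events, dimensions=["country", "page"], metric="views")
for row in rolled_up:
    print(row)
```

The four raw events collapse into three stored rows. Queries over the rolled-up rows read less data, at the cost of losing the individual events inside each bucket.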
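For the API <-> Apache Druid link in the stack above, a backend service could send SQL to Druid's HTTP SQL endpoint (`/druid/v2/sql`). The sketch below only uses the Python standard library. The host and port (a router on `localhost:8888`), the `page_views` datasource and its `country` and `views` columns are assumptions, not taken from these notes.

```python
import json
import urllib.request

# Assumption: a Druid router reachable on localhost:8888.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_druid_sql_request(sql, url=DRUID_SQL_URL):
    """Build (but do not send) a POST request for Druid's SQL API."""
    payload = json.dumps({"query": sql, "resultFormat": "object"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_druid_sql(sql, url=DRUID_SQL_URL, timeout=10):
    """Send the query and return the decoded JSON rows (needs a running Druid)."""
    request = build_druid_sql_request(sql, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical datasource and columns: hourly views per country.
HOURLY_VIEWS_SQL = """
SELECT TIME_FLOOR(__time, 'PT1H') AS "hour", country, SUM(views) AS views
FROM page_views
GROUP BY 1, 2
ORDER BY 1
"""

request = build_druid_sql_request(HOURLY_VIEWS_SQL)
print(request.get_method(), request.full_url)
```

Building the request is kept separate from sending it so the payload can be inspected without a cluster. `run_druid_sql(HOURLY_VIEWS_SQL)` would perform the actual call against a live Druid instance.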
REF:
- https://medium.com/swlh/using-airflow-to-schedule-spark-jobs-811becf3a960
- https://www.mongodb.com/developer/products/mongodb/mongodb-apache-airflow/
- https://medium.com/codex/executing-spark-jobs-with-apache-airflow-3596717bbbe3
- https://www.databricks.com/blog/2015/03/20/using-mongodb-with-spark.html
- https://airflow.apache.org/docs/apache-airflow/stable/start.html
- https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html
- https://www.youtube.com/watch?v=t4h4vsULwFE
- https://airflow.apache.org/docs/apache-airflow/2.3.4/start/docker.html#using-custom-images
- https://airflow.apache.org/docs/docker-stack/build.html
- https://airflow.apache.org/docs/apache-airflow/2.1.1/start/index.html
- https://engineering.tiki.vn/tiki-scales-data-platform-visualization-voi-apache-druid-nhu-the-nao/