Set up your laptop before attending the in-class exercise. You will need Docker. Optionally, download the container images ahead of time as described in the prerequisite steps.
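As a quick sanity check before class, the following minimal sketch reports whether the `docker` command is on your PATH and whether the Docker daemon responds. The helper name `docker_status` is hypothetical and not part of the course materials. It relies only on the standard `docker info` command, which exits with a non-zero status when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_status():
    """Return (installed, daemon_running) for the local Docker setup.

    Hypothetical helper for illustration; not part of the course repo.
    """
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("docker") is None:
        return (False, False)
    try:
        # `docker info` exits non-zero if the daemon is not reachable.
        result = subprocess.run(
            ["docker", "info"], capture_output=True, timeout=15
        )
    except (OSError, subprocess.TimeoutExpired):
        return (True, False)
    return (True, result.returncode == 0)


if __name__ == "__main__":
    installed, running = docker_status()
    print(f"docker installed: {installed}, daemon running: {running}")
```

If the first value is `False`, install Docker. If only the second is `False`, start Docker Desktop or the Docker service and run the check again.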
We will launch a local Elasticsearch cluster with its dashboard component, Kibana. Detailed instructions can be found in the ELK folder.
We will also briefly explore MongoDB and Couchbase databases via Docker.
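Once the containers are running, a small script can confirm that each service is listening. The sketch below assumes the containers publish the products' usual default ports on localhost (9200 for Elasticsearch, 5601 for Kibana, 27017 for MongoDB, 8091 for the Couchbase web console). The actual port mappings depend on the Docker commands in the course folders, so adjust the dictionary as needed. The names `DEFAULT_PORTS`, `port_open`, and `check_services` are hypothetical.

```python
import socket

# Assumed default ports; the course's Docker setup may map different ones.
DEFAULT_PORTS = {
    "elasticsearch": 9200,
    "kibana": 5601,
    "mongodb": 27017,
    "couchbase": 8091,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host="localhost", ports=DEFAULT_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'listening' if is_up else 'not reachable'}")
```

An open port only shows that something is listening there; it does not confirm that the service has finished starting up.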
Data pipelines are posted in the data-pipelines folder.
We will work with a small subset of the datasets listed here.
See the MySQL for a sample
We will also introduce the MongoDB Atlas Free Tier database service:
https://account.mongodb.com/account/login For detailed setup instructions, see README_ATLAS.md.
We will also introduce StreamSets Control Hub data pipelines: https://cloud.login.streamsets.com/login
For experimenting with Spark, we can use the Databricks cloud platform.
They offer free clusters with no time limit under their Community Edition. The signup process is a little tricky; see the instructions in ./data-pipelines/Spark-Databricks/README.md
https://www.databricks.com/product/faq/community-edition
During the signup process, select the Community Edition option instead of the 14-day free trial: https://community.cloud.databricks.com/