Clone the repo
git clone https://github.com/nagarajuerigi/pyspark_pipeline.git
- Upload the metamodel lookup file and the data file to the data folder.
- Upload the notebooks init_setup.ipynb and process_csv.ipynb to your workspace in Databricks Community Edition to get started.
- Create a Spark cluster in Databricks with the latest runtime.
- Run init_setup.ipynb, which creates the Landing and Lookup directories and copies the files from the Data directory into them.
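
The setup step above can be sketched locally as follows. This is a minimal illustration, not the notebook's actual code: it uses the Python standard library on a local filesystem, whereas a Databricks notebook would more likely rely on `dbutils.fs.mkdirs` and `dbutils.fs.cp`. The function name `init_setup`, the directory names `data`, `landing` and `lookup`, the file names, and the routing (lookup file to the Lookup directory, data file to the Landing directory) are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def init_setup(base_dir: Path, lookup_name: str, data_name: str) -> dict:
    """Create landing/ and lookup/ under base_dir and copy files from data/.

    Hypothetical stand-in for what init_setup.ipynb is described as doing.
    """
    data_dir = base_dir / "data"
    landing_dir = base_dir / "landing"
    lookup_dir = base_dir / "lookup"

    # Create the target directories; exist_ok makes the step safe to re-run.
    for directory in (landing_dir, lookup_dir):
        directory.mkdir(parents=True, exist_ok=True)

    # Assumed routing: lookup file -> lookup/, data file -> landing/.
    copied = {
        "lookup": shutil.copy2(data_dir / lookup_name, lookup_dir / lookup_name),
        "landing": shutil.copy2(data_dir / data_name, landing_dir / data_name),
    }
    return {key: Path(path) for key, path in copied.items()}


if __name__ == "__main__":
    # Demo in a throwaway directory with hypothetical file names.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "data").mkdir()
        (base / "data" / "metamodel_lookup.csv").write_text("code,label\n1,a\n")
        (base / "data" / "sample_data.csv").write_text("id,code\n10,1\n")
        result = init_setup(base, "metamodel_lookup.csv", "sample_data.csv")
        for target, path in result.items():
            print(target, "->", path.relative_to(base))
```

Because the directories are created with `exist_ok=True` and the copies overwrite existing files, re-running the step leaves the same end state.
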
Run process_csv.ipynb to test the flow, then add further functionality and commit your changes to your feature branch.