-
Create and activate a virtualenv (Python version >= 3.8.13). In the root directory, run
python3 -m venv virtualenv
then run
source virtualenv/bin/activate
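To confirm that the virtualenv is active and the interpreter meets the minimum version, a small check like the following may help. It is only a sketch: the helper names are illustrative and not part of this repository.

```python
import sys

# Minimum Python version required by this project (from the step above).
MIN_VERSION = (3, 8, 13)


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def version_ok(version=None) -> bool:
    """Return True when the given (or the running) version meets the minimum."""
    current = tuple(version) if version is not None else tuple(sys.version_info[:3])
    return current >= MIN_VERSION


if __name__ == "__main__":
    print("Python version OK:", version_ok())
    print("Inside virtualenv:", in_virtualenv())
```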
-
Install the requirements. Run
pip install -r requirements.txt
-
If you are running this code on macOS Ventura 13.1 or later, you will need to follow this guide to fix the bug: elastic/elasticsearch#91159
-
You will then need to configure your elasticsearch.yml and the Elasticsearch keystore (creating the files if they do not already exist).
-
Replace the placeholder values in configs.py with your own credentials.
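The actual contents of configs.py are not shown here, so the following is only a sketch of one way to keep credentials out of the source file by reading them from environment variables. The function name, the dictionary keys, and the variable names (ES_HOST, ES_PORT, ES_USERNAME, ES_PASSWORD) are all hypothetical; adapt them to whatever configs.py really defines.

```python
import os


def load_es_config(env=os.environ):
    """Build Elasticsearch connection settings from a mapping of variables.

    All key and variable names here are hypothetical, not taken from
    this repository's configs.py.
    """
    return {
        "host": env.get("ES_HOST", "localhost"),
        # 9200 is the default Elasticsearch HTTP port.
        "port": int(env.get("ES_PORT", "9200")),
        # "elastic" is the built-in superuser; use your own user if you created one.
        "username": env.get("ES_USERNAME", "elastic"),
        # No default password: it must be supplied by the environment.
        "password": env.get("ES_PASSWORD", ""),
    }


if __name__ == "__main__":
    config = load_es_config()
    if not config["password"]:
        print("ES_PASSWORD is not set")
```

Passing a plain dictionary instead of os.environ makes the loader easy to exercise without touching the real environment.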
-
From the Elasticsearch installation directory, start the service (on macOS) with
brew services start elasticsearch-full
Check that the service has started successfully with
brew services list
-
Once the Elasticsearch cluster has started, log in with the username and password you have configured.
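One way to verify the login from Python is to call the cluster health endpoint with HTTP basic authentication. The sketch below uses only the standard library; the helper names are illustrative, and the base URL (for example http://localhost:9200) is an assumption about a default local setup. If the cluster serves HTTPS with a self-signed certificate, the request additionally needs an SSL context that trusts that certificate.

```python
import base64
import json
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_health_request(base_url: str, username: str, password: str):
    """Prepare an authenticated request for the _cluster/health endpoint."""
    request = urllib.request.Request(base_url.rstrip("/") + "/_cluster/health")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


def cluster_status(base_url: str, username: str, password: str, timeout: float = 5.0) -> str:
    """Return the cluster status string ("green", "yellow" or "red")."""
    request = build_health_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["status"]


if __name__ == "__main__":
    # Assumed local defaults; replace with your own URL and credentials.
    print(cluster_status("http://localhost:9200", "elastic", "your-password"))
```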
-
PySpark will only connect to Elasticsearch if Elasticsearch has started successfully beforehand. Replace the Spark session configuration in main.py with your own settings.
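The session configuration in main.py is not reproduced here. As a rough sketch, assuming the elasticsearch-hadoop (elasticsearch-spark) connector is used, the connection settings could be assembled as below. The function names are hypothetical, and connector_package stands for the Maven coordinates of the connector build that matches your Spark and Elasticsearch versions.

```python
def es_spark_options(host, port, username, password, use_ssl=False):
    """Return elasticsearch-hadoop connection settings as a dict of strings."""
    return {
        "es.nodes": host,
        "es.port": str(port),
        "es.net.http.auth.user": username,
        "es.net.http.auth.pass": password,
        "es.net.ssl": "true" if use_ssl else "false",
        # Connect only through the declared node, without node discovery;
        # a reasonable choice for a single local node (assumption).
        "es.nodes.wan.only": "true",
    }


def build_spark_session(app_name, es_options, connector_package=None):
    """Create a SparkSession carrying the Elasticsearch settings."""
    # Imported lazily so es_spark_options can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    if connector_package:
        builder = builder.config("spark.jars.packages", connector_package)
    for key, value in es_options.items():
        # The connector also reads its es.* settings when they carry a spark. prefix.
        builder = builder.config("spark." + key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    options = es_spark_options("localhost", 9200, "elastic", "your-password")
    spark = build_spark_session("es-demo", options)
    print(spark.version)
```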
-
Run main.py in debug mode to see the creation of dummy data, the data transformation, and the loading of data into and querying of data from Elasticsearch.
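For orientation, the flow that main.py walks through could look roughly like the sketch below. It is not the code in main.py: the column names, the index name dummy_index, and both function names are made up for illustration, and it assumes the elasticsearch-spark connector is available to the Spark session and that es_options holds the connector's es.* connection settings.

```python
import random


def make_dummy_rows(count, seed=0):
    """Generate reproducible dummy rows of (id, name, quantity)."""
    rng = random.Random(seed)
    return [(i, f"item-{i}", rng.randint(1, 100)) for i in range(count)]


def run_pipeline(spark, es_options, index="dummy_index"):
    """Create dummy data, transform it, load it into Elasticsearch, query it back."""
    # Imported lazily so make_dummy_rows can be used without PySpark installed.
    from pyspark.sql import functions as F

    # 1. Dummy data.
    df = spark.createDataFrame(make_dummy_rows(10), ["id", "name", "quantity"])

    # 2. A simple transformation: add a derived column.
    transformed = df.withColumn("quantity_doubled", F.col("quantity") * 2)

    # 3. Load into Elasticsearch, using the id column as the document id.
    (
        transformed.write.format("org.elasticsearch.spark.sql")
        .options(**es_options)
        .option("es.resource", index)
        .option("es.mapping.id", "id")
        .mode("append")
        .save()
    )

    # 4. Query the index back and filter on the Spark side.
    loaded = (
        spark.read.format("org.elasticsearch.spark.sql")
        .options(**es_options)
        .load(index)
    )
    return loaded.filter(F.col("quantity") > 50)
```

Seeding the random generator keeps the dummy rows identical between runs, which makes debugging sessions easier to compare.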
-
get_data_from_sql.py simply shows the code needed to query the SQL database that holds the data extracted from MRP D365. PySpark would connect to it in the usual way to query the data.
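As an illustration of how PySpark typically reads from a SQL database, the sketch below uses Spark's JDBC data source. It assumes a Microsoft SQL Server database and the Microsoft JDBC driver on the Spark classpath; the README does not state which database engine is used, so treat the URL format, the driver class, and the function names as assumptions rather than a description of get_data_from_sql.py.

```python
def jdbc_options(host, database, table, username, password, port=1433):
    """Return options for Spark's JDBC reader (SQL Server is assumed)."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": username,
        "password": password,
        # Driver class of the Microsoft JDBC driver for SQL Server.
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def read_table(spark, options):
    """Load the configured table into a DataFrame through JDBC."""
    return spark.read.format("jdbc").options(**options).load()


if __name__ == "__main__":
    # Hypothetical host, database, and table names for illustration only.
    opts = jdbc_options("db.example.com", "mrp", "dbo.orders", "user", "password")
    print(opts["url"])
```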