Follow these steps to run this example:

Create and activate a virtualenv (Python version >= 3.8.13): run python3 -m venv virtualenv in the root directory, then run source virtualenv/bin/activate
Install the requirements: run pip install -r requirements.txt
If you are running this code on macOS Ventura 13.1 or later, you will need to follow this guide to fix the bug: elastic/elasticsearch#91159
You will then need to configure your elasticsearch.yml and keystore (creating the files if they do not already exist)
Replace the values in configs.py with your own credentials
From the Elasticsearch installation directory, run brew services start elasticsearch-full. On macOS, check that the service has started successfully with brew services list
Once the Elasticsearch cluster has started, log in with the username and password you configured
PySpark will only connect to Elasticsearch if Elasticsearch has started successfully beforehand. Replace the Spark session configurations in main.py with your own.
Run main.py in debug mode to see the creation of dummy data, the data transformation, and the loading and querying of data to and from Elasticsearch
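The startup check and Spark session configuration described in the steps above could look roughly like the sketch below. This is not the code in main.py. The host, port, credentials, app name, and the helper names es_is_up, build_es_spark_conf, and create_spark_session are hypothetical placeholders. It assumes Elasticsearch answers over plain HTTP on localhost:9200 and that the elasticsearch-hadoop (elasticsearch-spark) connector is available to Spark. The es.* keys are settings read by that connector.

```python
import base64
import http.client
import urllib.request

# Hypothetical placeholders; in this project they would come from configs.py.
ES_HOST = "localhost"
ES_PORT = 9200
ES_USER = "elastic"
ES_PASSWORD = "changeme"


def es_is_up(host, port, user, password, scheme="http", timeout=3.0):
    """Return True if the Elasticsearch root endpoint answers with HTTP 200."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{scheme}://{host}:{port}/",
        headers={"Authorization": f"Basic {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, bad credentials (HTTP 401), etc.
        return False


def build_es_spark_conf(host, port, user, password):
    """Build elasticsearch-hadoop connector options for a Spark session."""
    return {
        "es.nodes": host,
        "es.port": str(port),
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
    }


def create_spark_session(conf):
    """Create a SparkSession with the given options applied."""
    # Imported lazily so the helpers above can be used without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("etudecas-example")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    if es_is_up(ES_HOST, ES_PORT, ES_USER, ES_PASSWORD):
        conf = build_es_spark_conf(ES_HOST, ES_PORT, ES_USER, ES_PASSWORD)
        spark = create_spark_session(conf)
    else:
        print("Elasticsearch is not reachable; start it before running Spark.")
```

Checking the cluster first turns a connector failure inside Spark into an early, readable message. If your cluster is served over HTTPS, pass scheme="https" to es_is_up and add the connector's SSL settings to the options.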
get_data_from_sql.py simply shows the code needed to query the SQL database that holds the data extracted from MRP D365. PySpark would connect to it in the usual way to query the data.
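As a rough illustration of how PySpark could query such a database over JDBC, here is a minimal sketch. It is not the contents of get_data_from_sql.py. It assumes a Microsoft SQL Server database with the Microsoft JDBC driver on the Spark classpath, and the helper names, host, database, table, and credentials are hypothetical placeholders.

```python
def build_jdbc_options(host, port, database, table, user, password):
    """Build Spark JDBC reader options for a SQL Server table (assumed backend)."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def read_table(spark, options):
    """Load the table described by `options` into a Spark DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


# Example usage (requires a running SparkSession named `spark`):
# options = build_jdbc_options(
#     "sql.example.com", 1433, "mrp", "dbo.orders", "reader", "secret"
# )
# orders_df = read_table(spark, options)
# orders_df.show(5)
```

For a different database engine, only the url and driver values would need to change.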

ericvincent18/etudecas