This project builds a data analytics dashboard on Elasticsearch, with exploratory data analysis (EDA) performed through PySpark, Hadoop, Hive and AWS Glue ETL.
To keep the project from drifting, these are the yardsticks it must meet:
- Build a data analytics project with either a web or a mobile interface.
- Big data: satisfy at least one of the five Vs (volume, velocity, veracity, value and variety).
- The tech stack should include at least one of the following tools:
  - MapReduce, Spark, Storm, Hive, Pig, Flink, etc.
Current status: a web interface built with FastAPI (yet to be done). For more information on the steps, please see steps.pdf.
- Cloud service providers: AWS and GCP
  - AWS: EMR and ETL operations
  - GCP: hosted Elasticsearch (Elastic Cloud) and the final data visualisations
- Pre-processing: AWS Athena
- Processing and ETL: PySpark, Hive, AWS Glue
- File system: S3, GCP bucket, HDFS
- Data storage: Hive for big data storage and an Elasticsearch index for the final dashboard
- Visualisation: Kibana dashboard on Elastic Cloud
- Back-end API: Python
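One way to move the processed data from Hive into the Elasticsearch index is the Elasticsearch `_bulk` API, which takes newline-delimited JSON: one action line followed by one document line per record. The block below is a minimal sketch of building that body, assuming the Hive rows have already been collected as Python dicts. The index name `dashboard-data`, the `id` key and the sample rows are hypothetical. Sending the payload (for example a `POST /_bulk` with `Content-Type: application/x-ndjson`) is not shown.

```python
import json
from typing import Any, Iterable, Mapping, Optional


def build_bulk_payload(
    rows: Iterable[Mapping[str, Any]],
    index: str,
    id_field: Optional[str] = None,
) -> str:
    """Serialise rows into the NDJSON body accepted by the Elasticsearch _bulk API."""
    lines = []
    for row in rows:
        # Action/metadata line: index the following document into `index`.
        meta = {"_index": index}
        if id_field is not None and id_field in row:
            # A stable _id makes re-runs overwrite documents instead of duplicating them.
            meta["_id"] = str(row[id_field])
        lines.append(json.dumps({"index": meta}))
        # Source line: the document itself.
        lines.append(json.dumps(dict(row)))
    # The _bulk body must be terminated by a newline; an empty batch yields "".
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Hypothetical rows standing in for an export from a Hive table;
    # "dashboard-data" and the "id" key are placeholder names.
    sample = [
        {"id": 1, "city": "Dublin", "trips": 42},
        {"id": 2, "city": "Cork", "trips": 7},
    ]
    print(build_bulk_payload(sample, "dashboard-data", id_field="id"), end="")
```

Passing `id_field` is optional: without it Elasticsearch generates document IDs, which is simpler but means re-running the ETL job appends duplicates rather than updating existing documents.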
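On the serving side, the Python back-end API (FastAPI, per the current status) could query the Elasticsearch index to feed the dashboard. The block below is a minimal sketch, assuming the dashboard shows top-N counts per category, optionally limited to a date range. It only builds the query DSL body as a plain dict, so it runs without a cluster. The field names (`city.keyword`, `@timestamp`) and the aggregation name `top_values` are hypothetical, and the FastAPI route and client call are not shown.

```python
from typing import Any, Dict, Optional


def build_dashboard_query(
    group_field: str,
    top_n: int = 10,
    start: Optional[str] = None,
    end: Optional[str] = None,
    time_field: str = "@timestamp",
) -> Dict[str, Any]:
    """Build a search body that returns top-N bucket counts for one field."""
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    body: Dict[str, Any] = {
        # size 0: the chart only needs aggregation buckets, not the matching hits.
        "size": 0,
        # "top_values" is an arbitrary, hypothetical aggregation name.
        "aggs": {"top_values": {"terms": {"field": group_field, "size": top_n}}},
    }
    if start is not None or end is not None:
        bounds: Dict[str, str] = {}
        if start is not None:
            bounds["gte"] = start
        if end is not None:
            bounds["lte"] = end
        # Filter context restricts the documents without computing relevance scores.
        body["query"] = {"bool": {"filter": [{"range": {time_field: bounds}}]}}
    return body
```

Called as `build_dashboard_query("city.keyword", top_n=5, start="2023-01-01")`, the result could be sent as the request body of a `_search` call from a FastAPI route. Terms aggregations need a `keyword` field rather than an analysed `text` field, which is why the example uses the `.keyword` sub-field that Elasticsearch's default dynamic mapping adds to string fields.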