This repository shows how to run Hive on an Amazon EMR cluster. It is for educational purposes only. This is a living document, so it will grow as I learn more.

You will need:
- An AWS account
- The AWS CLI
- Amazon EMR
- Amazon S3
- Amazon EC2
- DataGrip

Steps:

- Upload your files to S3.
  - You can use the AWS CLI.
  - You can also sign in to your AWS account and upload them manually through the S3 console.
  - The files I used are located here.
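
If you take the CLI route, the upload can look like this sketch. `upload_to_s3` is a hypothetical helper, and the bucket and prefix names are placeholders for your own.

```shell
# Hypothetical helper: copy every file in a local folder to an S3 "folder" (prefix).
# The bucket must already exist; `aws s3 mb s3://<bucket>` creates one.
upload_to_s3() {
  local src_dir="$1"   # local folder with your data files, e.g. ./data
  local bucket="$2"    # placeholder: your bucket name
  local prefix="$3"    # placeholder: folder inside the bucket, e.g. input
  aws s3 cp "$src_dir" "s3://${bucket}/${prefix}/" --recursive
}

# Usage (placeholder names):
#   upload_to_s3 ./data my-hive-bucket input
#   aws s3 ls "s3://my-hive-bucket/input/"   # confirm the upload
```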
- Create a key pair in the EC2 console.
  - Your browser will download the `.pem` file automatically, most likely to your Downloads folder.
  - Move the `.pem` file into `$HOME/.ssh`.
  - Restrict its permissions so that only you can read it:
    - `chmod 400 $HOME/.ssh/{keyfile}.pem`
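
The move and the permission change can be wrapped in one small function. `install_pem` is a hypothetical name, and it assumes the key landed in `~/Downloads` unless you pass another folder. The `chmod 400` matters because `ssh` refuses to use a private key that other users can read.

```shell
# Hypothetical helper: move a downloaded key pair file into ~/.ssh and lock it down.
# Assumes the browser saved it to ~/Downloads unless a second argument says otherwise.
install_pem() {
  local key_name="$1"                                  # key pair name, without .pem
  local src="${2:-$HOME/Downloads}/${key_name}.pem"    # where the browser saved it
  mkdir -p "$HOME/.ssh"
  mv "$src" "$HOME/.ssh/${key_name}.pem"
  chmod 400 "$HOME/.ssh/${key_name}.pem"               # read-only, owner only
}

# Usage: install_pem my-emr-key
```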
- Go to the Amazon EMR page in the AWS console.
- Create a cluster:
  - Choose the most recent EMR release (the first choice).
  - Choose the Core Hadoop application bundle (the first choice); it includes Hive.
  - Choose m5.xlarge as the instance type (the first choice).
    - Pricing is here in case you need it.
  - Keep 3 instances (the default).
  - Select the key pair you just created.
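
The same cluster can also be created from the AWS CLI. This is a sketch, not a copy of what the console sends: `create_hive_cluster` is a hypothetical helper, it installs only Hadoop and Hive rather than the whole Core Hadoop bundle, and it relies on the default EMR roles. Pass a real release label; `aws emr list-release-labels` lists the available ones.

```shell
# Hypothetical helper mirroring the console choices: m5.xlarge, 3 instances, your key pair.
# If your account has no default EMR roles yet, run `aws emr create-default-roles` once.
create_hive_cluster() {
  local key_name="$1"        # key pair name, without .pem
  local release_label="$2"   # placeholder: newest label from `aws emr list-release-labels`
  aws emr create-cluster \
    --name "hive-cluster" \
    --release-label "$release_label" \
    --applications Name=Hadoop Name=Hive \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --ec2-attributes KeyName="$key_name" \
    --use-default-roles
}

# Usage: create_hive_cluster my-emr-key <release-label>
```

The command prints JSON that includes the new `ClusterId`; `aws emr describe-cluster --cluster-id <id>` shows when the cluster reaches the `WAITING` state.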
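- Once the cluster is in the Waiting state, connect to the primary (master) node with the SSH command shown on the cluster's page in the EMR console. If the connection times out, the primary node's security group may need an inbound rule allowing SSH (port 22) from your IP address.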
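
The command AWS shows boils down to `ssh` with your key file and the `hadoop` user. In this sketch `connect_primary` is a hypothetical helper and the DNS name is a placeholder for the primary node's public DNS from the cluster summary. The AWS CLI also offers `aws emr ssh --cluster-id <id> --key-pair-file <path>`, which looks up the address for you.

```shell
# Hypothetical helper: SSH into the EMR primary node as the `hadoop` user.
connect_primary() {
  local key_name="$1"      # key pair name, without .pem
  local primary_dns="$2"   # placeholder: primary node public DNS from the EMR console
  ssh -i "$HOME/.ssh/${key_name}.pem" "hadoop@${primary_dns}"
}

# Usage: connect_primary my-emr-key <primary-node-public-dns>
```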
- Copy the files from S3 to the EMR cluster.
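
One way to do this, run on the primary node after you connect, is to copy the S3 folder into HDFS with `hadoop fs`, which understands `s3://` paths on EMR. `copy_s3_to_hdfs` and both paths are placeholders. Hive on EMR can also read directly from S3 if a table's `LOCATION` points at the `s3://` folder, so this copy is optional.

```shell
# Hypothetical helper, run on the primary node: copy an S3 folder's files into HDFS.
copy_s3_to_hdfs() {
  local s3_path="$1"    # placeholder, e.g. s3://my-hive-bucket/input/
  local hdfs_dir="$2"   # placeholder, e.g. /user/hadoop/input
  hadoop fs -mkdir -p "$hdfs_dir"
  # The wildcard is quoted so that hadoop, not the local shell, expands it.
  hadoop fs -cp "${s3_path%/}/*" "$hdfs_dir"
}

# Usage:
#   copy_s3_to_hdfs s3://my-hive-bucket/input/ /user/hadoop/input
#   hadoop fs -ls /user/hadoop/input   # confirm the copy
```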
- Start Hive on the primary node and run your queries.
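
As a starting point, this sketch defines an external table over the copied files and counts its rows with `hive -e`. The table name, columns, comma delimiter and location are all assumptions; change them to match your data. If your CSV files have a header row, add `TBLPROPERTIES ('skip.header.line.count'='1')` after the `LOCATION` clause.

```shell
# Hypothetical example: an external table over comma-separated files, then a row count.
# Table name, columns, delimiter and location are placeholders for your own data.
run_hive_example() {
  local data_dir="$1"   # HDFS or s3:// folder holding the files, e.g. /user/hadoop/input
  hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS sample_data (
      id   INT,
      name STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '${data_dir}';

    SELECT COUNT(*) FROM sample_data;
  "
}

# Usage: run_hive_example /user/hadoop/input
```

Because the table is `EXTERNAL`, dropping it later removes only the table definition and leaves the files in place.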