aws

This repository holds solutions related to Amazon Web Services (AWS).

Steps to execute the project

Cluster configuration: EMR release emr-5.0.0 (Core Hadoop cluster; select the first option). Region: N. Virginia (us-east-1).
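
The same cluster could also be provisioned from the AWS CLI instead of the console. The following is only a minimal sketch, not the method used in this repository: the cluster name, the key pair name (my-key-pair), the instance type and the instance count are hypothetical placeholders, and Name=Hadoop is an approximation of the console's Core Hadoop option. By default the script only prints the command; set DRY_RUN=0 to execute it.

```shell
#!/usr/bin/env bash
# Sketch: provision an EMR cluster matching the configuration above.
# Assumed (not from this README): cluster name, key pair, instance type/count.
set -eu

DRY_RUN="${DRY_RUN:-1}"               # 1 = print the command only, 0 = execute it
KEY_NAME="${KEY_NAME:-my-key-pair}"   # hypothetical EC2 key pair name

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_cluster() {
  run aws emr create-cluster \
    --name "logprocessor-cluster" \
    --release-label emr-5.0.0 \
    --applications Name=Hadoop \
    --region us-east-1 \
    --instance-type m3.xlarge \
    --instance-count 3 \
    --ec2-attributes "KeyName=${KEY_NAME}" \
    --use-default-roles
}

create_cluster
```

The key pair named in KEY_NAME is the one needed later to SSH to the master instance as the hadoop user.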

Steps:

1) Build the Java code and generate the executable jar.
2) Upload the jar and the input file to S3.
3) Provision a cluster on Amazon EMR.
4) SSH to the EMR master instance: ssh hadoop@"MASTER-URL"
5) Copy the jar from S3 to the master instance: aws s3 cp s3://testuseraj/jar/logprocessor-1.0.jar ./
6) Copy the input file from S3 to the master instance: aws s3 cp s3://testuseraj/input/bank.txt ./
7) Create a directory in the Hadoop file system: hadoop fs -mkdir /gaps
8) Copy the input file into HDFS: hadoop fs -put ./bank.txt /gaps
9) Run the code: hadoop jar ./logprocessor-1.0.jar com.cs.mapreduce.logprocessor.LogAnalyzer /gaps/bank.txt /gaps/output
10) Merge the output into a single local file: hdfs dfs -getmerge /gaps/output/ ./out.csv
11) Upload the merged output to S3: aws s3 cp ./out.csv s3://testuseraj/output/
12) Create a manifest file that identifies the text files you want to import (see the example file in the visualization-aws folder).
13) Upload the manifest file to Amazon S3: https://s3.amazonaws.com/testuseraj/manifest.json
14) On the Amazon QuickSight start page, choose Manage data.
15) Create a new dataset by choosing the Amazon S3 icon.
16) For Data source name, type a name for the data source.
17) Upload the manifest file.
18) Choose Connect.
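
The commands that run on the master instance (from copying the jar through uploading out.csv) can be collected into one script. This is a minimal sketch that reuses the bucket, paths and class name from the steps above; the run helper, the DRY_RUN switch and the use of -mkdir -p are additions for illustration. By default it only prints each command; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: the master-instance commands from the steps above, in order.
set -eu

DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands only, 0 = execute them
BUCKET="s3://testuseraj"   # bucket used in this README
HDFS_DIR="/gaps"           # HDFS working directory used in this README

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

process_logs() {
  # Fetch the jar and the input file from S3.
  run aws s3 cp "$BUCKET/jar/logprocessor-1.0.jar" ./
  run aws s3 cp "$BUCKET/input/bank.txt" ./
  # Stage the input in HDFS (-p is an addition: no error if /gaps exists).
  run hadoop fs -mkdir -p "$HDFS_DIR"
  run hadoop fs -put ./bank.txt "$HDFS_DIR"
  # Run the MapReduce job, then merge its part files into one local CSV.
  run hadoop jar ./logprocessor-1.0.jar \
    com.cs.mapreduce.logprocessor.LogAnalyzer \
    "$HDFS_DIR/bank.txt" "$HDFS_DIR/output"
  run hdfs dfs -getmerge "$HDFS_DIR/output/" ./out.csv
  # Publish the merged result back to S3.
  run aws s3 cp ./out.csv "$BUCKET/output/"
}

process_logs
```

A MapReduce job refuses to start if its output directory already exists, so before rerunning the job remove the old results with hadoop fs -rm -r /gaps/output.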

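For the QuickSight manifest, the sketch below writes a manifest that points at the merged out.csv and uploads it to the location given in the steps. It is not the file from the visualization-aws folder, which remains the reference; the delimiter and the containsHeader value are assumptions about what the job writes, and the run helper and DRY_RUN switch are illustrative. The manifest is always written locally, while the upload is only printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: write a QuickSight manifest for out.csv and upload it to S3.
# Assumed (not from this README): comma delimiter, no header row in out.csv.
set -eu

DRY_RUN="${DRY_RUN:-1}"                   # 1 = print the upload only, 0 = execute it
MANIFEST="${MANIFEST:-./manifest.json}"   # local path of the generated manifest

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Write the manifest; the URI is the out.csv uploaded by the earlier step.
write_manifest() {
  cat > "$MANIFEST" <<'EOF'
{
  "fileLocations": [
    { "URIs": ["https://s3.amazonaws.com/testuseraj/output/out.csv"] }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "delimiter": ",",
    "containsHeader": "false"
  }
}
EOF
}

write_manifest
run aws s3 cp "$MANIFEST" s3://testuseraj/manifest.json
```

If out.csv does start with a header row, change containsHeader to "true" so QuickSight does not treat the header as data.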
=============End of readme file==============

paragbhingre/aws