aws-emr-word-frequency
This example helps you get started with AWS Elastic MapReduce (EMR). It shows how to compute word frequencies and produce a list of words sorted in ascending order, from the least to the most frequently used word. The application runs in two steps, so it needs two mapper functions and two reducer functions.
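
The two-step structure described above can be sketched in plain Python. This is a minimal in-memory illustration, not the contents of word_frequency_sorted.py: the function names, the word-matching regular expression, and the `run_step` helper (a stand-in for the shuffle phase that the MapReduce framework normally performs) are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed tokenizer: runs of word characters and apostrophes.
WORD_RE = re.compile(r"[\w']+")


def map_words(line):
    """Step 1 mapper: emit (word, 1) for every word in a line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reduce_counts(word, counts):
    """Step 1 reducer: sum the counts collected for one word."""
    yield word, sum(counts)


def map_to_single_key(word, total):
    """Step 2 mapper: route every (total, word) pair to one shared key."""
    yield None, (total, word)


def reduce_sorted(_, pairs):
    """Step 2 reducer: emit the pairs in ascending order of count."""
    for total, word in sorted(pairs):
        yield total, word


def run_step(records, mapper, reducer):
    """Hypothetical stand-in for one MapReduce step: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(*record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def word_frequency_sorted(lines):
    """Chain the two steps: count words, then sort by count."""
    counted = run_step([(line,) for line in lines], map_words, reduce_counts)
    return run_step(counted, map_to_single_key, reduce_sorted)


if __name__ == "__main__":
    for total, word in word_frequency_sorted(["the cat and the hat", "the cat"]):
        print(total, word)
```

The second step sends every pair to a single key so that one reducer sees all the counts and can sort them. That is simple, but it assumes the full list of distinct words fits in one reducer's memory.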
Running the Application
- Locally, in the editor console, execute this command:
!python word_frequency_sorted.py word_frequency_book.txt > wfs.txt
- Remotely on AWS, in the Canopy command terminal, execute this command:
python word_frequency_sorted.py -r emr --conf-path=C:\Users\[user name]\.mrjob.config word_frequency_book.txt > wfs.txt
Note that you can store the .mrjob.config file in any directory you choose, as long as --conf-path points to it.
The following is an example of a minimal configuration file.
Minimal .mrjob.config Example
runners:
  emr:
    ec2_key_pair: [keypairfile] # Name of your EC2 key pair
    ec2_key_pair_file: [C:\\dir\\keypairfile.pem] # Path of your key pair file
    aws_region: us-west-2
    ec2_instance_type: m1.small
    num_ec2_instances: 2
    ssh_tunnel_to_job_tracker: true
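
Once either command above has written wfs.txt, the results can be read back in Python. The sketch below assumes the job uses mrjob's default output protocol, where each line holds a JSON-encoded key and a JSON-encoded value separated by a tab. If word_frequency_sorted.py sets a different output protocol, the parsing needs to change accordingly. The helper names are hypothetical.

```python
import json


def parse_output_line(line):
    """Split one 'key<TAB>value' line and JSON-decode both halves."""
    key, value = line.rstrip("\n").split("\t", 1)
    return json.loads(key), json.loads(value)


def read_results(path):
    """Read every non-empty line of the output file into (key, value) pairs."""
    with open(path, encoding="utf-8") as handle:
        return [parse_output_line(line) for line in handle if line.strip()]


if __name__ == "__main__":
    # Assumes wfs.txt exists in the current directory.
    for key, value in read_results("wfs.txt"):
        print(key, value)
```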