I wanted to accomplish one thing: take files that contain JSON objects, convert them into Thrift objects, and store them in a Parquet file using a Hadoop job.
Steps I followed:
- Create a Maven project
- Get the JSON files ready
- Create a Thrift schema
- Generate Thrift classes from the schema
- Create a converter that receives a JSON string and returns a Thrift object
- Create a Mapper that receives a JSON string and emits a Thrift object
- Create a job that reads the JSON files and saves the Parquet file
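The schema, generation, and converter steps could look roughly like the sketch below. It assumes a hypothetical Thrift schema (a `Person` struct with a required `name` and an optional `age`, shown in the comment at the top) compiled with `thrift --gen java person.thrift`, and it uses Jackson's `ObjectMapper` to parse the JSON. Neither the schema nor the choice of Jackson comes from this post, so adapt both to your own data. It needs `jackson-databind` and `libthrift` on the classpath.

```java
// Hypothetical schema, saved as person.thrift (an assumption, not the post's schema):
//   namespace java com.example.thrift
//   struct Person {
//     1: required string name
//     2: optional i32 age
//   }
// `thrift --gen java person.thrift` generates the class com.example.thrift.Person.
import com.example.thrift.Person;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

/** Converts one JSON object, given as a string, into the (hypothetical) Thrift Person. */
public class PersonConverter {
  // ObjectMapper is thread-safe once configured, so one shared instance is enough.
  private static final ObjectMapper MAPPER = new ObjectMapper();

  public Person convert(String json) {
    JsonNode root;
    try {
      root = MAPPER.readTree(json);
    } catch (IOException e) {
      throw new IllegalArgumentException("not valid JSON: " + json, e);
    }
    if (root == null || !root.isObject()) {
      throw new IllegalArgumentException("expected a JSON object: " + json);
    }

    Person person = new Person();

    // "name" is required in the assumed schema, so fail fast when it is missing.
    JsonNode name = root.get("name");
    if (name == null || !name.isTextual()) {
      throw new IllegalArgumentException("missing or non-string \"name\": " + json);
    }
    person.setName(name.asText());

    // "age" is optional: set it only when present, so isSetAge() stays false otherwise.
    JsonNode age = root.get("age");
    if (age != null && age.isInt()) {
      person.setAge(age.asInt());
    }
    return person;
  }
}
```

Throwing an unchecked `IllegalArgumentException` on bad input keeps `convert` easy to call from a Mapper; whether a malformed record should fail the task or be skipped is a decision the steps above leave open.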
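The Mapper and job steps can live in one driver class, sketched below as a map-only job. It assumes the hypothetical generated class `Person` (package `com.example.thrift`), a converter class with a `Person convert(String json)` method (called `PersonConverter` here, a hypothetical name), one JSON object per input line, and the parquet-thrift module on the classpath. In current parquet-mr releases `ParquetThriftOutputFormat` lives in `org.apache.parquet.hadoop.thrift`; older releases used the `parquet.hadoop.thrift` package.

```java
import com.example.thrift.Person; // hypothetical Thrift-generated class
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.hadoop.thrift.ParquetThriftOutputFormat;

public class JsonToParquetJob {

  /** One input line (assumed to be a JSON object) in, one Thrift object out. */
  public static class JsonToThriftMapper extends Mapper<LongWritable, Text, Void, Person> {
    // Hypothetical converter with a Person convert(String json) method.
    private final PersonConverter converter = new PersonConverter();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // The Parquet record writer ignores the key, so emit null.
      context.write(null, converter.convert(line.toString()));
    }
  }

  /** Builds the job without running it, which also makes the setup easy to inspect. */
  public static Job createJob(Configuration conf, Path input, Path output) throws IOException {
    Job job = Job.getInstance(conf, "json-to-parquet");
    job.setJarByClass(JsonToParquetJob.class);

    // TextInputFormat calls map() once per line, hence the one-object-per-line assumption.
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, input);

    job.setMapperClass(JsonToThriftMapper.class);
    job.setNumReduceTasks(0); // map-only: mapper output goes straight to the output format

    // Write Person records through the Thrift-aware Parquet output format.
    job.setOutputFormatClass(ParquetThriftOutputFormat.class);
    ParquetThriftOutputFormat.setThriftClass(job, Person.class);
    job.setOutputKeyClass(Void.class);
    job.setOutputValueClass(Person.class);
    FileOutputFormat.setOutputPath(job, output);
    return job;
  }

  public static void main(String[] args) throws Exception {
    Job job = createJob(new Configuration(), new Path(args[0]), new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged as a jar (the name is up to you), it would be launched with something like `hadoop jar <your-jar> JsonToParquetJob <input dir> <output dir>`. The output directory must not exist yet, because `FileOutputFormat` refuses to overwrite an existing one.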