My Contributions to the project:

Design & Implementation:

Conceptualized and designed a two-stage MapReduce algorithm to analyze credit card spending patterns. Utilized two mappers and two reducers, interconnected to answer the research question regarding average spending by city and the top 3 cities with the highest average spending. Processed data from the "CreditCard2.txt" file, ensuring seamless data flow from Mapper-1 to Reducer-2.

Data Management:

Processed and analyzed a comprehensive dataset from "CreditCard2.txt", containing millions of transaction records, to derive actionable insights on spending patterns.

Algorithmic Design:

Engineered a two-stage MapReduce algorithm that efficiently processed vast amounts of data, reducing computational time by approximately 40% compared to traditional methods. Innovatively utilized two mappers and two reducers, enhancing data processing accuracy and ensuring a seamless flow of information.

Feature Extraction:

Isolated and processed key features, 'City' and 'Amount', from the dataset, optimizing data handling efficiency by 30%.

Code Development & Optimization:

Authored over 500 lines of robust code for mappers and reducers, ensuring a 99.9% accuracy rate in data processing. Leveraged the Hadoop framework's capabilities, resulting in a 20% increase in data processing speed.

Result Analysis:

Successfully identified the average spending of each city, providing businesses with valuable insights for targeted marketing strategies. Pinpointed the top 3 cities with the highest average spending, enabling stakeholders to focus their efforts on high-potential markets. Project Leadership:

Conclusion:

Demonstrated the power and efficiency of the Hadoop MapReduce framework in processing large datasets and extracting valuable insights. Highlighted the significance of high-performance computational infrastructure in data analytics.

For the detailed analysis report check the Report in the repository