ADM_HW2: The Best Books of All Time

This homework is a multifaceted assignment that blends data analysis with algorithmic problem-solving on two large datasets containing information about books and their authors. A variety of tools were used, including Python, Pandas, command-line scripting, Apache Spark, Dask, and AWS EC2 instances.

Members of Group 17:

Project Structure

Here is an overview of the main files in this project repository:

  • algorithmic_question.ipynb: Jupyter notebook containing the solution to the algorithmic question.
  • aws_question.ipynb: Jupyter notebook detailing the AWS question analysis.
  • aws_script.py: Python script for the AWS question.
  • commandline_LLM.sh: Executable shell script for the command-line question, optimized with ChatGPT.
  • commandline_original.sh: Original executable shell script for the command-line question.
  • commandline_question.ipynb: Jupyter notebook that documents the command-line question.
  • .gitignore: File specifying which files and directories Git should ignore.
  • README.md: Markdown file with information about the project.
  • main.ipynb: Jupyter notebook containing in-depth analysis for multiple research questions.