The primary objective of this assignment is to demonstrate a data treatment and analysis pipeline using the Old Faithful geyser dataset. The goals are to clean and preprocess the dataset and explore the relationship between eruption duration and waiting time using statistical techniques.
Our approach uses Python libraries to process the dataset and applies the Pearson correlation to measure the strength of the linear relationship between the two variables.
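The clean-then-correlate approach described above could be sketched as follows. This is a minimal illustration, not the code in `hw4.py`: the column names `eruptions` and `waiting`, the helper `clean_and_correlate`, and the toy values are all assumptions made for the example.

```python
import pandas as pd
from scipy.stats import pearsonr


def clean_and_correlate(df, x_col, y_col):
    """Keep the two columns, coerce them to numeric, drop incomplete rows,
    and return the cleaned frame with the Pearson r and its p-value."""
    cleaned = df[[x_col, y_col]].apply(pd.to_numeric, errors="coerce").dropna()
    r, p_value = pearsonr(cleaned[x_col], cleaned[y_col])
    return cleaned, r, p_value


# Toy values for illustration only; these are NOT the real measurements.
# Column names are hypothetical and may differ from the actual dataset.
toy = pd.DataFrame({
    "eruptions": [1.8, 2.0, 3.5, 4.2, None, 4.6],
    "waiting": [54, 51, 74, 79, 62, 85],
})

cleaned, r, p_value = clean_and_correlate(toy, "eruptions", "waiting")
print(f"rows kept: {len(cleaned)}, r = {r:.3f}, p = {p_value:.3g}")
```

Dropping incomplete rows before calling `pearsonr` matters because the function expects two arrays of equal length without missing values.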
- Python 3.11 was used, with the following libraries: `pandas`, `matplotlib`, `seaborn`, `scipy.stats.pearsonr`, and `os`.
- `hw4.py` is the main script and contains all the code.
- The resulting figures are stored in the `output/` directory, which the code creates if it does not already exist.
- Command to run the code: `python hw4.py`.
- [Optional] I prefer to store my terminal results in a txt file for future reference. One can do so by running `python hw4.py | tee output/output.txt`. The txt file will be stored in the `output/` directory, as specified in the command. Note that `tee` opens the file when the pipeline starts, so `output/` must already exist (for example, from an earlier run of `python hw4.py`).
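The "create the directory if it is missing" behaviour mentioned above could look roughly like this. It is a sketch using only the standard library; the helper name `ensure_output_dir` and the figure file name `scatter.png` are hypothetical and may not match what `hw4.py` actually does.

```python
import os

OUTPUT_DIR = "output"  # directory name taken from this README


def ensure_output_dir(path=OUTPUT_DIR):
    """Create the directory if it does not exist and return its path.

    exist_ok=True makes repeated calls safe, so the script can be re-run
    without raising an error when the directory is already there.
    """
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    out_dir = ensure_output_dir()
    # Hypothetical figure name, shown only to illustrate building a path
    # that could later be passed to a plotting call such as savefig.
    figure_path = os.path.join(out_dir, "scatter.png")
    print(f"figures would be written to: {figure_path}")
```

Using `os.makedirs(..., exist_ok=True)` avoids a separate existence check before creating the directory.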