2018 PISA Analysis

This project presents the process of data preparation, research question formulation, data analysis, and data modeling with the goal of extracting insights from the 2018 PISA Dataset. The Programme for International Student Assessment (PISA) is a worldwide set of tests conducted by the Organisation for Economic Co-operation and Development (OECD) to gauge the knowledge and competence of 15-year-old students in the key subject areas of reading, mathematics, and science.

This is a major course output in a statistical modeling and simulation class under Mr. Arren C. Antioquia of the Department of Software Technology, De La Salle University.

Task

The task is to create a Jupyter notebook that presents the process leading up to the generation of insights from a raw dataset:

  • Dataset Representation
  • Data Cleaning
  • Exploratory Data Analysis
  • Research Questions
  • Statistical Inference
  • Insights and Conclusions

The complete project specifications can be found in the document Project Specifications.pdf.

Datasets

The following real-world data sources (one primary dataset and two auxiliary datasets) were used:

Dataset | Source
--- | ---
2018 OECD PISA School Questionnaire Dataset (Primary Dataset) | Kaggle
2018 OECD PISA Average Score of Mathematics, Science, and Reading Test Scores Dataset (Auxiliary Dataset) | FactsMaps
ISO 3166-1 alpha-3 Code List (Auxiliary Dataset) | ISO
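
One plausible way to combine a primary dataset with two auxiliary ones like these is to join them on a country key. The snippet below is a minimal sketch of that idea with pandas, not the notebook's actual code. The frames are tiny placeholders, and every column name (`CNT`, `school_id`, `alpha3`, `country`, `avg_score`) and every value is a hypothetical stand-in rather than content from the real files.

```python
import pandas as pd

# Placeholder for the school questionnaire: one row per school,
# keyed by a three-letter country code (column name is an assumption).
schools = pd.DataFrame({
    "CNT": ["AAA", "AAA", "AAB", "AAC"],
    "school_id": [1, 2, 3, 4],
})

# Placeholder for the ISO 3166-1 alpha-3 code list.
iso_codes = pd.DataFrame({
    "alpha3": ["AAA", "AAB"],
    "country": ["Country A", "Country B"],
})

# Placeholder for the average-score table, keyed by country name.
scores = pd.DataFrame({
    "country": ["Country A", "Country B"],
    "avg_score": [500.0, 450.0],
})

# Build a code -> (name, score) lookup, then attach it to every school row.
lookup = iso_codes.merge(scores, on="country", how="left")
merged = schools.merge(
    lookup, left_on="CNT", right_on="alpha3", how="left", indicator=True
)

# Codes with no match in the auxiliary data are worth inspecting during cleaning.
unmatched = merged.loc[merged["_merge"] == "left_only", "CNT"].unique().tolist()
print(unmatched)
```

A left join keeps every school row even when the auxiliary tables lack an entry for its country, and the `indicator` column makes such gaps easy to list.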

Built Using

This project is a Jupyter notebook, with the following Python libraries and modules used:

Library/Module | Description | License
--- | --- | ---
os | Provides miscellaneous operating system interfaces | Python Software Foundation License
pandas | Provides functions for data analysis and manipulation | BSD 3-Clause "New" or "Revised" License
numpy | Provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays | BSD 3-Clause "New" or "Revised" License
scipy | Provides efficient numerical routines, such as those for numerical integration, interpolation, optimization, linear algebra, and statistics | BSD 3-Clause "New" or "Revised" License
matplotlib | Provides functions for creating static, animated, and interactive visualizations | Matplotlib License (BSD-Compatible)

The descriptions are taken from their respective websites.
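
As a rough illustration of how numpy and scipy might work together in a statistical inference step, the sketch below runs Welch's two-sample t-test on synthetic data. It is not taken from the notebook: the groups, their sizes, the distribution parameters, and the 0.05 significance level are all assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

# Synthetic scores for two hypothetical groups of schools (not real PISA data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=500, scale=50, size=200)
group_b = rng.normal(loc=480, scale=50, size=200)

# Welch's t-test: compares the two means without assuming equal variances.
result = stats.ttest_ind(group_a, group_b, equal_var=False)

# Assumed significance level for the example.
alpha = 0.05
reject_null = result.pvalue < alpha

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}, reject H0: {reject_null}")
```

Setting `equal_var=False` selects Welch's variant, which is a common default when nothing is known about the group variances.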

Authors