This project presents the process of data preparation, research question formulation, data analysis, and data modeling, with the goal of extracting insights from the 2018 PISA dataset. PISA, which stands for the Programme for International Student Assessment, is a worldwide set of tests conducted by the Organisation for Economic Co-operation and Development (OECD) to gauge the knowledge and competence of 15-year-old students in the key subject areas of reading, mathematics, and science.
This is a major course output in a statistical modeling and simulation class under Mr. Arren C. Antioquia of the Department of Software Technology, De La Salle University.
The task is to create a Jupyter notebook that presents the process leading up to the generation of insights from a raw dataset:
- Dataset Representation
- Data Cleaning
- Exploratory Data Analysis
- Research Questions
- Statistical Inference
- Insights and Conclusions
The complete project specifications can be found in the document Project Specifications.pdf.
The following real-world data sources (one primary dataset and two auxiliary datasets) were used:
Dataset | Source |
---|---|
2018 OECD PISA School Questionnaire Dataset (Primary Dataset) | Kaggle |
2018 OECD PISA Average Score of Mathematics, Science, and Reading Test Scores Dataset (Auxiliary Dataset) | FactsMaps |
ISO 3166-1 alpha-3 Code List (Auxiliary Dataset) | ISO |
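As a rough illustration of how the auxiliary datasets could be linked to the primary one, the sketch below joins school records to country-level average scores through an alpha-3 code lookup. It rests on assumptions that the source does not confirm: that the school records carry a three-letter country code, that the average scores are keyed by country name, and that the ISO list is what bridges the two. The field names (`school_id`, `country_code`, `avg_score`), the function `attach_country_scores`, and all numeric values are hypothetical placeholders, and the notebook itself may perform this step differently (for example, with pandas).

```python
# Hypothetical miniature stand-ins for the three datasets.
# The scores below are placeholders, NOT real PISA results.
iso_codes = {"PHL": "Philippines", "SGP": "Singapore"}   # alpha-3 code -> country name
avg_scores = {"Philippines": 100.0, "Singapore": 200.0}  # country name -> placeholder score
schools = [
    {"school_id": 1, "country_code": "PHL"},
    {"school_id": 2, "country_code": "SGP"},
    {"school_id": 3, "country_code": "XXX"},  # a code with no match in the ISO list
]


def attach_country_scores(schools, iso_codes, avg_scores):
    """Left-join each school record with its country's name and average score.

    Records whose code is missing from the lookup are kept, with None
    in the added fields, so that unmatched rows can be inspected later.
    """
    merged = []
    for row in schools:
        name = iso_codes.get(row["country_code"])
        merged.append({**row, "country": name, "avg_score": avg_scores.get(name)})
    return merged


merged = attach_country_scores(schools, iso_codes, avg_scores)
for row in merged:
    print(row)
```

Keeping unmatched records rather than dropping them makes it easier to spot codes or country names that are spelled differently across the sources during data cleaning.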
This project is implemented as a Jupyter notebook and uses the following Python libraries and modules:
Library/Module | Description | License |
---|---|---|
os | Provides miscellaneous operating system interfaces | Python Software Foundation License |
pandas | Provides functions for data analysis and manipulation | BSD 3-Clause "New" or "Revised" License |
numpy | Provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays | BSD 3-Clause "New" or "Revised" License |
scipy | Provides efficient numerical routines, such as those for numerical integration, interpolation, optimization, linear algebra, and statistics | BSD 3-Clause "New" or "Revised" License |
matplotlib | Provides functions for creating static, animated, and interactive visualizations | Matplotlib License (BSD-Compatible) |
The descriptions are taken from their respective websites.
- Mark Edward M. Gonzales (mark_gonzales@dlsu.edu.ph, gonzales.markedward@gmail.com)
- Hylene Jules G. Lee (hylene_jules_lee@dlsu.edu.ph, lee.hylene@gmail.com)