An analysis of the achievement gap between top- and bottom-SES students over time.
In this repository I keep the files for the second chapter of my PhD thesis.
Below is an explanation of all the files in the repo.
- The main files are `./paper/Paper.Rnw` and `transform_data.R`. The first contains all the text and analysis of the paper; the second downloads, manipulates and saves the data.
- Opening the `*.Rproj` file should also open these two scripts. Note that `Paper.Rnw` does not run `transform_data.R`; it simply loads the data that script produces. See `transform_data.R` for details on how the data are downloaded.
- The data used in the analysis are in the folder `./paper/data` (excluded from the GitHub repo; it holds all the data created by `transform_data.R`). The file is called `pisa_listcol.Rdata` because it stores a `tibble` with a list column containing the six PISA waves used in the analysis. It is nearly 9 GB.
- All PISA waves I use are taken from Przemyslaw Biecek's `PISA*lite` packages, except PISA 2012 and PISA 2015, which I download (as temporary files) in the `transform_data.R` script. All waves are then stored together in `pisa_listcol.Rdata`.
- All other `.Rdata` files in `./paper/data` hold estimations that take too long to compute, so I saved provisional versions of them in this folder. For details, see the `modeling` chunk in `Paper.Rnw`.
- Inside `./paper/` there is a `packrat` folder containing all the packages used in this analysis together with their versions.
- Don't worry about overloading GitHub: both the `cache` and `data` folders are excluded in `.gitignore`.
- The `playing_around` folder has a script I used for experimenting. See the comments inside the script for details.
- Finally, the paper should compile without issues by running `Paper.Rnw` through `Sweave`, which saves a PDF in the `./paper` folder. As long as the `cache` chunks are left untouched, compiling currently takes around 4 minutes.