This repository contains the source of the paper "Balancing Specialization and Adaptation in a Transforming Scientific Landscape".
The repository relies on DataLad, a tool built on git and git-annex for reproducible research and data sharing. Installation instructions are available in the DataLad documentation.
If you do not want to use DataLad and just want to take a look at the code and data from your browser, please go to https://gin.g-node.org/lucasgautheron/adaptation_specialization_material.
To install the repository and all the dependencies (code and data):
datalad install -r git@github.com:lucasgautheron/specialization-adaptation.git
Please note that the repository contains nested submodules. The above command will install all of them recursively.
For certain analyses, you will need to download the source data:
datalad get specialization_adaptation_material/inspire-harvest/database
Some analyses also require intermediate analysis outputs. These can be downloaded individually with datalad get, e.g.:
datalad get specialization_adaptation_material/output/etm_20_pretrained/etm_instance.pickle
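Before running a script, it can help to check which of its inputs still need to be fetched. The sketch below assumes the default git-annex layout, where a file whose content has not been downloaded is a symbolic link to a missing object (this does not hold on adjusted or unlocked branches, e.g. on Windows). The helper names `has_content`, `missing_inputs` and `datalad_get_commands` are illustrative and are not part of this repository or of DataLad. The commands are only built as strings, not executed.

```python
import os
import shlex
from typing import Iterable, List


def has_content(path: str) -> bool:
    """Return True if the file content is available locally.

    Assumption: annexed files are symlinks into .git/annex/objects, so a
    dangling symlink means the content has not been fetched yet.
    """
    # os.path.exists follows symlinks and returns False for dangling ones.
    return os.path.exists(path)


def missing_inputs(paths: Iterable[str]) -> List[str]:
    """Return the paths whose content is not available locally."""
    return [p for p in paths if not has_content(p)]


def datalad_get_commands(paths: Iterable[str]) -> List[str]:
    """Build one `datalad get` command line per missing path (not executed)."""
    return [f"datalad get {shlex.quote(p)}" for p in missing_inputs(paths)]
```

For example, `datalad_get_commands(["specialization_adaptation_material/output/etm_20_pretrained/etm_instance.pickle"])` returns the command shown above if that file has not been fetched yet, and an empty list otherwise.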
The manuscript can be re-compiled by issuing the following commands:
make clean
make
The code that performs the main analyses can be found in specialization_adaptation_material/code.
The table below describes its organization:
Script | Function | Dependencies |
---|---|---|
code/etm.py | Learns embeddings and topics. | |
code/etm_compile.py | Counts the keywords belonging to each topic within each article using Naive Bayes and the previously trained topic model. | output/<>/dataset.pickle.py, output/<>/, output/<>/etm_instance.pickle.py |
code/etm_transfers.py | Evaluates scientists' research portfolios. | output/<>/topics_counts.py |
code/authors_sociality.py | Calculates social capital. | output/<>/aggregates.csv |
code/etm_ei.py | Runs the ecological inference model with Stan. | output/<>/aggregates.csv, output/<>/pooled_resources.parquet |
code/etm_map.py | Evaluates the MAP performance of the ecological inference model using K-fold cross-validation. | output/<>/aggregates.csv, output/<>/pooled_resources.parquet |
code/topic_distance.py | Evaluates "distances" between topics using different metrics. | All of the above |
code/optimal_transport.py | Recovers the migration cost matrix using probabilistic inverse optimal transport. | All of the above and MCMC samples from the ecological inference model. |
code/comparative_analysis.py | Performs a comparative analysis of the effect of capital on different metrics of change in research interests. | All of the above and MCMC samples from the ecological inference model. |
Each of the above scripts should be run from specialization_adaptation_material.
Input parameters can be listed by running python code/<script.py> --help.
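To review the options of every script at once, the help commands can be generated in a loop. This is a minimal sketch: the script names are copied from the table above, while `WORKDIR`, `SCRIPTS` and `help_command` are illustrative names that do not exist in the repository. The commands are only built here; running them assumes the repository has been installed as described earlier.

```python
import sys
from typing import List

# Directory the scripts are meant to be run from (as stated in this README).
WORKDIR = "specialization_adaptation_material"

# Script names as listed in the table above.
SCRIPTS = [
    "etm.py",
    "etm_compile.py",
    "etm_transfers.py",
    "authors_sociality.py",
    "etm_ei.py",
    "etm_map.py",
    "topic_distance.py",
    "optimal_transport.py",
    "comparative_analysis.py",
]


def help_command(script: str, python: str = sys.executable) -> List[str]:
    """Build the argument list equivalent to `python code/<script> --help`."""
    return [python, f"code/{script}", "--help"]


# One argument list per script, in the order of the table.
commands = [help_command(s) for s in SCRIPTS]
```

Each entry of `commands` can then be executed from the right directory with `subprocess.run(cmd, cwd=WORKDIR, check=True)`.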
The code available under specialization_adaptation_material/plots produces plots using the output of analyses performed by the above scripts.