CRI

Project repository containing the replication code.

Primary language: HTML. License: Creative Commons Zero v1.0 Universal (CC0-1.0).

The Hidden Universe of Data-Analysis

Important Documents

Current Working Paper Version

Current Supplementary Materials

Executive Report - describing the full study

Research Design and Analysis:

Nate Breznau
Eike Mark Rinke
Alexander Wuttke
Hung H.V. Nguyen

Participant Researchers:

Muna Adem, Jule Adriaans, Amalia Alvarez-Benjumea, Henrik Andersen, Daniel Auer, Flavio Azevedo, Oke Bahnsen, Dave Balzer, Paul C. Bauer, Gerrit Bauer, Markus Baumann, Sharon Baute, Verena Benoit, Julian Bernauer, Carl Berning, Anna Berthold, Felix S. Bethke, Thomas Biegert, Katharina Blinzler, Johannes N. Blumenberg, Licia Bobzien, Andrea Bohman, Thijs Bol, Amie Bostic, Zuzanna Brzozowska, Katharina Burgdorf, Kaspar Burger, Kathrin Busch, Juan Carlos Castillo, Nathan Chan, Pablo Christmann, Roxanne Connelly, Christian Czymara, Elena Damian, Alejandro Ecker, Achim Edelmann, Maureen A. Eger, Simon Ellerbrock, Anna Forke, Andrea Forster, Chris Gaasendam, Konstantin Gavras, Vernon Gayle, Theresa Gessler, Timo Gnambs, Amélie Godefroidt, Alexander Greinert, Max Grömping, Martin Groß, Stefan Gruber, Tobias Gummer, Andreas Hadjar, Jan Paul Heisig, Sebastian Hellmeier, Stefanie Heyne, Magdalena Hirsch, Mikael Hjerm, Oshrat Hochman, Jan H. Höffler, Andreas Hövermann, Sophia Hunger, Christian Hunkler, Nora Huth, Zsofia Ignacz, Laura Jacobs, Jannes Jacobsen, Bastian Jaeger, Sebastian Jungkunz, Nils Jungmann, Mathias Kauff, Manuel Kleinert, Julia Klinger, Jan-Philipp Kolb, Marta Kołczyńska, John Kuk, Katharina Kunißen, Dafina Kurti, Philipp Lersch, Lea-Maria Löbel, Philipp Lutscher, Matthias Mader, Joan Madia, Natalia Malancu, Luis Maldonado, Helge Marahrens, Nicole Martin, Paul Martinez, Jochen Mayerl, Oscar J. Mayorga, Patricia McManus, Kyle McWagner, Cecil Meeusen, Daniel Meierrieks, Jonathan Mellon, Friedolin Merhout, Samuel Merk, Daniel Meyer, Jonathan Mijs, Cristobal Moya, Marcel Neunhoeffer, Daniel Nüst, Olav Nygård, Fabian Ochsenfeld, Gunnar Otte, Anna Pechenkina, Christopher Prosser, Louis Raes, Kevin Ralston, Miguel Ramos, Frank Reichert, Leticia Rettore Micheli, Arne Roets, Jonathan Rogers, Guido Ropers, Robin Samuel, Gregor Sand, Constanza Sanhueza Petrarca, Ariela Schachter, Merlin Schaeffer, David Schieferdecker, Elmar Schlueter, Katja Schmidt, Regine Schmidt, Alexander Schmidt-Catran, Claudia Schmiedeberg, Jürgen Schneider, Martijn Schoonvelde, Julia Schulte-Cloos, Sandy Schumann, Reinhard Schunck, Jürgen Schupp, Julian Seuring, Henning Silber, Willem Sleegers, Nico Sonntag, Alexander Staudt, Nadia Steiber, Nils Steiner, Sebastian Sternberg, Dieter Stiers, Dragana Stojmenovska, Nora Storz, Erich Striessnig, Anne-Kathrin Stroppe, Janna Teltemann, Andrey Tibajev, Brian Tung, Giacomo Vagni, Jasper Van Assche, Meta van der Linden, Jolanda van der Noll, Arno Van Hootegem, Stefan Vogtenhuber, Bogdan Voicu, Fieke Wagemans, Nadja Wehl, Hannah Werner, Brenton Wiernik, Fabian Winter, Christof Wolf, Nan Zhang, Conrad Ziller, Björn Zakula, Stefan Zins and Tomasz Żółtak

Abstract

This repository contains the preparation and analysis of data obtained from the Crowdsourced Replication Initiative (Breznau, Rinke, and Wuttke et al. 2018), used as the basis for the paper "Observing Many Researchers Using the Same Data and Hypothesis Reveals a Hidden Universe of Uncertainty".

Recently, studies in which many researchers independently tested the same hypothesis using the same data reported tremendous variation in results across scientific disciplines. This variability must derive from differences in each research process. Observing those differences should therefore reduce the implied uncertainty. Through a controlled study involving 73 researchers/teams, we tested this assumption. Taking all research steps as predictors explains at most 2.6% of the total effect size variance and 10% of the deviance in subjective conclusions. Expertise, prior beliefs, and attitudes of researchers explain even less. Ultimately, each model was unique, and as a whole this study provides evidence of a vast universe of research design variability normally hidden from view in the presentation, consumption, and perhaps even creation of scientific results.

Workflow

The workflow is provided in a literate programming format: R Markdown notebooks (.Rmd), split across a number of files as described below. Next to each .Rmd file there is an .html file of the same name, containing an HTML rendering of the notebook with its figures and tables, so that non-R users can view the workflow results in any regular web browser. For example, the file 01_CRI_Descriptives.Rmd has a corresponding 01_CRI_Descriptives.html in the same folder for easy viewing without running any R code. Paths in the notebooks are handled with the here package and are all relative to the project's root directory (where this README.md file is located). You can also open an interactive environment to explore and execute the analysis yourself via Binder (Project Jupyter, 2018).
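For example, a minimal sketch of how such a relative path resolves with the here package (the file name is taken from the Source Data section below):

```r
# here() builds paths relative to the project root (the folder holding
# this README), independent of the notebook's working directory.
library(here)
cri <- read.csv(here("data", "cri.csv"))
```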


The runtime environment created for Binder uses an MRAN snapshot of 2020-03-29 (see the file .binder/runtime.txt) and installs all required R packages listed in the file .binder/install.R.
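As a hedged sketch of what that setup amounts to (the actual .binder/install.R may differ, and the package list below is purely illustrative):

```r
# Pin the package repository to the dated MRAN snapshot so installed
# package versions match the original runtime environment.
options(repos = c(CRAN = "https://mran.microsoft.com/snapshot/2020-03-29"))
# Install the packages the notebooks need (illustrative names only).
install.packages(c("here", "rmarkdown", "shiny"))
```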

The workflow includes a Shiny app that allows users to interact with the results via specification curves.
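To run the app locally, something like the following should work, assuming the app's scripts sit in their own folder (the folder name "shiny" below is hypothetical; the app's input files cri_shiny.csv and cri_shiny_team.csv are listed under Source Data):

```r
# Launch the specification-curve app in a local browser session.
shiny::runApp("shiny")
```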

1. Source Code Cleaning

We collected the code from all 73 teams and cleaned it for public sharing. This involved qualitative identification of model specifications, ensuring replicability, extracting Average Marginal Effects (AMEs), and redacting any identifying features. The resulting code is compiled by software type in the sub-folders of this project, ordered by team ID number (in the folder team_code, with sub-folders team_code_SPSS, team_code_Stata, team_code_Mplus, and team_code_R). The code in the team_code_R folder imports the results from all the other code to compile a final joined dataset of effect sizes and confidence interval measures.
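For readers unfamiliar with AMEs, here is a self-contained sketch of how an AME can be obtained in R with the margins package, using a toy model on a built-in dataset (this is not the teams' actual code):

```r
# Fit an illustrative logistic regression, then average the marginal
# effect of each predictor over all observations in the data.
library(margins)
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)
summary(margins(fit))  # AMEs with standard errors and confidence intervals
```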

Users should be aware that the main data files include team zero: the results and model specifications from the study of Brady and Finnigan (2014), which provided the launching point for the CRI. Team zero is dropped from our main analyses but provides a point of comparison.

2. Data Pre-Preparation

Prior to our main analyses, we import data from the Participant Survey, including subjective voting on model quality and the voting during the post-result deliberation. The scripts for these steps (001-003) are contained in the folder data_prep. It is not necessary to run them, as their output is already saved in the data folder.
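For instance, the saved output of the deliberation-scoring step (file name from the Source Data section below) can be loaded directly, in a minimal sketch assuming the standard data folder layout:

```r
# Load the pre-computed peer review/deliberation scores rather than
# re-running scripts 001-003.
load(here::here("data", "popdf_out.Rdata"))
```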

3. Code

Our primary analyses and results are in the code folder. Many of the results in this folder depend on data preparation done in the data_prep folder.

List of Command Code Files and their Functions

All of the following are located in the main or sub-folders of the folder code.

| Filename | Location | Description | Output |
|----------|----------|-------------|--------|
| 001_CRI_Prep_Subj_Votes.Rmd | data_prep | Compiles the peer ranking of models | FigS4 |
| 002_CRI_Data_Prep.Rmd | data_prep | Primary data cleaning and merging; measurement of researcher characteristics | TblS1; TblS3; FigS3; FigS3_fit_stats |
| 003_CRI_Multiverse_Simulation.Rmd | data_prep | Sets up the multiverse data | |
| 01_CRI_Descriptives.Rmd | code | Descriptive statistics; codebook of 107 model design steps | FigS5; FigS10 |
| 02_CRI_Common_Specifications.Rmd | code | Identifies (dis)similarities across models | TblS4 |
| 03_CRI_Spec_Analysis.Rmd | code | Plots specification curves | Fig1; FigS6; FigS7; FigS8; FigS9 |
| 04_CRI_Main_Analyses.Rmd | code | Main regression models explaining outcome variance within and between teams | Fig3; TblS5; TblS6 (see bottom of TblS5); TblS7 |
| 05_CRI_Main_Analyses_Variance_Function.Rmd | code | Variance function regressions to explain variation in variance by team | Fig2; FigS11; FigS12; FigS13; TblS11 |
| 06_CRI_Multiverse.Rmd | code | Tests all possible combinations of submitted model specifications to explain variance | TblS8; TblS10 |
| 07_CRI_DVspecific_Analyses.Rmd | code | Re-runs the main models separately by dependent variable (6 ISSP survey questions) | TblS9 |
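
To reproduce a single output rather than the full workflow, any one notebook can be knitted on its own. A minimal sketch using the descriptives notebook from the table above (relying on the here package described in the Workflow section):

```r
# Knit one notebook to HTML; the path follows its Location in the table.
library(here)
rmarkdown::render(here("code", "01_CRI_Descriptives.Rmd"))
```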

4. Running All Code

The following script runs all notebook files in order, to check that there are no code issues.

```r
source("all.R")
```
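A rough illustration of what such a runner does (the actual all.R may differ; the paths below are assumed from the folder descriptions above):

```r
# Knit every notebook in dependency order: data preparation first,
# then the numbered analysis files.
library(rmarkdown)
notebooks <- c(
  "code/data_prep/001_CRI_Prep_Subj_Votes.Rmd",
  "code/data_prep/002_CRI_Data_Prep.Rmd",
  "code/data_prep/003_CRI_Multiverse_Simulation.Rmd",
  "code/01_CRI_Descriptives.Rmd",
  "code/02_CRI_Common_Specifications.Rmd",
  "code/03_CRI_Spec_Analysis.Rmd",
  "code/04_CRI_Main_Analyses.Rmd",
  "code/05_CRI_Main_Analyses_Variance_Function.Rmd",
  "code/06_CRI_Multiverse.Rmd",
  "code/07_CRI_DVspecific_Analyses.Rmd"
)
for (f in notebooks) render(f)
```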

Source Data

The data preparation code is in the sub-folder data_prep. After the data preparation files have been run, all data files ready for the data analysis are in the data folder. There are numerous data files because the different participants' code often requires special individual files to run properly. The data files needed to reproduce all of the data analysis are:

| Filename | Description | Source |
|----------|-------------|--------|
| **MAIN FILES** | Used in main analyses 01-07 | |
| cri.csv | Main data analysis file at the model and team levels. All specifications coded by the PIs, team test results, and researcher characteristics in numeric format | Worked up in code/data_prep |
| cri_str.csv | A string-format-only version of cri.csv | Worked up in code/data_prep |
| cri_team.csv | A version of cri_str.csv aggregated to team-level means (N = 89, because 16 teams conducted independent hypothesis tests by 'stock' and 'flow' immigration measures) | Worked up in code/data_prep |
| popdf_out.Rdata | The peer review/deliberation scoring of model specifications as ranked by all participants, excluding non-responses | Generated in the sub-folder CRI/data_prep |
| **SUB-FILES** | Used in preparation of data or app | |
| Research Design Votes.xlsx | Based on participants' pre-registered designs, plus a cursory review of all research designs. Not a fully accurate portrayal of the final research designs because (a) the broad range of specifications is not reported in the basic research designs, and (b) participants often deviated from their proposed designs, if only slightly | A copy of the actual template (a Google Sheet) used to create the peer review voting system in the Participant Survey |
| cri_shiny.csv | The model-level data needed to run the Shiny app | Generated in code/data_prep |
| cri_shiny_team.csv | The team-level data needed to run the Shiny app | Generated in code/data_prep |

Start local Binder

Install repo2docker and then run:

```sh
repo2docker --editable .
```
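This builds a Docker image from the repository's Binder configuration and starts it locally; the --editable flag mounts the repository into the container, so changes made inside the running session are written back to your local checkout.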