Evaluate your use of the data science tool kit. Quiz
Explanation of the quiz.
- Concern regarding a crisis in research reproducibility
- Big data and more complex analytic models
- Renewed interest in open science
- Expanded collaborations
Goal: improve research transparency, reproducibility, quality, efficiency and implementation
“Academic institutions can and must do better. We should be taking multiple approaches to make science more reliable.”
Jeffrey Flier. Dean of Medicine, Harvard University. Nature 549, 133 (2017)
“Put simply, this means that researchers should make their computational workflow and data available for others to view. They should include the code used to generate published figures and omit only data that cannot be released for privacy or legal reasons.”
Jeffrey M. Perkel. A toolkit for data transparency takes shape. Nature 560, 513-515 (2018)
“More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.”
Monya Baker. 1,500 scientists lift the lid on reproducibility. Nature 533, 452-4 (2016)
- Donoho D. 50 years of Data Science. Sept. 18, 2015.
- Stukel TA, Austin PC, Azimaee M, Bronskill SE, Guttmann A, Paterson JM, Schull MJ, Sutradhar R, Victor JC. Envisioning a Data Science Strategy for ICES. Toronto, ON: Institute for Clinical Evaluative Sciences; 2017. ISBN: 978-1-926850-77-1
- Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nature Reviews Cardiology. 2016;13(6):350-9.
- Wilson G, Aruliah DA, Brown CT, Chue Hong NP, Davis M, Guy RT, et al. Best practices for scientific computing. PLoS Biol. 2014;12(1):e1001745.
- Hicks SC, Irizarry RA. A Guide to Teaching Data Science. The American Statistician. 2017;72(4):382-91. doi:10.1080/00031305.2017.1356747
- Flier, J. (2017). Faculty promotion must assess reproducibility. Nature, 549(7671), 133. doi:10.1038/549133a
- Perkel, J. M. (2018). A toolkit for data transparency takes shape. Nature, 560, 513-515.
- Baker, M. 1,500 scientists lift the lid on reproducibility. [Nature 533, 452-4 (2016)](https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970).
- Woelfle, M.; Olliaro, P.; Todd, M. H. (2011). Open science is a research accelerator. Nature Chemistry. 3: 745–748. doi:10.1038/nchem.1149
- Stodden, V., McNutt, M., Bailey, D. H., Deelman, E., Gil, Y., Hanson, B., . . . Taufer, M. (2016). Enhancing reproducibility for computational methods. Science, 354(6317), 1240-1241. doi:10.1126/science.aah6168
- Kopf D. This year’s Nobel Prize in economics was awarded to a Python convert. qz.com Oct 2018.
- Somers J. The Scientific Paper Is Obsolete: Here's what's next. The Atlantic Apr 2018.
- Kitzes J, Turek D, Deniz F. The practice of reproducible research: case studies and lessons from the data-intensive sciences. Univ of California Press; 2017.
- Pioneering ‘live-code’ article allows scientists to play with each other’s results. Nature
- Perez-Riverol Y, Gatto L, Wang R, Sachsenberg T, Uszkoreit J, Leprevost Fda V, et al. Ten Simple Rules for Taking Advantage of Git and GitHub. PLoS Comput Biol. 2016;12(7):e1004947.
- GitKraken (the Git client our team uses)
- What nobody tells you about documentation. Divio Blog. Accessed Nov 2018
- Why Jupyter is data scientists’ computational notebook of choice
- Advantages to using R Markdown for data analysis over Jupyter Notebooks
- R for Data Science. G Grolemund and H Wickham
- Efficient R programming. C Gillespie, R Lovelace
- R for Data Science, Chapter 19: Functions. G Grolemund, H Wickham
- IBM developerWorks. What is PMML? Accessed 2018.