Statistics

Distributions

https://blog.cloudera.com/blog/2015/12/common-probability-distributions-the-data-scientists-crib-sheet/

Modeling

Probabilistic Modeling

https://github.com/jmschrei/pomegranate

Statistics

https://students.brown.edu/seeing-theory/ <-- has baseball data (‘homerun’ and ‘hitter’)

Bayes

https://github.com/AllenDowney/BayesSeminar

PyMC3

https://www.youtube.com/watch?v=VVbJ4jEoOfU&t=1151s&list=PL0eRwZHmE_S_vLkhXktls0PXSZIHJZ3Fb&index=2
https://www.youtube.com/watch?v=rZvro4-nFIk&index=3&list=PL0eRwZHmE_S_vLkhXktls0PXSZIHJZ3Fb

Linear Regression

Logistic Regression

SVM

Random Forest

Boosting and Bagging

Ensemble

Time Series

https://github.com/ultimatist/ODSC17.git
https://www.youtube.com/watch?v=JNfxr4BQrLk&t=3012s
http://earthpy.org/pandas-basics.html

Winters

https://grisha.org/blog/2016/01/29/triple-exponential-smoothing-forecasting/

Text / NLP

https://github.com/kwartler/text_mining
https://github.com/diegonogare/DataScience/tree/master/Text%20Mining
http://juliasilge.com/blog/
https://www.good.is/articles/can-yelp-help-independent-restaurants-drive-chains-out-of-business
https://www.springboard.com/blog/eat-rate-love-an-exploration-of-r-yelp-and-the-search-for-good-indian-food/
http://www.theatlantic.com/business/archive/2011/10/how-yelp-helps-steer-people-away-fast-food-chains/337181/
http://cs109.joeong.com/ <-- cool MIT project
https://www.canva.com/design/DACJbaSfIMY/jf93l6bhZr1WO1CgVXX0DA/edit
https://blog.insightdatascience.com/super-donor-detecting-hidden-matches-in-a-public-sperm-donor-registry-a687fe6e05a0#.rvxktifgh
http://www.dailydot.com/layer8/fake-news-sites-list-facebook/
https://journals.agh.edu.pl/csci/article/viewFile/1339/1311
https://priceonomics.com/our-fixation-on-terrorism/
http://dlab.berkeley.edu/blog/scraping-new-york-times-articles-python-tutorial
https://aqibsaeed.github.io/2016-07-26-text-classification/
http://people.cs.vt.edu/naren/papers/sdm2016.pdf
http://www.kdnuggets.com/2015/01/text-analysis-101-document-classification.html
http://blog.christianperone.com/2011/09/machine-learning-text-feature-extraction-tf-idf-part-i/
https://pdfs.semanticscholar.org/aa96/9114cf6e4d77c5bb3dd62a20bee3446f33ab.pdf
http://nlp.stanford.edu/courses/cs224n/2011/reports/nccohen-aatreya-jameszjj.pdf
http://nlp.stanford.edu/courses/cs224n/2012/reports/kat_busch_writeup.pdf
https://www.cs.sfu.ca/~anoop/papers/pdf/anoop_maryam-canvas-2013.pdf
http://hint.fm/papers/wordtree_final2.pdf
https://bl.ocks.org/mbostock/4339083
http://bbengfort.github.io/tutorials/2016/05/19/text-classification-nltk-sckit-learn.html <-- Teresa’s most used NLTK tutorial for capstone
https://rud.is/b/2013/03/12/visualizing-risky-words-part-4-d3-word-trees/
http://peekaboo-vision.blogspot.de/2012/11/a-wordcloud-in-python.html
http://blancosilva.github.io/post/2016/08/24/bokeh.html
http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/
http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/
http://fjavieralba.com/basic-sentiment-analysis-with-python.html
http://www.nytimes.com/interactive/2012/09/06/us/politics/convention-word-counts.html?_r=0
http://people.csail.mit.edu/azar/wp-content/uploads/2011/09/thesis.pdf
https://www.twinword.com/blog/interpreting-the-score-and-ratio-of-sentiment/
http://stackoverflow.com/questions/26569478/performing-grid-search-on-sklearn-naive-bayes-multinomialnb-on-multi-core-machin

Deep Learning

https://github.com/danromuald?tab=repositories

Visualizations

http://groupvisual.io/work/

Chord

http://www.delimited.io/blog/2013/12/8/chord-diagrams-in-d3

Matplotlib

https://github.com/WeatherGod/interactive_mpl_tutorial

Bokeh

D3

https://github.com/morganecf/imdb-odsc

Data Science Project Management

http://www.datasciencemanifesto.org/
https://drivendata.github.io/cookiecutter-data-science/#cookiecutter-data-science
https://www.slideshare.net/srikanthps/scrum-in-15-minutes-presentation
https://www.slideshare.net/joelhorwitz/agile-data-science-36258963
https://www.slideshare.net/srogers74/agile-software-development-overview-presentation/11-Introduction_to_Agile_Methodologies_contd
https://www.slideshare.net/katemats/manage-datascience-2013strata/17-After_For_the_top_search
gm-spacagna/datasciencemanifesto-copy#1

Blogs

http://vrl.cs.brown.edu/color - generates categorical color palettes
http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3 - generates color palettes (sequential, diverging, or qualitative)
http://algorithms-tour.stitchfix.com/#data-platform - storytelling with d3
https://students.brown.edu/seeing-theory/regression/index.html#first - visual explanations of statistics concepts

Open Source

http://www.kdnuggets.com/2016/11/top-20-python-machine-learning-open-source-updated.html

General Data Science

http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free
https://www.import.io/post/20-questions-to-detect-fake-data-scientists/
http://www.datatau.com/
https://www.yhat.com/
http://slendermeans.org/ml4h-ch6.html
https://humancomputation.com/blog/
https://perplex.city/parallel-thinking-b4076461ff60#.ut0jfngkv
http://machinelearningmastery.com/metrics-evaluate-machine-learning-algorithms-python/
http://canworksmart.com/using-mean-absolute-error-forecast-accuracy/
https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-1-for-beginners-bag-of-words
https://github.com/Mithers/Portfolio <-- found some other GA grad’s github
http://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/
http://www.visiondummy.com/2014/03/eigenvalues-eigenvectors/
http://www.visiondummy.com/2014/03/divide-variance-n-1/#Parameter_variance
http://scott.fortmann-roe.com/docs/BiasVariance.html
http://machinelearningmastery.com/time-series-forecasting-supervised-learning/?__s=vu4zbwvwhtewqsso99ny
https://www.analyticsvidhya.com/blog/2016/03/practical-guide-principal-component-analysis-python/
https://onlinecourses.science.psu.edu/stat505/node/54
http://www.ats.ucla.edu/stat/sas/output/principal_components.htm
https://www.datascience.com/blog/introduction-to-bayesian-inference-learn-data-science-tutorials
https://climateecology.wordpress.com/2014/01/27/pystan-a-basic-tutorial-of-bayesian-data-analysis-in-python/
http://nedbatchelder.com/text/unipain.html
http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/#nonlinear-header
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
https://blog.remibergsma.com/2012/07/10/setting-locales-correctly-on-mac-osx-terminal-application/
http://www.techpoweredmath.com/spark-dataframes-mllib-tutorial/
https://maryrosecook.com/blog/post/git-from-the-inside-out <-- interesting stuff on functional programming

Matrix Factorization w/ deep learning

http://www1.cmc.edu/pages/faculty/BHunter/

Industries

Finance

https://github.com/CaptainKanuk
https://songyao21.github.io/Research_Papers/Risk%20Transfer%20versus%20Cost%20Reduction.pdf

Kaggle

https://github.com/jdwittenauer/kaggle
https://www.slideshare.net/markpeng/general-tips-for-participating-kaggle-competitions

Tools

Debugging

http://kawahara.ca/how-to-debug-a-jupyter-ipython-notebook/

Jupyter Notebooks

https://github.com/drivendata/data-science-is-software

scikit-learn

https://github.com/amueller/advanced_training

Apache Drill

https://github.com/cgivre