Apps:
- text sum: http://esapi.intellexer.com/Summarizer
- http://www.deeplearningpatterns.com/doku.php/applications
- mt: http://104.131.78.120/
- rnn: http://www.cs.toronto.edu/~ilya/fourth.cgi?prefix=I+have+a+dream.+&numChars=150
13.7
- image: dl framework: https://pbs.twimg.com/media/ClAUr5EUkAA9--0.jpg:large
- image: knowledge: https://pbs.twimg.com/media/CifCYeSUUAArriB.jpg:large
- numpy, scipy and pandas way to go: https://plot.ly/~empet/13902/numpy-cluster-in-the-network-of-python-packages/
- rnn live stream, character image: https://www.youtube.com/watch?v=wSpPJtenw_c
- big data 5cent https://pbs.twimg.com/media/CnMAV2XXgAAkBJY.jpg
- google research: https://plus.google.com/+ResearchatGoogle/posts
- deep learning word cloud: https://pbs.twimg.com/media/CnLJLdVXYAEpN6w.jpg:large
- jupyter on rpi: http://makeyourownneuralnetwork.blogspot.de/2016/03/ipython-neural-networks-on-raspberry-pi.html
12.7
- http://mghassem.mit.edu/insights-word2vec/
- home depot: search relevance: https://github.com/ChenglongChen/Kaggle_HomeDepot/blob/master/Doc/Kaggle_HomeDepot_Turing_Test.pdf
- sentiment http://districtdatalabs.silvrback.com/modern-methods-for-sentiment-analysis
- reading tea leaves: https://www.umiacs.umd.edu/~jbg/docs/nips2009-rtl.pdf
- https://www.linkedin.com/pulse/putting-semantic-representational-models-test-tf-idf-k-means-parsa?forceNoSplash=true
11.7
- gensim phrase: http://www.markhneedham.com/blog/2015/02/12/pythongensim-creating-bigrams-over-how-i-met-your-mother-transcripts/
- xgboost out of the box: https://www.quora.com/What-machine-learning-approaches-have-won-most-Kaggle-competitions/answer/Ben-Hamner?srid=cgo&share=45a4f6de
- ES part 2: http://insightdataengineering.com/blog/elasticsearch-core/
- GAN: https://www.youtube.com/watch?v=deyOX6Mt_As
8.7
- lsa + classification: https://github.com/chrisjmccormick/LSA_Classification
- perturbation + adversarial lstm https://arxiv.org/pdf/1605.07725v1.pdf
- I don't drink tiger (beer) confused neural translation lisa: http://104.131.78.120/
- always love Mikolov related: fastText https://arxiv.org/abs/1607.01759
- Explaining the classifier: https://www.youtube.com/watch?v=hUnRCxnydCc
- LIME: https://github.com/marcotcr/lime
- https://artistdetective.wordpress.com/2016/06/15/how-to-teach-a-computer-common-sense/
- 6 cons, top uni + org: http://www.marekrei.com/blog/analysing-nlp-publication-patterns/
- https://github.com/ijmbarr/panama-paper-network/blob/master/panama_network.ipynb
7.7
- https://engineers.sg/conference/pyconsg2016
- https://speakerdeck.com/tmylk/americas-next-topic-model?slide=6
- http://aclweb.org/anthology/J93-1003
6.7
- machine learning done wrong: http://dataskeptic.com/epnotes/machine-learning-done-wrong.php
- https://archive.org/details/twitterstream
- https://github.com/lintool/twitter-tools
- CLT: http://www.jeannicholashould.com/the-theorem-every-data-scientist-should-know.html
- https://blog.init.ai/three-impactful-machine-learning-topics-at-icml-2016-465be5ae63a#.yxw5wiisw
- http://www.machinedlearnings.com/2016/07/icml-2016-thoughts.html?spref=tw&m=1
5.7
4.7
- bot: http://52bots.tumblr.com/post/108322694954/11-ebook-of-black-earth-what-an-ebooks-style
- http://www.degeneratestate.org/posts/2016/Apr/20/heavy-metal-and-natural-language-processing-part-1/
- http://aclweb.org/anthology/J93-1003
- https://www.thefinancialist.com/man-vs-machine-what-happens-when-machines-can-learn-2/
- https://twimlai.com/fatal-ai-autopilot-crash-eu-may-prohibit-machine-learning-twiml-20160701/
- https://www.coursera.org/learn/natural-language-processing
1.7
- word2vec pipeline: https://github.com/NIHOPA/pipeline_word2vec
- visual recognition: http://cs231n.github.io/
- chris olah cv: https://colah.github.io/cv.pdf
- Google ML tut: https://www.youtube.com/watch?v=cSKfRcEDGUs
- ES: anatomy: http://insightdataengineering.com/blog/elasticsearch-crud/
- https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html
- http://arxiv.org/pdf/1512.00567v3.pdf
30.6
- voice detection: wav --> features --> SVM + RF + XGB --> RF --> prediction: http://www.primaryobjects.com/2016/06/22/identifying-the-gender-of-a-voice-using-machine-learning/
- wide and deep: https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/learn/python/learn
29.6
- hyperword: https://bitbucket.org/omerlevy/hyperwords
- wmd 20newsgroup: http://vene.ro/blog/word-movers-distance-in-python.html
- init orthogonal mat for RNN: http://smerity.com/articles/2016/orthogonal_init.html
- markov chain: https://github.com/fatiherikli/markov-chain-demo
- https://medium.com/@bjacobso/is-it-brunch-time-ffe3adf485d8#.sx662tn0x
- singapore startup eco: https://docs.google.com/document/d/1AsSwH_kJ5qm7X8Pb5H2P_iaCpUA9tr-93GGNuN-orXk/mobilebasic?pli=1
- http://www.datameer.com/company/datameer-blog/big-data-ecosystem/
- https://www.facebook.com/notes/bui-hai-an/v%C3%A0i-l%E1%BB%9Di-khuy%C3%AAn-thu-nh%E1%BA%B7t-%C4%91%C6%B0%E1%BB%A3c-t%E1%BB%AB-ges-p2/10153489287901106
- https://www.facebook.com/notes/bui-hai-an/v%C3%A0i-l%E1%BB%9Di-khuy%C3%AAn-thu-nh%E1%BA%B7t-%C4%91%C6%B0%E1%BB%A3c-t%E1%BB%AB-ges-p1/10153478669101106?notif_t=like&notif_id=1467086253182050
28.6
- http://lstm.seas.harvard.edu/
- https://www.reddit.com/r/MachineLearning/comments/4q5fsu/advanced_word_embeddings_for_seq2seq_applications/
- https://www.dataquest.io/blog/data-science-newsletters/
- http://nbviewer.jupyter.org/github/taddylab/deepir/blob/master/w2v-inversion.ipynb
- http://www.pyimagesearch.com/2016/06/27/my-top-9-favorite-python-deep-learning-libraries/
27.6
- document classification: https://github.com/RaRe-Technologies/movie-plots-by-genre
- classical nlp: https://github.com/tmylk/pycon-2016-nlp-tutorial/blob/master/jupyter/classical-nlp/classical-nlp.ipynb
- document classification: https://speakerdeck.com/tmylk/document-classification-with-word2vec-at-pydata-nyc
- inverse word2vec with hs: http://nbviewer.jupyter.org/github/taddylab/deepir/blob/master/w2v-inversion.ipynb
- wmd: http://tech.opentable.com/2015/08/11/navigating-themes-in-restaurant-reviews-with-word-movers-distance/
- defense of w2v: http://www.cs.tau.ac.il/~wolf/papers/qagg.pdf
- plagiarism: http://douglasduhaime.com/blog/cross-lingual-plagiarism-detection-with-scikit-learn
- gender stereotype in word embedding: https://arxiv.org/pdf/1606.06121v1.pdf
- twitter intent: https://twitter.com/intent/user?user_id=328567812
- google n-gram: https://books.google.com/ngrams/graph?content=she+is+a+nurse%2C+he+is+a+nurse&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cshe%20is%20a%20nurse%3B%2Cc0%3B.t1%3B%2Che%20is%20a%20nurse%3B%2Cc0
- FE techniques in simple words: https://codesachin.wordpress.com/2016/06/25/non-mathematical-feature-engineering-techniques-for-data-science/
- Money laundering detection: http://conf.startup.ml/blog/aml
- bias embedding: http://nlpers.blogspot.hr/2016/06/language-bias-and-black-sheep.html
24.6
- keynote: http://tpq.io/p/pyconsg.html#/
- customer segmentation: https://github.com/maoting1223/pycon_sg_2016
- https://github.com/mirri66/geodata
23.6
- I'm a speaker at pyconsg 2016: https://pycon.sg/schedule/
- https://github.com/airbnb/caravel
- googlenet: 22 layers inception http://arxiv.org/abs/1409.4842
22.6
- http://varianceexplained.org/r/year_data_scientist/
- conflicted ds: https://www.youtube.com/watch?v=7h2S3eM1OYQ&feature=youtu.be
- ICML 2016: http://icml.cc/2016/?page_id=1839
- My russian friends: https://alexanderdyakonov.wordpress.com/2016/05/31/avito-telstra-bnp/
- https://www.youtube.com/watch?v=1HrkBzLBJQg
21.6
- book of Andrew Ng: http://www.mlyearning.org/
- why we need so many classifiers: http://jmlr.org/papers/volume15/delgado14a/delgado14a.pdf
- variety of models to choose: https://www.quora.com/What-are-the-advantages-of-different-classification-algorithms
20.6
- deephack, https://drive.google.com/file/d/0B0PX5JnpNX8yQ184Z3kwWmYyQUU/view?pref=2&pli=1
- mikolov rnn + scrnn: https://drive.google.com/file/d/0B0PX5JnpNX8yQ184Z3kwWmYyQUU/view?pref=2&pli=1
18.6
- deep hack: http://deepqa.tilda.ws/page78823.html
- Quoc Le, DL for lang understanding: https://www.youtube.com/watch?v=KmOdBS4BXZ0
- https://drive.google.com/file/d/0B0PX5JnpNX8yQ184Z3kwWmYyQUU/view?pref=2&pli=1
- https://drive.google.com/file/d/0BwJbEyAV32gETHR4YmdjcW5JUlU/view?pref=2&pli=1
17.6
16.6
toread:
- relationship modeling network https://github.com/miyyer/rmn
- QA compose NN: http://arxiv.org/pdf/1601.01705v4.pdf
- gensim 0.13 changelog: https://github.com/RaRe-Technologies/gensim/blob/develop/CHANGELOG.txt
- https://building-babylon.net/2015/06/03/document-embedding-with-paragraph-vectors/
15.6
- map 140M tweet: http://www.mapd.com/demos/tweetmap/
- http://blog.yhat.com/posts/rodeo-2.0-release.html
13.6
- personality: https://personality-insights-livedemo.mybluemix.net/
- imbalanced data https://github.com/ngaude/kaggle/blob/master/cdiscount/ImbalancedLearning.pdf
- just remember: https://ipgp.github.io/scientific_python_cheat_sheet/
- work on postgres https://github.com/dbcli/pgcli
- productionize with Kafka: http://blog.parsely.com/post/3886/pykafka-now/
11.6
9.6
user classifiers:
- http://www.slideshare.net/TedXiao/winning-kaggle-101-dmitry-larkos-experiences
- Humanizr: http://networkdynamics.org/resources/software/humanizr/
- tweet coder: http://networkdynamics.org/resources/software/tweetcoder/
- latent user, delip rao: http://www.cs.jhu.edu/~delip/smuc.pdf
Readings:
- intro prob in ipython: http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb
- thesis learning algos from data: http://www.cs.nyu.edu/media/publications/zaremba_wojciech.pdf
- Practical tools for exploring data and models: http://had.co.nz/thesis/practical-tools-hadley-wickham.pdf
8.6
- deep learning demo: http://www.somatic.io/models/V7Zx4Z9A
- ml debug: https://www.quora.com/Whats-the-best-way-to-debug-natural-language-processing-code-How-do-we-know-its-running-as-we-assume-I-ask-this-question-because-I-read-one-post-titled-as-what-is-the-best-way-to-test-machine-learning-code-I-am-working-on-one-natural-language-processing-task-and-has-confusion-on-how-to-debug-NLP
- confusion matrix: http://www.innoarchitech.com/machine-learning-an-in-depth-non-technical-guide-part-4/
- nlp ml error analysis tool: http://www.aclweb.org/anthology/C14-2001
- deepnet online: https://github.com/anujgupta82/DeepNets/blob/master/Online_Learning/Incorporating_feedback_in_DeepNets.ipynb
- sgd + elasticnet penalty better? https://www.quora.com/Are-there-any-real-applications-of-using-Elastic-Net
- http://cs231n.github.io/linear-classify/
- binary classification dog vs cat: http://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
- deeptext: am I scared? https://www.linkedin.com/pulse/what-listening-look-facebooks-ai-engine-radim-%C5%99eh%C5%AF%C5%99ek
- why Google open source Tensorflow: https://www.youtube.com/watch?v=Rnm83GqgqPE
7.6
- top 10 NLP con: http://idibon.com/the-top-10-nlp-conferences/
- metricle: https://metricle.com/api
- dark web svm, word2vec: https://homepages.staff.os3.nl/~delaat/rp/2014-2015/p99/report.pdf
- M$ deep learning: http://research.microsoft.com/pubs/246721/NAACL-HLT-2015_tutorial.pdf
- MS sgd trick: http://research.microsoft.com/pubs/192769/tricks-2012.pdf
- 10K class and 10M samples, OVR and SGD is best: https://hal.inria.fr/hal-00835810/PDF/TPAMI_minor_revision.pdf
6.6
- xin rong presentation: https://www.youtube.com/watch?v=D-ekE-Wlcds
- wevi: https://docs.google.com/presentation/d/1yQWN1CDWLzxGeIAvnGgDsIJr5xmy4dB0VmHFKkLiibo/
- stats for hacker pycon2016: https://speakerdeck.com/pycon2016/jake-vanderplas-statistics-for-hackers
- https://medium.com/udacity/this-week-in-machine-learning-3-june-2016-7f089ce984e7#.zbv7h9nyo
- word galaxy: http://www.anthonygarvan.com/wordgalaxy/
- pycon2016: https://github.com/singingwolfboy/build-a-flask-api
- http://burhan.io/flask-web-api-with-firebase/
- http://web.stanford.edu/class/cs224u/materials/cs224u-vsm-overview.pdf
1.6
- lab41: http://www.lab41.org/a-tour-of-sentiment-analysis-techniques-getting-a-baseline-for-sunny-side-up/
- doc2vec at tripadvisor: https://github.com/hellozeyu/An-advisor-for-TripAdvisor
- https://nycdatascience.com/an-advisor-for-tripadvisor/
10 lessons learned from Xavier, recap:
- implicit signal beats explicit ones (almost always): clickbait, rating psychology
- your model will learn what you teach it to learn: feature, function, f score
- sup + unsup = life
- everything is ensemble
- model sequences: output of the model is input of others
- FE: reusable, transformable, interpretable, reliable
- ML infra: experimentation phase: ease, flexibility, reusability. production phase: performance, scalability
- Debugging feature values
- you don't need to distribute ML algo
- DS + ML engineering = perfection
31.5
- pycon2016: https://www.youtube.com/channel/UCwTD5zJbsQGJN75MwbykYNw
- andreas, intro ML/sklearn for DS: https://github.com/amueller/introduction_to_ml_with_python
- Berkeley ds intro: https://data-8.appspot.com/sp16/course
30.5
- dirichlet process: http://stiglerdiet.com/blog/2015/Jul/28/dirichlet-distribution-and-dirichlet-process/
- pycon 2016: https://github.com/justmarkham/pycon-2016-tutorial/
- romance in word2vec: http://www.ghostweather.com/files/word2vecpride/
- topic quality coherence: http://palmetto.aksw.org/palmetto-webapp/
- https://spacy.io/docs
- https://spacy.io/docs/tutorials/twitter-filter
- http://sebastianraschka.com/Articles/2014_naive_bayes_1.html
29.5
- cry analysis: http://www.robinwe.is/explorations/cry.html
- spacy preprocessing: https://github.com/cemoody/lda2vec/blob/master/lda2vec/preprocess.py
- spacy Tweet: https://spacy.io/docs/tutorials/twitter-filter
- lda2vec: full http://multithreaded.stitchfix.com/blog/2016/05/27/lda2vec/#topic=38&lambda=1&term=
- probabilistic approach: http://chirayukong.github.io/infsci2725/resources/09_Probabilistic_Approaches.pdf
- lda curation: https://datawarrior.wordpress.com/2016/04/20/local-and-global-words-and-topics/
- why hdbscan: http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb
- auto ml: http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf
- http://www.kdnuggets.com/2016/05/five-machine-learning-projects-cant-overlook.html
- topic2vec: https://www.cs.cmu.edu/~diyiy/docs/naacl15.pdf
26.5
- http://alexperrier.github.io/jekyll/update/2015/09/04/topic-modeling-of-twitter-followers.html
- http://alexperrier.github.io/jekyll/update/2015/09/16/segmentation_twitter_timelines_lda_vs_lsa.html
- https://begriffs.com/posts/2015-03-10-better-tweets-datascience.html
- https://github.com/alexperrier/datatalks/tree/master/twitter
- https://issuu.com/andriusknispelis/docs/topic_models_-_video
- http://www.aclweb.org/anthology/W15-1526
- https://www.opendatascience.com/blog/dissecting-the-presidential-debates-with-an-nlp-scalpel/
- https://speakerdeck.com/bmabey/visualizing-topic-models
25.5
In summary, here is what I recommend if you plan to use word2vec: choose the right training parameters and training data, use an average predictor for queries, sentences and paragraphs after picking a dominant word set, and apply deep learning to the resulting vectors.
===
For SGNS, here is what I believe really happens during training: if two words appear together, training tries to increase their cosine similarity; if two words never appear together, training reduces it. So if there are a lot of user queries such as “auto insurance” and “car insurance”, then the “auto” vector will be similar to the “insurance” vector (cosine similarity ~= 0.3) and the “car” vector will also be similar to the “insurance” vector. Since “insurance”, “loan” and “repair” rarely appear together in the same context, their vectors have small mutual cosine similarity (~= 0.1). We can treat them as orthogonal to each other and think of them as different dimensions. After training is complete, the “auto” vector will be very similar to the “car” vector (cosine similarity ~= 0.6) because both of them are similar in the “insurance” dimension, the “loan” dimension and the “repair” dimension. This intuition is useful if you want to design your training data to better meet the goal of your text learning task.
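A small worked example makes the geometry in this note concrete. It is only a sketch under an idealized assumption that is not in the quoted text: the context words are exactly orthogonal unit vectors, and the parts of "auto" and "car" that are not explained by shared contexts do not overlap. Under that assumption the cosine similarity of the two words is the sum, over shared contexts, of the products of their cosines. Three shared contexts at 0.3 each then give only 0.27, so a value near 0.6 would suggest more shared contexts (about seven) or overlapping residuals. The function names and the vector layout are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def word_pair(num_contexts, context_cos=0.3):
    """Build unit vectors for two words that each have cosine `context_cos`
    with `num_contexts` orthogonal context axes (hypothetical layout).

    The last two coordinates hold each word's private residual, sized so
    that the vector has unit length; the two residuals do not overlap.
    """
    shared = [context_cos] * num_contexts
    residual = math.sqrt(1.0 - num_contexts * context_cos ** 2)
    first = shared + [residual, 0.0]
    second = shared + [0.0, residual]
    return first, second


# Axis 0 plays the role of "insurance", axis 1 "loan", axis 2 "repair".
auto, car = word_pair(3)
insurance = [1.0, 0.0, 0.0, 0.0, 0.0]

sim_auto_insurance = cosine(auto, insurance)  # 0.3 by construction
sim_auto_car_3 = cosine(auto, car)            # 3 * 0.3 * 0.3 = 0.27

auto7, car7 = word_pair(7)
sim_auto_car_7 = cosine(auto7, car7)          # 7 * 0.3 * 0.3 = 0.63

print(round(sim_auto_insurance, 2), round(sim_auto_car_3, 2), round(sim_auto_car_7, 2))
```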
===
For short sentences/phrases, Tomas Mikolov recommends simply adding up the individual word vectors to get a "sentence vector" (see his recent NIPS slides).
For longer documents, it is an open research question how to derive their representation, so no wonder you're having trouble :)
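The additive "sentence vector" idea can be sketched in a few lines. This is an illustration only: the three-dimensional `toy_vectors` table and the function name are made up, and in practice the vectors would come from a trained word2vec model. Summing and averaging point in the same direction, so they give the same cosine similarities; averaging just keeps the scale independent of sentence length.

```python
def sentence_vector(tokens, vectors, average=True):
    """Combine word vectors into one vector by summing or averaging.

    Tokens missing from `vectors` are skipped. Returns None when no
    token is known.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    total = [sum(vec[i] for vec in known) for i in range(dim)]
    if average:
        return [x / len(known) for x in total]
    return total


# Made-up 3-dimensional embeddings, for illustration only.
toy_vectors = {
    "car": [1.0, 0.0, 1.0],
    "insurance": [0.0, 2.0, 1.0],
    "quote": [2.0, 1.0, 1.0],
}

summed = sentence_vector(["car", "insurance", "quote"], toy_vectors, average=False)
# "zzz" is out of vocabulary and is ignored.
averaged = sentence_vector(["car", "insurance", "quote", "zzz"], toy_vectors)
```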
I like the way word2vec runs (no need for heavy hardware to process a huge collection of text). It's more usable than LSA or any system that requires a term-document matrix.
Actually LSA requires less structured data (only a bag-of-words matrix, whereas word2vec requires exact word sequences), so there's no fundamental difference in input complexity.
- http://douglasduhaime.com/blog/clustering-semantic-vectors-with-python
- https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-3-more-fun-with-word-vectors
- http://bookworm.benschmidt.org/posts/2015-10-25-Word-Embeddings.html
- http://www.aclweb.org/anthology/W15-1526
- ML model: http://blog.echen.me/2011/04/27/choosing-a-machine-learning-classifier/
24.5
- https://github.com/edburnett/twitter-text-python
- http://eng.kifi.com/from-word2vec-to-doc2vec-an-approach-driven-by-chinese-restaurant-process/
- https://www.insight-centre.org/sites/default/files/publications/14.033_insight-snow14dc-final.pdf
- http://rali.iro.umontreal.ca/rali/sites/default/files/publis/Atefeh_et_al-2013-Computational_Intelligence-2.pdf
TSNE:
- javascript: http://karpathy.github.io/2014/07/02/visualizing-top-tweeps-with-t-sne-in-Javascript/
- http://cs.stanford.edu/people/karpathy/tsnejs/index.html
Conferences:
- word2vec tree: https://github.com/pvthuy/word2vec-visualization
- flask, api, mongo, d3: http://adilmoujahid.com/posts/2015/01/interactive-data-visualization-d3-dc-python-mongodb/
- https://github.com/RaRe-Technologies/movie-plots-by-genre
- wmd: http://vene.ro/blog/word-movers-distance-in-python.html
- word2vec viz: https://ronxin.github.io/wevi/
- news analytics in finance: https://vimeo.com/67901816
- table2vec: http://www.slideshare.net/SparkSummit/using-data-science-to-transform-opentable-into-delgado-das
- data by the bay: http://data.bythebay.io/schedule.html
- pydataberlin: http://pydata.org/berlin2016/
20.5
- scatter with images: https://gist.github.com/lukemetz/be6123c7ee3b366e333a
19.5
- wise 203 classes, vocab = 300k, sample = 64k, test = 34k, http://alexanderdyakonov.narod.ru/wise2014-kaggle-Dyakonov.pdf
- yelp review to multi label: food, deal, ambience,... http://www.ics.uci.edu/~vpsaini/
- instagram: http://instagram-engineering.tumblr.com/post/117889701472/emojineering-part-1-machine-learning-for-emoji
- emoji embedding http://www.danielforsyth.me/nba-twitter-emojis-and-word-embeddings/
- tweetmap in websummit event: http://blog.aylien.com/post/133931414053/analyzing-tweets-from-web-summit-2015-semantic
- topic2vec: http://arxiv.org/pdf/1506.08422.pdf
- http://googleresearch.blogspot.com/2016/05/chat-smarter-with-allo.html
- https://en.wikipedia.org/wiki/Limited-memory_BFGS
18.5
- building data processing at budget: http://www.slideshare.net/GaelVaroquaux/building-a-cuttingedge-data-processing-environment-on-a-budget
- http://glowingpython.blogspot.com/2014/02/terms-selection-with-chi-square.html
- which feature selection: http://sebastianraschka.com/faq/docs/feature_sele_categories.html
- which learning algos: http://sebastianraschka.com/faq/docs/best-ml-algo.html
- for interpretability use tree: http://sebastianraschka.com/faq/docs/model-selection-in-datascience.html
- LR vs NB: http://sebastianraschka.com/faq/docs/naive-bayes-vs-logistic-regression.html
- yelp review classifier: https://github.com/parulsingh/FlaskAppCS194
- ngsg is not mf yet: https://building-babylon.net/2016/05/12/skipgram-isnt-matrix-factorisation/
- http://blog.aylien.com/post/133931414053/analyzing-tweets-from-web-summit-2015-semantic
sentifi:
- https://github.com/bdhingra/tweet2vec
- tweet2vec https://arxiv.org/abs/1605.03481
- syntaxnet: https://github.com/tensorflow/models/tree/master/syntaxnet
- hijack compromise user account http://www.icir.org/vern/papers/twitter-compromise.ccs2014.pdf
- user classification: name + loc http://www.cs.jhu.edu/~vandurme/papers/broadly-improving-user-classfication-via-communication-based-name-and-location-clustering-on-twitter.pdf
- chrispot: http://sentiment.christopherpotts.net/tokenizing.html
- https://github.com/cbuntain/TwitterFergusonTeachIn
- mining tweet: https://rawgit.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/master/ipynb/html/Chapter%201%20-%20Mining%20Twitter.html
- NE: https://noisy-text.github.io/pdf/WNUT10.pdf
- tokenizer: http://sentiment.christopherpotts.net/code-data/happyfuntokenizing.py
- twitter tokenizer online: http://sentiment.christopherpotts.net/tokenizing/results/
- cs224u understanding nlp: http://nbviewer.jupyter.org/github/cgpotts/cs224u/
- https://spacy.io/blog/german-model?utm_source=News&utm_campaign=87a64aae50-German_release_newsletter&utm_medium=email&utm_term=0_89ad33e698-87a64aae50-64293797
- jupyter theme: http://sherifsoliman.com/2016/01/11/theming-ipython-jupyter-notebook/
- noisy text need to be normalized: https://noisy-text.github.io/norm-shared-task.html
- understanding user profile/twitter: https://blog.twitter.com/2015/guest-post-understanding-users-through-twitter-data-and-machine-learning
- word2vec with numba: https://d10genes.github.io/blog/2016/05/03/word2vec/
- analyzing text data at Firefox: http://web.stanford.edu/~rjweiss/public_html/MozFest2013/
- pretrained word2vec https://github.com/3Top/word2vec-api
- twitter music word2vec: http://www.netbase.com/blog/understanding-beliebers-word2vec-twitter/
- text + images with CNN: https://www.scribd.com/doc/305710656/Convolutional-Neural-Networks-for-Multimedia-Sentiment-Analysis
- feature pivot: http://www.hpl.hp.com/techreports/2011/HPL-2011-98.pdf
- nlp with cnn: http://www.slideshare.net/devashishshanker/deep-learning-for-natural-language-processing
- event detection http://www.hpl.hp.com/techreports/2011/HPL-2011-98.pdf
- http://www.zdnet.com/article/big-data-what-to-trust-data-science-or-the-bosss-sixth-sense/
- tf is winning: https://medium.com/@mjhirn/tensorflow-wins-89b78b29aafb#.6lebzwbyx
- a vc blog: http://avc.com
- hijacking: http://www.icir.org/vern/papers/twitter-compromise.ccs2014.pdf
- us president prediction: http://www.aioptify.com/predictinguselection.php
- https://thestack.com/world/2015/05/08/three-steps-to-building-a-twitter-driven-trading-bot/
- http://file.scirp.org/pdf/SN_2015070917142293.pdf
- tweet latent attributes: http://boingboing.net/2014/09/01/twitter-uses-an-algorithm-to-f.html
- user gender inference: http://www.aclweb.org/anthology/W14-5408
- https://blog.bufferapp.com/the-5-types-of-tweets-to-keep-your-buffer-full-and-your-followers-engaged
- classifying user latent attributes: http://www.cs.jhu.edu/~delip/smuc.pdf
- http://myownhat.blogspot.com/
- http://bugra.github.io/work/notes/2015-01-17/mining-a-vc/
- NER with w2v, 400M tweet: http://www.fredericgodin.com/software/
http://davidrosenberg.github.io/ml2016/#home
pydatalondon 2016:
- http://www.thetalkingmachines.com
- https://www.youtube.com/user/PyDataTV
- pymc: https://docs.google.com/presentation/d/1QNxSjDHJbFL7vFwQHHheeGmBHEJAo39j28xdObFY6Eo/edit#slide=id.gdfcfebc22_0_118
- https://github.com/springcoil/PyDataLondonTutorial/blob/master/deck-17.pdf
- https://speakerdeck.com/bargava/introduction-to-deep-learning
- https://github.com/rouseguy/intro2deeplearning/
- https://github.com/rouseguy/intro2stats
- https://github.com/kylemcdonald/SmileCNN
- https://github.com/springcoil/PyDataLondonTutorial/blob/master/notebooks/Statistics.ipynb
- http://greenteapress.com/complexity/thinkcomplexity.pdf
- http://matthewearl.github.io/2016/05/06/cnn-anpr/
spotify:
- http://www.slideshare.net/AndySloane/machine-learning-spotify-madison-big-data-meetup
- http://www.slideshare.net/erikbern/music-recommendations-mlconf-2014
lda asyn, auto alpha: http://rare-technologies.com/python-lda-in-gensim-christmas-edition/
mapk: https://github.com/benhamner/Metrics/tree/master/Python/ml_metrics
iclr2016: https://tensortalk.com/?cat=conference-iclr-2016
l.m.thang
https://github.com/jxieeducation/DIY-Data-Science
http://drivendata.github.io/cookiecutter-data-science/
http://ofey.me/papers/sparse_ijcai16.pdf
Spotify:
- https://github.com/mattdennewitz/playlist-to-vec
- http://wonder.fm/
- https://social.shorthand.com/huntedguy/3CfQA8mj2S/playlist-harvesting
skflow:
- https://medium.com/@ilblackdragon/tensorflow-tutorial-part-1-c559c63c0cb1#.7a7s8tkke
- https://medium.com/@ilblackdragon/tensorflow-tutorial-part-2-9ffe47049c92#.jgxmezy95
- https://medium.com/@ilblackdragon/tensorflow-tutorial-part-3-c5fc0662bc08#.2d22an1xp
- http://terrytangyuan.github.io/2016/03/14/scikit-flow-intro/
- https://libraries.io/github/mhlr/skflow
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples
a few useful things to know about ML:
- https://blog.bigml.com/2013/02/15/everything-you-wanted-to-know-about-machine-learning-but-were-too-afraid-to-ask-part-one/
- https://blog.bigml.com/2013/02/21/everything-you-wanted-to-know-about-machine-learning-but-were-too-afraid-to-ask-part-two/
tdb: https://github.com/ericjang/tdb
dask for task parallel, delayed: http://dask.pydata.org/en/latest/examples-tutorials.html
skflow:
- pip install git+git://github.com/tensorflow/skflow.git
- http://www.kdnuggets.com/2016/02/scikit-flow-easy-deep-learning-tensorflow-scikit-learn.html
http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/
https://github.com/andrewt3000/DL4NLP/blob/master/README.md
tf:
- http://terryum.io/ml_applications/2016/04/25/TF-Code-Structure/
- http://www.slideshare.net/tw_dsconf/tensorflow-tutorial
tf chatbot: https://github.com/nicolas-ivanov/tf_seq2seq_chatbot
- deep inversion : https://github.com/TaddyLab/gensim/blob/deepir/docs/notebooks/deepir.ipynb
- encoder decoder with attention: http://arxiv.org/pdf/1512.01712v1.pdf
- keras tut: http://web.cs.hacettepe.edu.tr/~aykut/classes/spring2016/bil722/tutorials/keras.pdf
Bayesian Opt: https://github.com/fmfn/BayesianOptimization/blob/master/examples/visualization.ipynb
click-o-tron rnn: http://clickotron.com
auto generated headline clickbait: https://larseidnes.com/2015/10/13/auto-generating-clickbait-with-recurrent-neural-networks/
http://blog.computationalcomplexity.org/2016/04/the-master-algorithm.html
http://jyotiska.github.io/blog/posts/python_libraries.html
LSTM: http://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/
CS224d:
- TF intro: http://cs224d.stanford.edu/lectures/CS224d-Lecture7.pdf
- RNN: http://cs224d.stanford.edu/lectures/CS224d-Lecture8.pdf
SOTA of SA, Mikolov and me :)
Thang M. L: http://web.stanford.edu/class/cs224n/handouts/cs224n-lecture16-nmt.pdf
CS224d reports:
- classify online forum answer/non-answer: https://cs224d.stanford.edu/reports/AbajianAaron.pdf
- gender classification: https://cs224d.stanford.edu/reports/BartleAric.pdf
- job prediction: https://cs224d.stanford.edu/reports/BoucherEric.pdf
- text sum: https://cs224d.stanford.edu/reports/ChaiElaina.pdf
- email spam: https://cs224d.stanford.edu/reports/EugeneLouis.pdf
- jp2en: https://cs224d.stanford.edu/reports/GreensteinEric.pdf
- improve PV: https://cs224d.stanford.edu/reports/HongSeokho.pdf
- twitter sa: https://cs224d.stanford.edu/reports/YuanYe.pdf
- yelp sa: https://cs224d.stanford.edu/reports/YuApril.pdf
- author detector: https://cs224d.stanford.edu/reports/YaoLeon.pdf
- IMDB to Yelp: https://cs224d.stanford.edu/reports/XingMargaret.pdf
- Reddit: https://cs224d.stanford.edu/reports/TingJason.pdf
- Quora: https://cs224d.stanford.edu/reports/JindalPranav.pdf
QA in keras:
- https://github.com/avisingh599/visual-qa/blob/master/scripts/trainMLP.py
- https://avisingh599.github.io/deeplearning/visual-qa/
Chinese LSTM + word2vec:
- https://github.com/taozhijiang/chinese_nlp/blob/master/DL_python/dl_segment_v2.py
- https://github.com/taozhijiang/chinese_nlp
DL with SA: https://cs224d.stanford.edu/reports/HongJames.pdf
MAB:
- mab book: http://pdf.th7.cn/down/files/1312/bandit_algorithms_for_website_optimization.pdf
- yhat: http://blog.yhat.com/posts/the-beer-bandit.html
- test significance with AB, conversion rate opt with MAB: https://vwo.com/blog/multi-armed-bandit-algorithm/
- when to use multiarmed bandits: http://conversionxl.com/bandit-tests/
- multibandit: http://stevehanov.ca/blog/index.php?id=132
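As a rough illustration of how a multi-armed bandit shifts traffic toward the better variant while it is still testing, here is a minimal epsilon-greedy sketch. It is not taken from any of the links above: the conversion rates, the `epsilon` value and the function name are made-up assumptions, and the linked posts cover other strategies as well.

```python
import random


def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over Bernoulli arms.

    Returns (pulls, estimates): how often each arm was shown and the
    conversion rate observed for each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: show a random variant.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: show the variant with the best observed rate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the running mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return pulls, estimates


# Hypothetical conversion rates for three page variants.
pulls, estimates = epsilon_greedy([0.1, 0.2, 0.8])
print(pulls, [round(e, 2) for e in estimates])
```

Unlike a fixed 50/50 A/B split, most of the traffic ends up on the variant that currently looks best, at the cost of noisier estimates for the weaker variants.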
cnn nudity detection: http://blog.clarifai.com/what-convolutional-neural-networks-see-at-when-they-see-nudity/#.VxbdB0xcSko
sigopt: https://github.com/sigopt/sigopt_sklearn
first contact with TF: http://www.jorditorres.org/first-contact-with-tensorflow/
eval of ML using A/B or multibandit: http://blog.dato.com/how-to-evaluate-machine-learning-models-the-pitfalls-of-ab-testing
how to make mistakes in Python: http://www.oreilly.com/programming/free/files/how-to-make-mistakes-in-python.pdf
keras tut: https://uwaterloo.ca/data-science/sites/ca.data-science/files/uploads/files/keras_tutorial.pdf
Ogrisel word embedding: https://speakerd.s3.amazonaws.com/presentations/31f18ad0522c0132b9b662e7bb117668/Word_Embeddings.pdf
Tensorflow whitepaper: http://download.tensorflow.org/paper/whitepaper2015.pdf
Arimo distributed tensorflow: https://arimo.com/machine-learning/deep-learning/2016/arimo-distributed-tensorflow-on-spark/
Best ever word2vec in code: http://nbviewer.jupyter.org/github/fbkarsdorp/doc2vec/blob/master/doc2vec.ipynb
TF japanese: http://www.slideshare.net/yutakashino/tensorflow-white-paper
TF tut101: https://github.com/aymericdamien/TensorFlow-Examples
Jeff Dean: http://learningsys.org/slides/NIPS-Learning-Systems-Workshop-TensorFlow-Jeff-Dean.pdf
DL: http://www.thoughtly.co/blog/deep-learning-lesson-1/
Distributed TF: https://www.tensorflow.org/versions/r0.8/how_tos/distributed/index.html
playground: http://playground.tensorflow.org/
Hoang Duong blog: http://hduongtrong.github.io/
Word2vec short explanation: http://hduongtrong.github.io/2015/11/20/word2vec/
ForestSpy: https://github.com/jvns/forestspy/blob/master/inspecting%20random%20forest%20models.ipynb
- keras for mnist: https://github.com/wxs/keras-mnist-tutorial/blob/master/MNIST%20in%20Keras.ipynb
- lasagne installation https://martin-thoma.com/lasagne-for-python-newbies/
Netflix:
- http://www.wired.com/2012/04/netflix-prize-costs/
- http://www.wired.com/2009/09/bellkors-pragmatic-chaos-wins-1-million-netflix-prize/
Lessons learned:
- http://machinelearningmastery.com/lessons-learned-building-machine-learning-systems/
- http://techjaw.com/2015/02/11/10-machine-learning-lessons-harnessed-by-netflix/
- https://medium.com/@xamat/10-more-lessons-learned-from-building-real-life-ml-systems-part-i-b309cafc7b5e#.klowhfq10
WMD:
- word mover distance: https://github.com/mkusner/wmd
- gensim wmd: https://speakerdeck.com/tmylk/same-content-different-words
Hanoi trip:
- tensorflow scan: learn the cumulative sum https://nbviewer.jupyter.org/github/rdipietro/tensorflow-notebooks/blob/master/tensorflow_scan_examples/tensorflow_scan_examples.ipynb
- stacking: http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/
- Learn and think like human: http://arxiv.org/pdf/1604.00289v1.pdf
- predictive modeling + AI: https://speakerd.s3.amazonaws.com/presentations/30ad41b99258471f9485118f904f8cfb/predictive_modeling_and_deep_learning.pdf
- sklearn vs tf: https://github.com/rasbt/python-machine-learning-book/blob/master/faq/tensorflow-vs-scikitlearn.md
- advances in DL for NLP: http://cs.nyu.edu/~zaremba/docs/Advances%20in%20deep%20learning%20for%20NLP.pdf
- Xavier, 10 lessons learned: https://medium.com/@xamat/10-more-lessons-learned-from-building-real-life-ml-systems-part-i-b309cafc7b5e#.klowhfq10
- pizza analysis: http://yoavz.com/potd/
- 450 hours in data science: http://studiy.co/path/data-science/
- LR + SGD + FM: https://gist.github.com/kalaidin/9ea737ad771fcf073e57
- libFM: http://www.ics.uci.edu/~smyth/courses/cs277/papers/factorization_machines_with_libFM.pdf
- intro FM: http://www.slideshare.net/0x001/intro-to-factorization-machines
- fastFM: https://github.com/ibayer/fastFM
- winning data science competition: https://speakerdeck.com/datasciencela/jeong-yoon-lee-winning-data-science-competitions-data-science-meetup-oct-2015
- python for data analyst: https://www.kevinsheppard.com/images/0/09/Python_introduction.pdf
- risk modeling: https://risk-engineering.org/static/PDF/slides-stat-modelling.pdf
- mlss2014: http://www.mlss2014.com/materials.html
- xavier: https://www.slideshare.net/slideshow/embed_code/key/gt6HuUzZ4Z7flf
- Machine Intelligence 2.0: https://cdn-images-1.medium.com/max/2000/1*A9exqeQ69XjjSJgMyDEo6Q.jpeg
- Quora - all about data scientists: https://www.quora.com/What-are-the-best-blogs-for-data-scientists-to-read
- World of thought vectors: http://www.pamitc.org/cvpr15/files/lecun-20150610-cvpr-keynote.pdf
- newbie nlp lab: https://github.com/piskvorky/topic_modeling_tutorial/
- why and when log-log is used: http://www.forbes.com/sites/naomirobbins/2012/01/19/when-should-i-use-logarithmic-scales-in-my-charts-and-graphs/#41c6dc0c3cd8
- lzma: https://parezcoydigo.wordpress.com/2011/10/09/clustering-with-compression-for-the-historian/
- Tom Vincent: http://insightdatascience.com/blog/tom_vincent_qanda.html
- Normalized Compression Distance: http://tamediadigital.ch/2016/03/20/normalized-compression-distance-a-simple-and-useful-method-for-text-clustering-2/
- Yoav Goldberg: https://www.youtube.com/watch?v=xw5HL5h1wxY
- Sklearn production on Dato: https://www.youtube.com/watch?v=AwjeRg1u5VI
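NCD sketch (for the lzma / Normalized Compression Distance links above): a minimal version of the standard formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The use of zlib as the default compressor and the function name `ncd` are assumptions for illustration, not taken from the linked posts; `lzma.compress` can be passed instead.

```python
import zlib


def ncd(x, y, compress=zlib.compress):
    """Normalized compression distance between two byte strings.

    `compress` is any function mapping bytes -> compressed bytes
    (zlib here is an assumption; lzma.compress also fits).
    """
    cx = len(compress(x))
    cy = len(compress(y))
    cxy = len(compress(x + y))
    # Near 0 when x and y share most of their structure,
    # closer to 1 when compressing them together saves nothing.
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    doc_a = b"machine learning with python and numpy " * 30
    doc_b = b"PyData London hosted by Bloomberg in May " * 30
    print("a vs a:", ncd(doc_a, doc_a))
    print("a vs b:", ncd(doc_a, doc_b))
```

For text clustering, the pairwise NCD values would be fed into any distance-based method (e.g. hierarchical clustering). Note that the value is only approximately symmetric, since compressing x + y and y + x can give slightly different sizes.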
VinhKhuc:
- how many folds k for CV: k = N, i.e. LOOCV http://vinhkhuc.github.io/2015/03/01/how-many-folds-for-cross-validation.html
- backprop http://vinhkhuc.github.io/2015/03/29/backpropagation.html
- qa bAbI task: https://github.com/vinhkhuc/MemN2N-babi-python
- lstm/rnn: http://vinhkhuc.github.io/2015/11/19/rnn-lstm.html
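k-fold vs LOOCV sketch (for the cross-validation note above): setting k equal to the number of samples N turns k-fold into leave-one-out. The helper names `k_fold_indices` and `loocv_indices` are hypothetical and the folds are contiguous without shuffling, which is a simplification; this only illustrates the index bookkeeping, not the linked post's argument about which k to choose.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def loocv_indices(n_samples):
    """Leave-one-out CV is k-fold with k = N: every test fold has one sample."""
    return k_fold_indices(n_samples, n_samples)


if __name__ == "__main__":
    for train_idx, test_idx in loocv_indices(4):
        print("train:", train_idx, "test:", test_idx)
```

With k = N the model is fit N times, which is why LOOCV gets expensive on large datasets.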
RS:
- https://code.facebook.com/posts/861999383875667/recommending-items-to-more-than-a-billion-people/
- http://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/
Data science bootcamp: https://cambridgecoding.com/datascience-bootcamp#outline
CambridgeCoding NLP:
- https://drive.google.com/file/d/0B_ZOKLUe_XPaNVFHM3M4dHRzV28/view?pli=1
- http://blog.cambridgecoding.com/2016/03/24/misleading-modelling-overfitting-cross-validation-and-the-bias-variance-trade-off/
Annoy:
- Annoy, Luigi: http://erikbern.com/, https://www.hakkalabs.co/articles/approximate-nearest-neighbors-vector-models
- LSH: https://speakerdeck.com/maciejkula/locality-sensitive-hashing-at-lyst
- http://www.slideshare.net/erikbern/approximate-nearest-neighbor-methods-and-vector-models-nyc-ml-meetup
- https://github.com/spotify/annoy
RPForest: https://github.com/lyst/rpforest
LightFM: https://github.com/lyst/lightfm
Secure because of math: https://www.youtube.com/watch?v=TYVCVzEJhhQ
Talking machines: http://www.thetalkingmachines.com/
Dive into DS: https://github.com/rasbt/dive-into-machine-learning
DS process: https://www.oreilly.com/ideas/building-a-high-throughput-data-science-machine
Friendship paradox: https://vuhavan.wordpress.com/2016/03/25/ban-ban-ban-nhieu-hon-ban-ban/
AB test:
- notebook: https://github.com/Volodymyrk/stats-testing-in-python/blob/master/01%20-%20Single%20Sample%20tests%20for%20Mean.ipynb
- https://medium.com/@rchang/my-two-year-journey-as-a-data-scientist-at-twitter-f0c13298aee6#.t1h9ouwpg
- http://multithreaded.stitchfix.com/blog/2015/05/26/significant-sample/
- http://nerds.airbnb.com/experiments-at-airbnb/
- https://www.quora.com/When-should-A-B-testing-not-be-trusted-to-make-decisions/answer/Edwin-Chen-1?srid=sL8&share=1
EMNLP 2015:
- semantic sim of embedding: https://www.cs.cmu.edu/~ark/EMNLP-2015/tutorials/34/34_OptionalAttachment.pdf
- social text analysis: https://www.cs.cmu.edu/~ark/EMNLP-2015/tutorials/3/3_OptionalAttachment.pdf
- personality research in NLP: https://www.cs.cmu.edu/~ark/EMNLP-2015/tutorials/2/2_OptionalAttachment.pdf
To read:
- https://github.com/rasbt/algorithms_in_ipython_notebooks
- https://www.blackhat.com/docs/webcast/02192015-secure-because-math.pdf
- http://nirvacana.com/thoughts/becoming-a-data-scientist/
- http://nbviewer.jupyter.org/github/jmsteinw/Notebooks/blob/master/IndeedJobs.ipynb
- http://www.john-foreman.com/data-smart-book.html
- http://www.thetalkingmachines.com/blog/2015/4/23/starting-simple-and-machine-learning-in-meds
- https://github.com/justmarkham/DAT8
- https://github.com/donnemartin/data-science-ipython-notebooks
Idols:
- Alex Pinto: MLSec
- Peadar Coyle: https://peadarcoyle.wordpress.com/, https://github.com/springcoil/pydataamsterdamkeynote, http://slides.com/springcoil/dataproducts-11#/27, https://medium.com/@peadarcoyle/three-things-i-wish-i-knew-earlier-about-machine-learning-54cb0d23ca29#.uc6e049rl
- Radim: gensim
- Delip Rao: http://deliprao.com/archives/129
- Alex: http://alexanderdyakonov.narod.ru/engcontests.htm
- Yoav: https://www.cs.bgu.ac.il/~yoavg/uni/
- Andrej: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- Sebastian: http://www.kdnuggets.com/2016/02/conversation-data-scientist-sebastian-raschka-podcast.html
- Joel Grus: http://joelgrus.com/
- Bugra: http://bugra.github.io/
IPython/Jupyter:
LSTM:
- RNN for music: http://erikbern.com/2014/06/28/recurrent-neural-networks-for-collaborative-filtering/
- skflow: https://github.com/tensorflow/skflow/tree/master/examples
- dropout: http://arxiv.org/abs/1409.2329
- seq2seq: http://arxiv.org/abs/1409.3215
- simple char rnn: https://gist.github.com/karpathy/d4dee566867f8291f086
- https://www.tensorflow.org/versions/r0.7/tutorials/recurrent/index.html#the-model
- http://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN:
- https://www.tensorflow.org/versions/r0.7/tutorials/recurrent/index.html
- Char RNN: http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139
- http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- http://karpathy.github.io/neuralnets/
Unicode:
- ascii fix: http://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte
- http://nedbatchelder.com/text/unipain/unipain.html#45
EVENTS:
- April 8-10 2016: PyData Madrid
- April 15-17 2016: PyData Florence
- May 6-8 2016: PyData London hosted by Bloomberg
- May 20-21 2016: PyData Berlin
- September 14-16 2016: PyData Carolinas hosted by IBM
- October 7-9 2016: PyData DC hosted by Capital One
- November 28-30 2016: PyData Cologne
Other Conference Dates Coming Soon!
- PyData Chicago
- PyData NYC
- PyData Paris
- PyData Silicon Valley
- pydata amsterdam: http://pydata.org/amsterdam2016/schedule/ https://speakerdeck.com/maciejkula/hybrid-recommender-systems-at-pydata-amsterdam-2016
- gcp 23-24 March
- pycon sg: June 23-25
- emnlp: june, austin, us
- pydata
QUOTES:
- My name is Sherlock Holmes. It is my business to know what other people don't know.
- Take the first step in faith. You don't have to see the whole staircase, just take the first step. [M. L. King Jr.]
- "Data! Data! Data!" he cried impatiently. "I can't make bricks without clay." [Arthur Conan Doyle]
STATS:
BOOKS:
- http://shop.oreilly.com/product/0636920033400.do
- https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS_corrected_1.4.16.pdf
- https://leanpub.com/interviewswithdatascientists
CLUSTER:
- distance: http://stackoverflow.com/questions/22433884/python-gensim-how-to-calculate-document-similarity-using-the-lda-model#answer-22756647
- hac with lsi: https://groups.google.com/forum/#!topic/gensim/0Ev8Okf3MCs
- clustering evaluation: http://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation
- Silhouette analysis: http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
- https://groups.google.com/forum/#!msg/gensim/ZxauGgh9Vqs/prIMalR8LbgJ
- http://www.site.uottawa.ca/~diana/csi5180/TextClustering.pdf
- http://stackoverflow.com/questions/17537722/better-text-documents-clustering-than-tf-idf-and-cosine-similarity
- http://www.naftaliharris.com/blog/visualizing-dbscan-clustering/
- 30 day indexed: http://googlenewsblog.blogspot.com/2008/05/keeping-good-news-stories-together-just.html
- http://www.mondaynote.com/2013/02/24/google-news-the-secret-sauce/
- http://searchengineland.com/google-news-ranking-stories-30424
- http://nsuworks.nova.edu/cgi/viewcontent.cgi?article=1051&context=gscis_etd
- https://github.com/lmcinnes/hdbscan
- http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations-v0.6.ipynb
- http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations%202D%20v0.6.ipynb
- http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb
EMBEDDING:
- https://quomodocumque.wordpress.com/2016/01/15/messing-around-with-word2vec/
- http://www.offconvex.org/2016/02/14/word-embeddings-2/
- improving sem embedding words rep: https://levyomer.wordpress.com/2015/03/30/improving-distributional-similarity-with-lessons-learned-from-word-embeddings/
- whiskey: http://wrec.herokuapp.com/methodology
- lda: topic evaluation: http://radimrehurek.com/topic_modeling_tutorial/2%20-%20Topic%20Modeling.html
- lda2vec: http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec-57135994
- http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb
- text2vec: http://dsnotes.com/articles/glove-enwiki
- Swivel Submatrix Wise Vector Embedding Learner http://arxiv.org/pdf/1602.02215v1.pdf
- https://sense2vec.spacy.io/?natural_language_processing%7CNOUN
Linux:
BENCHMARK:
- keras vs theano vs tensorflow: https://www.reddit.com/r/MachineLearning/comments/462p41/pros_and_cons_of_keras_vs_lasagne_for_deep/
- http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html
- https://github.com/zer0n/deepframeworks/blob/master/README.md
- soumith/convnet-benchmarks#66
- https://github.com/soumith/convnet-benchmarks/blob/master/README.md
- https://github.com/lmcinnes/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations.ipynb
- https://github.com/szilard/benchm-ml
- word2vec: http://rare-technologies.com/parallelizing-word2vec-in-python/
DIY:
- Generate writing piece: https://www.reddit.com/r/MachineLearning/comments/4728e1/best_method_to_generate_prose_in_the_style_of_a/
- Neural talk: describe image https://github.com/ryankiros/neural-storyteller
- https://github.com/csaid/polished_notebooks/blob/master/notebook_polished.ipynb
- https://github.com/tensorflow/skflow
- https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/udacity
- https://jmetzen.github.io/2015-01-29/ml_advice.html
- Lasagne Python: https://martin-thoma.com/lasagne-for-python-newbies/
- https://github.com/cemoody/lda2vec
- http://blog.christianperone.com/2016/01/voynich-manuscript-word-vectors-and-t-sne-visualization-of-some-patterns/
- http://cherokee-project.com/doc/basics_installation_osx.html
- https://github.com/joelgrus/spot-it/tree/master/python
- sentiment with rasbt: http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch08/ch08.ipynb
- solr + w2v: http://www.slideshare.net/lucidworks/implementing-conceptual-search-in-solr-using-lsa-and-word2vec-presented-by-simon-hughes-dicecom
- adaboost: http://scikit-learn.org/stable/auto_examples/ensemble/plot_adaboost_twoclass.html
- xgboost with sklearn: https://github.com/dmlc/xgboost/blob/master/demo/guide-python/sklearn_evals_result.py
- https://github.com/google/deepdream/blob/master/dream.ipynb
- https://github.com/bugra/l1/blob/master/l1/tf.py
- https://github.com/fchollet/keras
- https://github.com/danielfrg/tsne
- https://predictors.ai/#/p/Iris_flower_classifier
- https://github.com/anishathalye/neural-style
- https://github.com/sujitpal/statlearning-notebooks/tree/master/
- https://github.com/avisingh599/visual-qa
- http://avisingh599.github.io/deeplearning/visual-qa/
- http://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/
- http://sebastianraschka.com/faq/docs/pca-scaling.html
- Deepviz: https://github.com/yosinski/deep-visualization-toolbox
- Movement analysis: https://github.com/la3lma/movement-analysis
- https://github.com/muatik/flask-profiler
- https://github.com/mila-udem/fuel
- https://realpython.com/blog/python/vim-and-python-a-match-made-in-heaven
- https://github.com/wojciechz/learning_to_execute
- https://github.com/karpathy/char-rnn
- https://gist.github.com/karpathy/d4dee566867f8291f086
Products:
- Apache spark + sklearn: http://www.slideshare.net/DmitriBabaev1/data-science-on-big-data-pragmatic-approach-53227809
- http://googleresearch.blogspot.com.br/2016/02/running-your-models-in-production-with.html
- https://sense2vec.spacy.io/?natural_language_processing%7CNOUN
- http://visualqa.csail.mit.edu/
- http://cs.stanford.edu/people/karpathy/nips2015/
- http://www.writelike.ml/survey/69
- http://metamind.io/
Full stack:
- https://github.com/lensacom/sparkit-learn
- https://github.com/japerk/nltk-trainer
- https://github.com/nliu86/word2vec-doc2vec
- http://claymcleod.github.io/papers/distributed-dnn/paper.html
- https://github.com/andrewt3000/DL4NLP/blob/master/README.md#deep-learning-for-nlp-resources
Must see:
- sentiment at FB: https://www.youtube.com/watch?v=y3ZTKFZ-1QQ
- rework dl summit 2016, Rnn in action with Karpathy: https://www.youtube.com/watch?v=qPcCk1V1JO8
- fake data scientist: http://www.kdnuggets.com/2016/01/20-questions-to-detect-fake-data-scientists.html
- resume tips: http://www.kdnuggets.com/2016/01/data-science-resume-tips-guidelines.html
- chartjunk: http://speakingppt.com/2011/05/09/does-chartjunk-really-trash-your-graphs-4-discoveries-from-research/
- http://svail.github.io/mandarin/
- https://www.youtube.com/watch?v=UAq961jQjYg
- https://class.coursera.org/nlp/lecture
- https://docs.google.com/presentation/d/1eI60SL3UxtWfr9ktrv48-pcIkk4S7JiDmeXGCyyGhCs/edit?pref=2&pli=1#slide=id.g5bd0df450_0_609
- http://www.pyvideo.org/video/3590/how-to-get-data-science-models-into-production-on
- Large scale news clustering: http://publications.lib.chalmers.se/records/fulltext/179841/179841.pdf
- tripadvisor: http://engineering.tripadvisor.com/using-nlp-to-find-interesting-collections-of-hotels/
- http://www.slideshare.net/MoscowDataFest/df1-dmc-trophimov-tips-tricks-and-usecases-of-ensembling-in-practice
- https://github.com/jeongyoonlee?tab=repositories
- http://www.csie.ntu.edu.tw/~r01922136/kaggle-2014-criteo.pdf
- https://github.com/diefimov/avito_context_click_2015
- https://github.com/aguschin/kaggle
- https://speakerd.s3.amazonaws.com/presentations/844fef54b1d34b1aaae951123c59a0e6/Winning_Data_Science_Competitions_-_Distributed.pdf
- https://www.youtube.com/watch?v=ClAZQI_B4t8
- http://i.imgur.com/GfWipUH.gif
- http://www.slideshare.net/AlexanderKorbonits/deep-learning-with-python-pydata-seattle-2015
- https://www.youtube.com/watch?v=LjmWcgmJqVE
- https://www.youtube.com/watch?v=04ev55WnvSg&feature=youtu.be
- https://www.youtube.com/watch?v=wTp3P2UnTfQ
- http://cs.stanford.edu/people/karpathy/nips2015/
- https://www.youtube.com/watch?v=B8J4uefCQMc
- https://www.youtube.com/watch?v=pXhcPJK5cMc&feature=youtu.be
- http://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html
- https://www.youtube.com/watch?v=-yX1SYeDHbg&feature=youtu.be&t=2503
- https://www.youtube.com/watch?v=He4t7Zekob0
- https://skillsmatter.com/skillscasts/6611-visualizing-and-understanding-recurrent-networks
- http://videolectures.net/icml2015_lille/
- https://github.com/s16h/py-must-watch
- http://videolectures.net/deeplearning2015_montreal/
- http://www.nyas.org/MediaPlayer.aspx?mid=1212c782-d18f-43fc-a108-ddf3fe452d9c
- 5 tribes of ML, Pedro Domingos: https://www.youtube.com/watch?v=UPsYGzln-Ys
- http://www.trivedigaurav.com/blog/quoc-les-lectures-on-deep-learning/?owa_referral=pitt&owa_source=~gtrivedi/blog/quoc-les-lectures-on-deep-learning/
- https://github.com/yosinski/deep-visualization-toolbox
- https://www.youtube.com/watch?v=_KoWTD8T45Q&index=13&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH
Must read:
- spotify collab: http://erikbern.com/2014/06/28/recurrent-neural-networks-for-collaborative-filtering/
- build team: http://erikbern.com/2014/06/08/how-to-build-up-a-data-team-everything-i-ever-learned-about-recruiting/
- https://gist.github.com/friso/df439c3f2420eef49ecf
- http://www.thewire.com/technology/2011/07/new-twitter-alogrithm-could-out-dudes-pretending-be-lesbians/40451/
- http://www.fastcompany.com/1769217/you-cant-keep-your-secrets-twitter
- http://deliprao.com/archives/129
- why cosine: http://arxiv.org/abs/1508.02297, http://arxiv.org/abs/1512.00765
- lab41, to vec or not to vec? http://www.lab41.org/anything2vec/
- Learn to Rank LTR, wordrank: http://deliprao.com/archives/124
- http://www.aclweb.org/anthology/Q15-1016
- baboons can learn? https://www.cs.bgu.ac.il/~yoavg/uni/bloglike/baboons.html
- NLP 2 DL: http://blog.cambridgecoding.com/2016/02/22/natural-language-processing-meets-deep-learning/
- Context LSTM: http://arxiv.org/abs/1602.06291
- https://www.a1k0n.net/spotify/ml-madison/#/1
- http://yaroslavvb.blogspot.com/
- https://spacy.io/blog/sense2vec-with-spacy
- http://deliprao.com/archives/118
- Limits of LM: perplexity 30.0 for a single RNN model, 23.7 with ensembling http://arxiv.org/pdf/1602.02410v1.pdf
- what questions can data answer: http://www.kdnuggets.com/2016/01/questions-data-science-answer.html
- 7 mistakes to avoid: http://www.kdnuggets.com/2016/01/7-common-data-science-mistakes.html
- https://alexanderdyakonov.wordpress.com/2016/01/27/%D0%B0%D0%BB%D0%B5%D0%BA%D1%81%D0%B0%D0%BD%D0%B4%D1%80-%D0%B3%D1%83%D1%89%D0%B8%D0%BD/#more-1413
- Lasagne: https://github.com/christophebourguignat/notebooks/blob/master/Tuning%20Neural%20Networks.ipynb
- calib: https://github.com/christophebourguignat/notebooks/blob/master/Calibration.ipynb
- https://medium.com/@chris_bour/6-tricks-i-learned-from-the-otto-kaggle-challenge-a9299378cd61#.8a6a0qpqn
- http://www.technologyreview.com/view/541356/king-man-woman-queen-the-marvelous-mathematics-of-computational-linguistics/
- http://blog.talla.com/2016/01/how-we-approached-the-allen-a-i-challenge-on-kaggle/
- https://medium.com/@xamat/10-more-lessons-learned-from-building-real-life-machine-learning-systems-part-ii-93fe7008fa9#.8caexkms3
- C in svm: http://stats.stackexchange.com/questions/31066/what-is-the-influence-of-c-in-svms-with-linear-kernel
- https://github.com/joelgrus/stupid-itertools-tricks-pydata
- https://www.reddit.com/r/MachineLearning/comments/4020ek/state_of_the_art_dec2015_natural_language/
- https://kaggle2.blob.core.windows.net/forum-message-attachments/60041/1813/TradeshiftTextClassification.pdf?sv=2012-02-12&se=2016-01-08T03%3A36%3A57Z&sr=b&sp=r&sig=miL1MV7BxgdDh%2BcA%2BYVxedVrr%2BolasysRCNL6dJGfmw%3D
- http://engineeringblog.yelp.com/2015/09/automatically-categorizing-yelp-businesses.html
- http://pmarchive.com/
- http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
- http://www.slideshare.net/tekproxy/tetcon-2016
- http://blog.mikiobraun.de/2015/03/three-things-about-data-science.html
- http://arxiv.org/pdf/1507.07998v1.pdf
- http://cjauvin.blogspot.com/2014/06/dbscan-blues.html
- GloVe: http://developers.lyst.com/2014/11/11/word-embeddings-for-fashion/
- duplicate detection: https://moz.com/devblog/near-duplicate-detection/
- http://engineering.tripadvisor.com/using-nlp-to-find-interesting-collections-of-hotels/
- http://blog.kaggle.com/2015/06/09/otto-product-classification-winners-interview-2nd-place-alexander-guschin/
- http://www.machinelearning.ru/wiki/index.php
- https://github.com/cs109/2015/tree/master/Lectures
- https://alexanderdyakonov.wordpress.com/
- http://alexanderdyakonov.narod.ru/english.htm
- https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/14335/1st-place-winner-solution-gilberto-titericz-stanislav-semenov/79598#post79598
- https://plus.google.com/communities/107785538899595981479/stream/617c2a2c-e04d-448b-87fe-e9d8a8d657fc
- https://plus.google.com/communities/107785538899595981479
- https://www.quora.com/topic/Machine-Learning
- http://www.slideshare.net/OwenZhang2/tips-for-data-science-competitions?related=1
- https://github.com/nliu86/word2vec-doc2vec - Word2vec recommendation
- http://blog.kaggle.com/2015/05/07/profiling-top-kagglers-kazanovacurrently-2-in-the-world/
- http://blog.kaggle.com/2015/12/03/dato-winners-interview-1st-place-mad-professors/
- http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
- https://playground.pandorabots.com/en/tutorial/
- http://www.iro.umontreal.ca/~bengioy/talks/DL-Tutorial-NIPS2015.pdf
- http://dswalter.github.io/blog/machine-learnings-first-cheating-scandal/
- http://arxiv.org/abs/1512.02595
- https://www.reddit.com/r/MachineLearning/comments/3vxal4/what_happened_at_nips_2015_deep_learning_session/
- http://research.microsoft.com/en-us/um/people/jfgao/paper/2013/cikm2013_dssm_fullversion.pdf
- https://daoudclarke.github.io/guide.pdf
- https://docs.google.com/document/d/1ydIujJ7ETSZ688RGfU5IMJJsbxAi-kRl8czSwpti15s/edit?pli=1
- http://cs231n.github.io/
- https://levyomer.files.wordpress.com/2015/03/improving-distributional-similarity-tacl-2015.pdf
- http://dsnotes.com/blog/text2vec/2015/12/01/glove-enwiki/
- https://github.com/maciejkula/glove-python
- http://textminingonline.com/getting-started-with-word2vec-and-glove-in-python
- http://colah.github.io/posts/2015-01-Visualizing-Representations/
- http://arxiv.org/pdf/1511.08198v1.pdf
- http://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf
- http://arxiv.org/pdf/1510.00726v1.pdf
- http://arxiv.org/pdf/1511.07916v1.pdf
- http://vision.stanford.edu/teaching/cs231n/
- http://www.datascienceweekly.org/data-scientist-interviews/training-deep-learning-models-browser-andrej-karpathy-interview
- http://www.slideshare.net/SebastianRaschka/nextgen-talk-022015
- http://nadbordrozd.github.io/interviews/
- https://www.countbayesie.com/blog/2015/11/21/the-black-friday-puzzle-understanding-markov-chains
- http://goodfeli.github.io/dlbook/
- http://www.slideshare.net/packtpub/python-machine-learning-sample-chapter
- https://github.com/svaksha/pythonidae
- http://www.datarobot.com/blog/statistical-learning-in-python/
- http://karpathy.github.io/2015/11/14/ai/
- http://www.slideshare.net/xamat/10-more-lessons-learned-from-building-machine-learning-systems
- http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf
- http://mlwave.com/kaggle-ensembling-guide/
- http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
- http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00239
- http://www.kentran.net/2014/12/challenges-in-machine-learning-practice.html
- http://cs.stanford.edu/people/karpathy/convnetjs/
- http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
- https://github.com/Lab41/sunny-side-up/wiki/Learning-Resources-for-NLP,-Sentiment-Analysis,-and-Deep-Learning
- http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn2015.pdf
- http://googleresearch.blogspot.co.uk/2015/11/computer-respond-to-this-email.html
- https://drive.google.com/file/d/0B7XkCwpI5KDYRWRnd1RzWXQ2TWc/edit?pli=1
- http://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf
- http://jiwonkim.org/awesome-rnn/
- http://deeplearning4j.org/thoughtvectors
- http://www.marekrei.com/blog/26-things-i-learned-in-the-deep-learning-summer-school/
- https://aclweb.org/anthology/P/P15/P15-1150.pdf
- http://nlp.stanford.edu/projects/glove/
- http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
- http://u.cs.biu.ac.il/~yogo/nnlp.pdf
- http://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf
- http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- http://101.datascience.community/
- http://edmiston.id.au/2015/05/12/practical-experience-with-word2vec/
- https://www.reddit.com/r/MachineLearning/
- https://trello.com/b/rbpEfMld/data-science
- http://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf
- https://github.com/rasbt/python-machine-learning-book
- http://iamtrask.github.io/2015/07/27/python-network-part2/
- http://iamtrask.github.io/2015/07/12/basic-python-network/
- http://www.eecs.tufts.edu/~dsculley/papers/Detecting_Adversarial_Advertisements.pdf
- http://blog.echen.me/2011/04/27/choosing-a-machine-learning-classifier/
- https://github.com/fabianp/minirank/blob/master/notebooks/pairwise_transform.ipynb
- http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
- https://www.dataquest.io/blog/python-vs-r/
- https://www.youtube.com/watch?v=tGxW2BzC_DU&index=4&list=PLykRMO7ZuHwP5cWnbEmP_mUIVgzd5DZgH
- http://nbviewer.ipython.org/github/bmabey/pyLDAvis/blob/master/notebooks/pyLDAvis_overview.ipynb#topic=1&lambda=0.5&term=
Curated:
- http://www.slideshare.net/erikbern/music-recommendations-mlconf-2014
- http://multithreaded.stitchfix.com/blog/2016/02/04/computer-vision-state-of-the-art/
- lift analysis: http://blog.datalifebalance.com/lift-charts-a-data-scientists-secret-weapon/
- https://www.blackhat.com/docs/webcast/02192015-secure-because-math.pdf
- http://hunch.net/?p=22
- http://blog.siftnlp.com/natural-language-processing-blogs
- http://breakthroughanalysis.com/2016/02/18/proxem/
- https://medium.com/@itsaguytalking/the-role-of-statistical-significance-in-growth-hacking-c80648fde2eb#.u5b1evd3y
- https://medium.com/google-developers/why-won-t-this-work-coding-angry-for-fun-and-profit-1ef38a2b7196#.jxaah081z
- https://www.reddit.com/r/MachineLearning/comments/47jano/whats_so_great_about_lstm/
- hyperword: https://bitbucket.org/omerlevy/hyperwords/src
- http://neuralnetworksanddeeplearning.com/chap1.html
- http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
- https://medium.com/learning-new-stuff/how-to-learn-neural-networks-758b78f2736e#.rzgr2r1j1
- http://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
- http://www.inference.vc/deep-learning-is-easy/
- https://github.com/Microsoft/CNTK
- http://www.jmlr.org/papers/volume15/claesen14a/claesen14a.pdf
- http://www.jmlr.org/papers/volume15/nandan14a/nandan14a.pdf
- http://www.jmlr.org/papers/volume15/dubout14a/dubout14a.pdf
- http://www.jmlr.org/papers/volume15/saberian14a/saberian14a.pdf
- https://github.com/ianozsvald/data_science_delivered
- http://eng.datafox.co/general/2015/04/17/keyword-similarities/
- http://dilipad.history.ac.uk/2015/08/05/visualizing-parliamentary-discourse-with-word2vec-and-gephi/
- http://fastml.com/numerai-like-kaggle-but-with-a-clean-dataset-top-ten-in-the-money-and-recurring-payouts/
- http://101.datascience.community/2015/12/21/the-most-popular-skills-and-degrees-of-todays-data-scientists/
- http://machinelearningmastery.com/improve-machine-learning-results-with-boosting-bagging-and-blending-ensemble-methods-in-weka/
- http://blog.kaggle.com/2015/12/03/dato-winners-interview-1st-place-mad-professors/
- https://www.kaggle.com/c/dato-native/forums/t/16626/beat-the-benchmark-0-90388-with-simple-model
- http://research.google.com/pubs/pub41159.html
- https://www.kaggle.com/c/avazu-ctr-prediction/forums/t/10927/beat-the-benchmark-with-less-than-1mb-of-memory
- http://alicebot.blogspot.com/
- https://github.com/wojzaremba/algorithm-learning
- http://papers.nips.cc/paper/5849-semi-supervised-convolutional-neural-networks-for-text-categorization-via-region-embedding.pdf
- http://topos-theory.github.io/deep-neural-decision-forests/
- http://byterot.blogspot.co.uk/2015/07/daft-punk-tool-muse-word2vec-model-trained-36K-rock-music-corpus-wiki-NLP-gensim.html
- http://byterot.blogspot.co.uk/2015/06/five-crazy-abstractions-my-deep-learning-word2doc-model-just-did-NLP-gensim.html
- http://devashishshankar.com/2015/07/21/my-journey-in-nlp/
- https://atlas.mindmup.com/2015/06/4cbcef50fa6901327cdf06dfaff79cf0/deep_learning_for_natural_language_proce/index.html
- http://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/
- http://sumsar.net/blog/2015/04/the-non-parametric-bootstrap-as-a-bayesian-model/
- http://planspace.org/20151129-see_sklearn_trees_with_d3/
- http://datascienceplus.com/standard-deviation-vs-standard-error/
- http://opensource.datacratic.com/mtlpy50/
- http://aylien.com/web-summit-2015-tweets-part1
- http://1oclockbuzz.com/2015/11/24/bandit-algorithms-for-bullying-getting-more-lunch-money/
- http://dustintran.com/blog/infinite-dimensional-word-embeddings/
- http://www.cis.upenn.edu/~ccb/ppdb/
- https://github.com/caesar0301/awesome-public-datasets
- http://willmcginnis.com/2015/11/29/beyond-one-hot-an-exploration-of-categorical-variables/
- https://www.reddit.com/r/MachineLearning/comments/3upodo/using_keras_lstm_rnn_for_variable_length_sequence/
- http://nbviewer.ipython.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb
- http://www.picalike.com/blog/2015/01/12/the-portrait-of-a-machine-learning-priestess/
- http://fastml.com/torch-vs-theano/
- http://www.unofficialgoogledatascience.com/2015/11/how-to-get-job-at-google-as-data.html
- http://arxiv.org/pdf/1511.08130v1.pdf
- http://nbviewer.ipython.org/github/rasbt/musicmood/blob/master/code/classify_lyrics/nb_whitelist_model.ipynb
- http://pandas.pydata.org/pandas-docs/version/0.17.1/whatsnew.html
- https://leetcode.com/
- http://nadbordrozd.github.io/interviews/
- http://daeilkim.com/refinery.html
- https://www.simple.com/engineering/building-analytics-at-simple
- http://cs229.stanford.edu/section/cs229-linalg.pdf
- http://arxiv.org/pdf/1511.06388v1.pdf
- http://dustintran.com/blog/trends-and-highlights-of-icml-2015/
- https://groups.google.com/forum/#!topic/gensim/lsvhf7499q4
- http://hduongtrong.github.io/2015/11/20/word2vec/
- http://www.win-vector.com/blog/
- http://dataelixir.com/
- http://pbpython.com/pandas-google-forms-part2.html
- http://www.holehouse.org/mlclass/06_Logistic_Regression.html
- http://www.win-vector.com/blog/2015/01/random-testtrain-split-is-not-always-enough/
- http://simplystatistics.org/2015/03/17/data-science-done-well-looks-easy-and-that-is-a-big-problem-for-data-scientists/
- http://treycausey.com/software_dev_skills.html
- https://github.com/google/skflow
- http://nbviewer.ipython.org/github/justmarkham/gadsdc1/blob/master/logistic_assignment/kevin_logistic_sklearn.ipynb
- http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/index.html
- https://www.reddit.com/r/MachineLearning/comments/3ti0fp/loss_function_must_it_be_convex/
- http://www.kdnuggets.com/2015/11/seven-steps-machine-learning-python.html/2
- http://davmre.github.io/inference/2015/11/13/elbo-in-5min/
- http://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/
- http://datascienceplus.com/correlation-and-linear-regression/
- https://wit.ai/blog/2015/09/23/emnlp
- http://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
- http://nbviewer.ipython.org/gist/TomAugspurger/a6877233fbbfb51512a9
- https://github.com/dmlc/mxnet/tree/master/amalgamation
- http://www.lab41.org/a-tour-of-sentiment-analysis-techniques-getting-a-baseline-for-sunny-side-up/
- http://googleresearch.blogspot.de/2015/10/improving-youtube-video-thumbnails-with.html
- http://alexperrier.github.io/jekyll/update/2015/09/04/topic-modeling-of-twitter-followers.html
- http://www.meetup.com/DeepLearn-NYC/
- https://peadarcoyle.wordpress.com/2015/11/02/interview-with-a-data-scientist-brad-klingenberg/
- http://www.opendatascience.com/blog/riding-on-large-data-with-scikit-learn/
- http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/
- http://computers-are-fast.github.io/
- http://www.bloomberg.com/news/articles/2015-10-26/google-turning-its-lucrative-web-search-over-to-ai-machines
- http://www.insightdatascience.com/blog/academia_to_industry_data_science_myths_and_truths.html
- https://github.com/dmlc/mxnet/blob/master/example/imagenet/predict-with-pretrained-model.ipynb
- https://mxnet.readthedocs.org/en/latest/tutorial/imagenet_full.html
- http://www.datasciencecentral.com/profiles/blogs/why-data-scientists-need-to-be-good-data-storytellers?xg_source=activity
- http://www.datasciencecentral.com/profiles/blogs/zipf-s-distribution-example-of-a-great-application
- http://www.itshared.org/2015/10/data-science-interview-questions.html
- http://www.slideshare.net/dominodatalab/data-science-popup-seattle-understanding-feature-space-in-machine-learning
- https://cs224d.stanford.edu/reports/XingMargaret.pdf
- https://gist.github.com/cigrainger/62910e58db46b7397de2
- http://www.slideshare.net/dominodatalab/data-science-popup-seattle-deep-learning-use-cases?ref=http://www.slideshare.net/dominodatalab/slideshelf
- http://chdoig.github.io/pygotham-topic-modeling/
- http://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf
- https://www.reddit.com/r/MachineLearning/comments/3oa325/how_is_netflix_using_machine_learning_to/
- https://www.reddit.com/r/MachineLearning/comments/3oc3g3/facebookss_ai_system_to_understand_text/
- http://nbviewer.ipython.org/github/tdhopper/notes-on-dirichlet-processes/blob/master/2015-10-07-econtalk-topics.ipynb
- http://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models.pdf
- https://speakerdeck.com/bmabey/visualizing-topic-models
- http://datagenetics.com/blog/august22015/index.html
- http://www.erinshellman.com/bot-or-not/
- http://softwareengineeringdaily.com/2015/10/05/bridging-data-science-and-engineering-with-greg-lamp/
- http://libraryofwords.info/faq.html
- http://libraryofbabel.info/
- https://medium.com/rants-on-machine-learning/what-to-do-with-small-data-d253254d1a89
- http://efavdb.com/pandas-tips-and-tricks/
- http://www.kdnuggets.com/2015/08/gartner-2015-hype-cycle-big-data-is-out-machine-learning-is-in.html
- http://data8.org/text/
- http://blog.someben.com/2013/01/hashing-lang/
- http://www.kdnuggets.com/2014/08/kdd-2014-awards-winners.html
- http://blog.david-andrzejewski.com/machine-learning/practical-machine-learning-tricks-from-the-kdd-2011-best-industry-paper/
- http://www.fi.muni.cz/usr/sojka/posters/rehurek-sojka-scipy2011.pdf
- http://www.slideshare.net/hustwj/cikm-keynotenov2014
- http://deepdist.com/
- http://moviemood.co/basic
- http://datagenetics.com/blog.html
- http://nbviewer.ipython.org/url/norvig.com/ipython/Probability.ipynb
- http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/
- https://blog.rjmetrics.com/2015/09/30/the-ultimate-guide-to-data-science-blogs-150-and-counting/
Cool blogs:
- https://www.cs.bgu.ac.il/~yoavg/uni/bloglike/baboons.html
- http://sujitpal.blogspot.com/2014/10/clustering-section-titles-with.html
- http://tryr.codeschool.com/
- https://github.com/MLWave/Kaggle-Ensemble-Guide
- http://sebastianraschka.com/blog/2015/why-python.html
- http://www.eecs.tufts.edu/~dsculley
- http://blog.echen.me/2011/04/27/choosing-a-machine-learning-classifier/
- https://github.com/zygmuntz?tab=repositories
- http://blog.csdn.net/u010693617/article/details/9148747
- http://www.lucypark.kr/courses/2015-ba/text-mining.html#topic-modeling
- http://karpathy.github.io/
- http://bugra.github.io/
- http://colah.github.io/
- http://linanqiu.github.io/2015/05/20/word2vec-sentiment/
- http://multithreaded.stitchfix.com/blog/2015/03/11/word-is-worth-a-thousand-vectors/
- http://blog.yhathq.com/
- http://www.gregreda.com/
- http://radimrehurek.com/gensim/models/phrases.html
- http://fa.bianp.net/
Visualizations:
- cohort analysis: https://blog.clevertap.com/how-to-use-cohort-analysis-to-improve-retention/
- bokeh 101: http://felipegalvao.com.br/blog/2016/03/15/data-visualization-python-now-with-bokeh/
- 4 storytelling strategies: http://annkemery.com/four-storytelling-strategies/
- http://cs.stanford.edu/people/karpathy/svmjs/demo/
- https://www.oreilly.com/ideas/jupyter-at-oreilly
Writing:
- https://www.dataquest.io/blog/python-data-visualization-libraries/
- https://districtdatalabs.silvrback.com/markup-for-fast-data-science-publication
Teaching:
- http://www.cs.jhu.edu/~delip/
- 12-week, $14K course: http://www.thisismetis.com/data-science
- https://courses.cs.washington.edu/courses/cse546/12wi/slides/cse546wi12intro.pdf
- https://github.com/jreback/PyDataNYC2015/tree/master/tutorial
- http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch08/ch08.ipynb
- https://github.com/Dyakonov/notebooks/blob/master/dj_pandas_tutoral.ipynb
- https://github.com/amueller/scipy_2015_sklearn_tutorial/tree/master/notebooks
- http://neuralnetworksanddeeplearning.com/chap6.html
- http://neuralnetworksanddeeplearning.com/chap1.html
- https://www.tensorflow.org/versions/master/tutorials/image_recognition/index.html
- https://github.com/ryankiros/skip-thoughts
- http://www.tensorflow.org/tutorials/seq2seq/index.html
- https://github.com/Newmu/Theano-Tutorials
- https://github.com/nlintz/TensorFlow-Tutorials
- https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/
- https://github.com/hangtwenty/dive-into-machine-learning
- https://sites.google.com/site/deeplearningsummerschool/schedule
- https://github.com/mila-udem/summerschool2015
- https://github.com/ChristosChristofidis/awesome-deep-learning
- http://goodfeli.github.io/dlbook/
- http://cs231n.stanford.edu/
- http://www-cs.stanford.edu/~quocle/tutorial1.pdf
- http://www-cs.stanford.edu/~quocle/tutorial2.pdf
- https://svn.spraakdata.gu.se/repos/richard/pub/statnlp2015_web/index.html
- https://svn.spraakdata.gu.se/repos/richard/pub/statnlp2015_web/l9.pdf
- http://people.duke.edu/~ccc14/sta-663/Jupyter.html
- https://github.com/amitkaps/weed
- https://github.com/rouseguy/intro2stats/tree/master/notebooks
- https://github.com/chdoig/pytexas2015-ml
- https://github.com/tdhopper/notes-on-dirichlet-processes
- https://books.google.com.vn/books?id=qjwqBAAAQBAJ&pg=PA197&lpg=PA197&dq=duplicate+detection+gensim&source=bl&ots=wujuprMMWO&sig=e_uj4RLQSuJZWZUiCA8Z8CuhcY0&hl=en&sa=X&redir_esc=y#v=onepage&q=duplicate%20detection%20gensim&f=false