smh2019's Stars
facebookresearch/ParlAI
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
fivethirtyeight/russian-troll-tweets
gyglim/dvn
Reference implementation for Structured Prediction with Deep Value Networks
smh2019/probcomp-stack
MIT Probabilistic Computing Project software stack
Serene-Arc/bulk-downloader-for-reddit
Downloads and archives content from Reddit
iamtrask/Grokking-Deep-Learning
This repository accompanies the book "Grokking Deep Learning"
danielecook/Awesome-Bioinformatics
A curated list of awesome Bioinformatics libraries and software.
stepthom/text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
ethen8181/machine-learning
🌎 Machine learning tutorials (mainly in Python3)
jayinai/data-science-question-answer
A repo for data science related questions and answers
donnemartin/interactive-coding-challenges
120+ interactive Python coding interview challenges (algorithms and data structures). Includes Anki flashcards.
datopian/bad-data
Examples of bad data, especially from government.
csvsoundsystem/federal-treasury-api
The scraper, parser, and database creation scripts for Financial Management Service daily U.S. Treasury statements.
jwasham/coding-interview-university
A complete computer science study plan to become a software engineer.
pushshift/api
Pushshift API
pk026/cuba
There is a continuous stream of user activity events generated from multiple users as they use our mobile Cube app. The objective is to implement a server to ingest these events. The server will expose an HTTP endpoint to which the events are posted. The server will also contain an admin interface for specifying business rules that alert the operator (an engineer on the Cube Ops team) or trigger an action (such as sending an alert SMS to the end user) when certain criteria are met.
nio-blocks/reddit
Polls the Reddit API for the specified subreddit
kelseyhightower/kubernetes-the-hard-way
Bootstrap Kubernetes the hard way. No scripts.
ks-avinash/aws-lambda-function
Simple code for extracting data from an Excel sheet and ingesting it into an AWS S3 bucket
ytian22/Bike-Share-Demand-Prediction
Predicted Bay Area bike share demand with Spark MLlib and built a pipeline to bridge Amazon S3, MongoDB server, and Spark EC2 cluster for NoSQL data processing.
prakhar1989/docker-curriculum
🐬 A comprehensive tutorial on getting started with Docker!
donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
amararyal/Co-Tags
ekhtiar/swiss-transport-datapipeline
A data pipeline that pulls public transport data daily from the opentransportdata.swiss portal. The pipeline has three tasks: pull the right data from opentransportdata.swiss, push the data to S3 for storage, and transform and load the data into a database. Hopefully this repository helps people explain ETL / batch data pipelines.
royhobbstn/s3-db
A serverless data processing pipeline to store Census data in AWS S3.
associatedpress/national-caseload-data-ingest
Scripts to download the U.S. Department of Justice's National Caseload Data and load it into Amazon Athena for querying
damienmarlier51/Kinesis_Lambda_DynamoDB
Data ingestion on AWS
aws-samples/amazon-elasticsearch-lambda-samples
Data ingestion for Amazon Elasticsearch Service from S3 and Amazon Kinesis, using AWS Lambda: Sample code
BracketJohn/is-this-an-mlm
Website to tell visitors whether a company is an MLM