Pinned Repositories
ds-for-telco
Source material for Data Science for Telecom Tutorial at Strata Singapore 2015
exhibitor
Simple web application for displaying data for entities with supernova schemas.
fantasy-football
Choosing a fantasy football team using Spark, Hive, Python, and really just about anything.
legal-lsa
Latent Semantic Analysis of Legal Documents
py-hadoop-tutorial
Source Material for using Python and Hadoop together
scala-dataflow-dsl
A Scala DSL for dataflow
spark-dataflow
sparklingpandas-ex
Examples of using SparklingPandas and Pandas with PySpark
svd-benchmark
A repo for benchmarking distributed implementations of the singular value decomposition.
ttitd-traffic
That thing in the desert has traffic. What is it like?
hougs's Repositories
hougs/ds-for-telco
Source material for Data Science for Telecom Tutorial at Strata Singapore 2015
hougs/sparklingpandas-ex
Examples of using SparklingPandas and Pandas with PySpark
hougs/py-hadoop-tutorial
Source Material for using Python and Hadoop together
hougs/ttitd-traffic
That thing in the desert has traffic. What is it like?
hougs/legal-lsa
Latent Semantic Analysis of Legal Documents
hougs/svd-benchmark
A repo for benchmarking distributed implementations of the singular value decomposition.
hougs/AoC-2021
Advent of Code 2021
hougs/mllib-utils
Some wrapper utilities for working with Spark MLLib.
hougs/sk-score-ex
Example of applying a fitted sklearn model to a distributed dataset using PySpark.
hougs/sparklingpandas
Pandas on PySpark (POPS)
hougs/whattreeisthis
What tree is this? A progressive web app that teaches users how to identify trees.
hougs/311-anomaly-detection
hougs/aas
Code to accompany Advanced Analytics with Spark from O'Reilly Media
hougs/bnb-blog
My Hugo Blog + Site
hougs/compare-a-frame
Serde Comparisons for Pandas DataFrames
hougs/gen-lin-models
An IPython notebook explaining generalized linear models, particularly for count data.
hougs/gghazard
Improved Base and Grid Plots for Survival Hazard Cox Regression
hougs/ibis
Big data the Pythonic way. Productivity-centric Python data analysis framework for analytic SQL and Hadoop, with high-performance extensions for Impala. Co-founded by the creator of pandas.
hougs/Impala
Real-time Query for Hadoop
hougs/jhlch.github.io
A place to write.
hougs/okika
Mobile app for identifying plants you may find in Hawaii
hougs/oryx
Oryx 2 (incubating): Lambda architecture on Spark for real-time, large-scale machine learning
hougs/parental-leave
hougs/parksconserverancy
Data project for Conservancy Vegetation Monitoring Data
hougs/parquet-mr
Mirror of Apache Parquet
hougs/py-env-parcel
Scripts for building CDH parcels to distribute Python environments.
hougs/ranger-survey
Black Rock City Ranger Survey Analysis
hougs/robustpca
hougs/spark
Mirror of Apache Spark
hougs/zika-hackathon
Data Science Hackathon with UT Austin | Mosquito Transmitted Viruses