Pinned Repositories
aaronbinns.github.io
bacon
Experimenting with Apache Pig.
db-deploy
Scripts and stuff to make Databricks deployments easier for MMC customers
db-repo-path
Demonstration of approach to access Python modules on workers in Git repo
db-test
Testing rando stuff for Databricks
jbs
Builds Lucene/Solr indexes out of NutchWAX segments and revisit records via Hadoop.
slarpy
(s)o(l)r+(ar)c+(py)thon
tnh
(T)he (N)ew (H)otness. Improved full-text search of archival web data.
waimea
Full-text indexing pipeline of Pig scripts.
aaronbinns's Repositories
aaronbinns/bacon
Experimenting with Apache Pig.
aaronbinns/jbs
Builds Lucene/Solr indexes out of NutchWAX segments and revisit records via Hadoop.
aaronbinns/tnh
(T)he (N)ew (H)otness. Improved full-text search of archival web data.
aaronbinns/slarpy
(s)o(l)r+(ar)c+(py)thon
aaronbinns/db-deploy
Scripts and stuff to make Databricks deployments easier for MMC customers
aaronbinns/waimea
Full-text indexing pipeline of Pig scripts.
aaronbinns/aaronbinns.github.io
aaronbinns/db-repo-path
Demonstration of approach to access Python modules on workers in Git repo
aaronbinns/db-test
Testing rando stuff for Databricks
aaronbinns/elasticsearch-dump
Import and export tools for elasticsearch
aaronbinns/heritrix3
Local hacks and patches to IA Heritrix3
aaronbinns/ia-hadoop-tools
Clone of the ia-hadoop-tools repo, but just the zipnum branch, with new features for zipnum and cluster merging.
aaronbinns/opennlp
Mirror of Apache OpenNLP (Incubating)
aaronbinns/pegasus
VM-based deployment for prototyping Big Data tools on Amazon Web Services
aaronbinns/topo-json
aaronbinns/wayback
Fork of IA wayback with some local patches/hacks.