Pinned Repositories
awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
bud
Prototype Bud runtime (Bloom Under Development)
elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, and HBase code.
elephant-twin
Elephant Twin is a framework for creating indexes in Hadoop
elephant-twin-lzo
Elephant Twin LZO uses Elephant Twin to create LZO block indexes
flume
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault-tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
idl_storage_guidelines
This document attempts to capture useful patterns and warn about subtle gotchas when it comes to designing and evolving schemas for long-term serialized data. It is not intended as a guide for how to best represent a particular dataset or process.
pig
Mirror of Apache Pig
piglatin-mode
PigLatin mode for Emacs.
Vertica-Hadoop-Connector
Vertica Hadoop Connector
dvryaboy's Repositories
dvryaboy/pig
Mirror of Apache Pig
dvryaboy/idl_storage_guidelines
This document attempts to capture useful patterns and warn about subtle gotchas when it comes to designing and evolving schemas for long-term serialized data. It is not intended as a guide for how to best represent a particular dataset or process.
dvryaboy/elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, and HBase code.
dvryaboy/piglatin-mode
PigLatin mode for Emacs.
dvryaboy/elephant-twin
Elephant Twin is a framework for creating indexes in Hadoop
dvryaboy/elephant-twin-lzo
Elephant Twin LZO uses Elephant Twin to create LZO block indexes
dvryaboy/Vertica-Hadoop-Connector
Vertica Hadoop Connector
dvryaboy/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
dvryaboy/bud
Prototype Bud runtime (Bloom Under Development)
dvryaboy/flume
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault-tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
dvryaboy/giraph
Mirror of Apache Giraph
dvryaboy/hadoop-lzo
Patched, refactored version of code.google.com/hadoop-gpl-compression for Hadoop 0.20
dvryaboy/PigEditor
Eclipse plugin for Apache Pig
dvryaboy/scribe
Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.
dvryaboy/apache-proposal
Apache Incubator Proposal for Parquet Format
dvryaboy/cascading
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing workflows on a Hadoop cluster.
dvryaboy/gitbook
The GitBook documentation for Aqueduct
dvryaboy/Impatient
Source examples to support the "Cascading for the Impatient" blog post series
dvryaboy/incubator-parquet-format
Mirror of Apache Parquet
dvryaboy/incubator-parquet-mr
Mirror of Apache Parquet
dvryaboy/lakeFS
lakeFS - Data version control for your data lake | Git for data
dvryaboy/lilcody
A tiny version of Cody from Sourcegraph, created for study and experimentation
dvryaboy/MassQueryLanguage
The Mass Spec Query Language (MassQL) is a domain specific language meant to be a succinct way to express a query in a mass spectrometry centric fashion.
dvryaboy/parquet-format-1
As we are moving to Apache, please open your pull requests on: https://github.com/apache/incubator-parquet-format
dvryaboy/pdi-google-spreadsheet-plugin
Plugin for Pentaho Data Integration allowing reading and writing of Google Spreadsheets
dvryaboy/redelm
an anagram
dvryaboy/scalding
A Scala API for Cascading
dvryaboy/semantic-versioning
Java library relying on semver.org principles to check binary code compatibility