dvryaboy

Pinned Repositories

awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
1 1 00
bud
Prototype Bud runtime (Bloom Under Development)
Language: Ruby
1 1 00
elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, and HBase code.
Language: Java
5 1 02
elephant-twin
Elephant Twin is a framework for creating indexes in Hadoop
Language: Java
2 1 00
elephant-twin-lzo
Elephant Twin LZO uses Elephant Twin to create LZO block indexes
Language: Java
2 1 00
flume
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
Language: Java
1 2 01
idl_storage_guidelines
This document attempts to capture useful patterns and warn about subtle gotchas when it comes to designing and evolving schemas for long-term serialized data. It is not intended as a guide for how to best represent a particular dataset or process.
13 0 10
pig
Mirror of Apache Pig
Language: Java
18 8 08
piglatin-mode
PigLatin mode for Emacs.
Language: Emacs Lisp
5 4 19
Vertica-Hadoop-Connector
Vertica Hadoop Connector
Language: Java
2 1 00

dvryaboy's Repositories

dvryaboy/pig
Mirror of Apache Pig
Language: Java
18 8 08
dvryaboy/idl_storage_guidelines
This document attempts to capture useful patterns and warn about subtle gotchas when it comes to designing and evolving schemas for long-term serialized data. It is not intended as a guide for how to best represent a particular dataset or process.
13 0 10
dvryaboy/elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, and HBase code.
Language: Java
5 1 02
dvryaboy/piglatin-mode
PigLatin mode for Emacs.
Language: Emacs Lisp
5 4 19
dvryaboy/elephant-twin
Elephant Twin is a framework for creating indexes in Hadoop
Language: Java
2 1 00
dvryaboy/elephant-twin-lzo
Elephant Twin LZO uses Elephant Twin to create LZO block indexes
Language: Java
2 1 00
dvryaboy/Vertica-Hadoop-Connector
Vertica Hadoop Connector
Language: Java
2 1 00
dvryaboy/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
1 1 00
dvryaboy/bud
Prototype Bud runtime (Bloom Under Development)
Language: Ruby
1 1 00
dvryaboy/flume
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
Language: Java
1 2 01
dvryaboy/giraph
Mirror of Apache Giraph
Language: Java
1 1 00
dvryaboy/hadoop-lzo
Patched, refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20
Language: Shell
1 1 0
dvryaboy/PigEditor
Eclipse plugin for Apache Pig
Language: Java
1 2 0
dvryaboy/scribe
Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.
Language: C++
1 1 01
dvryaboy/apache-proposal
Apache Incubator Proposal for Parquet Format
1 01
dvryaboy/cascading
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on a Hadoop cluster.
Language: Java
1 0
dvryaboy/gitbook
The GitBook documentation for Aqueduct
0 0
dvryaboy/Impatient
source examples to support the "Cascading for the Impatient" blog post series
Language: Java
1 0
dvryaboy/incubator-parquet-format
Mirror of Apache Parquet
Language: Java
1 0
dvryaboy/incubator-parquet-mr
Mirror of Apache Parquet
Language: Java
2 0
dvryaboy/lakeFS
lakeFS - Data version control for your data lake | Git for data
Language: Go
0 0
dvryaboy/lilcody
A tiny version of Cody from Sourcegraph, created for study and experimentation
dvryaboy/MassQueryLanguage
The Mass Spec Query Language (MassQL) is a domain specific language meant to be a succinct way to express a query in a mass spectrometry centric fashion.
Language: Python
0 0
dvryaboy/parquet-format-1
As we are moving to Apache, please open your pull requests on: https://github.com/apache/incubator-parquet-format
Language: Java
2 0
dvryaboy/pdi-google-spreadsheet-plugin
Plugin for Pentaho Data Integration allowing reading and writing of Google Spreadsheets
Language: Java
dvryaboy/redelm
an anagram
Language: Java
1 0
dvryaboy/scalding
A Scala API for Cascading
Language: Scala
1 0
dvryaboy/semantic-versioning
Java library relying on semver.org principles to check binary code compatibility
Language: Java
1 0