Trinamo

Trinamo generates HiveQL from YAML definitions so that you can mount DynamoDB, S3, and local HDFS tables in Hive.

Installation

Add this line to your application's Gemfile:

gem 'trinamo'

And then execute:

$ bundle

Or install it yourself as:

$ gem install trinamo

Usage

Table Definition

Generate a template for DDL

  • RUN:
Trinamo::Converter.generate_ddl_template(out_file_path = 'ddl.yml')
  • OUTPUT:
tables:
  - name: comments
    s3_location: s3://path/to/s3/table/location
    s3_partition:
      - name: date
        type: string
    hash_key:
      - name: user_id
        type: bigint
    range_key:
      - name: comment_id
        type: bigint
    attributes:
      - name: title
        type: string
      - name: content
        type: string
      - name: rate
        type: double
  - name: authors
    hash_key:
      - name: author_id
        type: bigint
    attributes:
      - name: name
        type: string

Generate a template for Hive options

  • RUN:
Trinamo::Converter.generate_options_template(out_file_path = 'ddl.yml')
  • OUTPUT:
options:
  dynamodb.throughput.read.percent: 0.5
  hive.exec.compress.output: true
  io.seqfile.compression.type: BLOCK
  mapred.output.compression.codec: com.hadoop.compression.lzo.LzoCodec

Then edit the table definitions and Hive options as needed.

Create DDLs in HiveQL

For Options

  • RUN:
Trinamo::Converter.load('ddl.yml').convert(:option)
  • OUTPUT:
SET dynamodb.throughput.read.percent = 0.5;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec = com.hadoop.compression.lzo.LzoCodec;
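To illustrate how the `options` section relates to the statements above, here is a rough sketch that turns each key/value pair into a `SET` statement using only Ruby's standard library. The helper `set_statements` is hypothetical, and its uniform `key=value` spacing is an assumption; it is not Trinamo's implementation.

```ruby
require 'yaml'

# Hypothetical helper (not Trinamo's implementation): one SET statement per
# entry of the options mapping, in the order the entries appear in the YAML.
def set_statements(options)
  options.map { |key, value| "SET #{key}=#{value};" }
end

options_yaml = <<~YAML
  options:
    dynamodb.throughput.read.percent: 0.5
    hive.exec.compress.output: true
YAML

statements = set_statements(YAML.safe_load(options_yaml)['options'])
puts statements
# prints:
# SET dynamodb.throughput.read.percent=0.5;
# SET hive.exec.compress.output=true;
```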

For DynamoDB

  • RUN:
Trinamo::Converter.load('ddl.yml').convert(:dynamodb)
  • OUTPUT:
-- comments_ddb
CREATE EXTERNAL TABLE comments_ddb (
  user_id BIGINT,comment_id BIGINT,title STRING,content STRING,rate DOUBLE
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  'dynamodb.table.name' = 'comments',
  'dynamodb.column.mapping' = 'user_id:user_id,comment_id:comment_id,title:title,content:content,rate:rate'
);

-- authors_ddb
CREATE EXTERNAL TABLE authors_ddb (
  author_id BIGINT,name STRING
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  'dynamodb.table.name' = 'authors',
  'dynamodb.column.mapping' = 'author_id:author_id,name:name'
);
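In the output above, every column maps to a DynamoDB attribute of the same name in `dynamodb.column.mapping`. The sketch below reproduces that shape for a single table given as a plain Ruby hash. `dynamodb_ddl` is a hypothetical helper written for illustration, assuming the column order hash_key, range_key, attributes; it does not show how Trinamo builds the statement internally.

```ruby
# Hypothetical helper (not part of Trinamo): renders a DynamoDB-backed
# external table definition from one parsed table entry.
def dynamodb_ddl(table)
  columns = %w[hash_key range_key attributes].flat_map { |key| table[key] || [] }
  # Hive column list, e.g. "author_id BIGINT,name STRING"
  column_defs = columns.map { |c| "#{c['name']} #{c['type'].upcase}" }.join(',')
  # Identity mapping from Hive column name to DynamoDB attribute name
  mapping = columns.map { |c| "#{c['name']}:#{c['name']}" }.join(',')

  <<~SQL
    -- #{table['name']}_ddb
    CREATE EXTERNAL TABLE #{table['name']}_ddb (
      #{column_defs}
    )
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES (
      'dynamodb.table.name' = '#{table['name']}',
      'dynamodb.column.mapping' = '#{mapping}'
    );
  SQL
end

authors = {
  'name' => 'authors',
  'hash_key' => [{ 'name' => 'author_id', 'type' => 'bigint' }],
  'attributes' => [{ 'name' => 'name', 'type' => 'string' }]
}

puts dynamodb_ddl(authors)
```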

For S3

  • RUN:
Trinamo::Converter.load('ddl.yml').convert(:s3)
  • OUTPUT:
-- comments_s3
CREATE EXTERNAL TABLE comments_s3 (
  user_id BIGINT,comment_id BIGINT,title STRING,content STRING,rate DOUBLE
) PARTITIONED BY (date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
LOCATION 's3://path/to/s3/table/location';
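The S3 output above lists only `comments`, the one table in the template that has an `s3_location`. The following sketch works under the assumption that tables without `s3_location` are skipped and that `s3_partition` entries become the `PARTITIONED BY` clause. Both helpers are hypothetical and only illustrate that reading of the sample output.

```ruby
# Hypothetical helper (not part of Trinamo): keeps only the tables that
# declare an S3 location.
def s3_tables(tables)
  tables.select { |table| table.key?('s3_location') }
end

# Hypothetical helper: builds the PARTITIONED BY clause, or an empty string
# when the table declares no partitions.
def partition_clause(table)
  partitions = table['s3_partition'] || []
  return '' if partitions.empty?

  cols = partitions.map { |p| "#{p['name']} #{p['type'].upcase}" }.join(',')
  " PARTITIONED BY (#{cols})"
end

tables = [
  {
    'name' => 'comments',
    's3_location' => 's3://path/to/s3/table/location',
    's3_partition' => [{ 'name' => 'date', 'type' => 'string' }]
  },
  { 'name' => 'authors' }
]

s3_tables(tables).each do |table|
  puts "#{table['name']}_s3:#{partition_clause(table)}"
end
# prints: comments_s3: PARTITIONED BY (date STRING)
```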

For HDFS

  • RUN:
Trinamo::Converter.load('ddl.yml').convert(:hdfs)
  • OUTPUT:
-- comments_hdfs
CREATE TABLE comments_hdfs (
  user_id BIGINT,comment_id BIGINT,title STRING,content STRING,rate DOUBLE
);

-- authors_hdfs
CREATE TABLE authors_hdfs (
  author_id BIGINT,name STRING
);

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/cignoir/trinamo.

License

The gem is available as open source under the terms of the MIT License.