An Apache Hive SerDe (short for serializer/deserializer) for the Ion file format, with support for Hive 2 and Hive 3.
- Reads data stored in Ion format, both binary and text.
- Supports all Ion types, including nested data structures. See the Type mapping documentation for more information.
- Supports flattening of Ion documents through path extraction.
- Supports importing shared symbol tables and custom symbol table catalogs.
- IonInputFormat and IonOutputFormat handle both Ion binary and Ion text.
- Configurable through SerDe properties.
Download the latest ion-hive(2|3)-serde-all-<version-number>.jar from https://github.com/amazon-ion/ion-hive-serde/releases. (Some releases may have a slightly different jar name, such as ion-hive(2|3)-serde-<version-number>-all.jar.)
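Because the jar name varies slightly between releases, a script that picks up a downloaded jar may want to accept both forms. The following is a minimal sketch of such a check; the helper name `parse_jar_name` and the version string `1.2.0` are hypothetical and used only for illustration.

```python
import re

# Accepts both naming variants described above:
#   ion-hive(2|3)-serde-all-<version-number>.jar
#   ion-hive(2|3)-serde-<version-number>-all.jar
JAR_NAME = re.compile(
    r"^ion-hive(?P<hive>[23])-serde-(?:all-(?P<v1>.+)|(?P<v2>.+)-all)\.jar$"
)


def parse_jar_name(filename):
    """Return (hive_major, version) for a SerDe "all" jar name, or None."""
    match = JAR_NAME.match(filename)
    if match is None:
        return None
    version = match.group("v1") or match.group("v2")
    return int(match.group("hive")), version


# "1.2.0" is a placeholder version, not a real release number.
print(parse_jar_name("ion-hive3-serde-all-1.2.0.jar"))
print(parse_jar_name("ion-hive2-serde-1.2.0-all.jar"))
```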
To build it locally, run: ./gradlew shadowJar
The project is separated into modules:
- hive2: the SerDe code and unit tests for Hive 2.
- hive3: the SerDe code and unit tests for Hive 3.
- integration-tests: integration tests using a dockerized Hive installation.
To build only the SerDe code:
./gradlew :hive2:build :hive3:build
To build the SerDe including integration tests:
./gradlew build
Integration tests require Docker to be installed, but the build itself takes care of creating, starting, and stopping the necessary containers. See integration-tests/README.md for more information, including how to run the integration tests in your IDE.
The examples below use Ion text for readability, but Ion binary is recommended in production systems for better performance and compression.
~$ cat test.ion
{
name: "foo",
age: 32
}
{
name: "bar",
age: 28
}
$ hadoop fs -put -f test.ion /user/data/test.ion
$ hive
hive> CREATE DATABASE test;
hive> CREATE EXTERNAL TABLE test (
name STRING,
age INT
)
ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
STORED AS
INPUTFORMAT 'com.amazon.ionhiveserde.formats.IonInputFormat'
OUTPUTFORMAT 'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION '/user/data';
hive> SELECT * FROM test;
OK
foo 32
bar 28
~$ cat test.ion
{
personal_info: { name: "foo", age: 32 },
professional_info: { job_title: "software engineer" }
}
{
personal_info: { name: "bar", age: 28 },
professional_info: { job_title: "designer" }
}
$ hadoop fs -put -f test.ion /user/data/test.ion
$ hive
hive> CREATE DATABASE test;
hive> CREATE EXTERNAL TABLE test (
name STRING,
age INT,
jobtitle STRING
)
ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
"ion.name.path_extractor" = "(personal_info name)",
"ion.age.path_extractor" = "(personal_info age)",
"ion.jobtitle.path_extractor" = "(professional_info job_title)"
)
STORED AS
INPUTFORMAT 'com.amazon.ionhiveserde.formats.IonInputFormat'
OUTPUTFORMAT 'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION '/user/data';
hive> SELECT * FROM test;
OK
foo 32 software engineer
bar 28 designer
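To make the flattening above easier to follow, here is a conceptual sketch in Python of how a path extractor such as "(personal_info name)" selects a nested value for a column. It is an illustration only and not the SerDe's implementation: Ion structs are modeled as plain dicts, only plain field names are handled (the real path extractor syntax may support more), and the function names `parse_path`, `extract`, and `flatten` are hypothetical. The fallback to a top-level field with the column's name mirrors the first example, where no properties are set, and is an assumption here.

```python
def parse_path(path_extractor):
    """Split a path such as "(personal_info name)" into its field names."""
    text = path_extractor.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"expected a parenthesized path, got {path_extractor!r}")
    return text[1:-1].split()


def extract(record, path):
    """Follow the field names through nested dicts; return None if absent."""
    value = record
    for field in path:
        if not isinstance(value, dict) or field not in value:
            return None
        value = value[field]
    return value


def flatten(record, columns, serde_properties):
    """Build one flat row, using one path extractor per column."""
    row = []
    for column in columns:
        key = f"ion.{column}.path_extractor"
        # Assumption: without an explicit extractor, the column reads the
        # top-level field that has the same name as the column.
        path = parse_path(serde_properties.get(key, f"({column})"))
        row.append(extract(record, path))
    return tuple(row)


# The SerDe properties and the first document from the example above.
properties = {
    "ion.name.path_extractor": "(personal_info name)",
    "ion.age.path_extractor": "(personal_info age)",
    "ion.jobtitle.path_extractor": "(professional_info job_title)",
}
record = {
    "personal_info": {"name": "foo", "age": 32},
    "professional_info": {"job_title": "software engineer"},
}

print(flatten(record, ["name", "age", "jobtitle"], properties))
# prints ('foo', 32, 'software engineer')
```

Each column gets its own path, so the table schema stays flat even though the Ion documents are nested.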
See CONTRIBUTING
This library is licensed under the Apache 2.0 License.