Small repository to test capabilities of Apache Hive 4 and Apache Iceberg. Here you will find a small walkthrough to configure and use Hive to study Iceberg with HiveQL
Before, you must ensure that Docker and Dbeaver are installed. After that, you just need to run docker-compose.yml.
Docker may be find on official website.
Dbeaver is an awesome tool to connect and run SQL Queries on Hive and other databases. Download it here.
Download data from Kaggle.
You can download either HI-Large_Trans.csv (~150m lines) or LI-Small_Trans.csv (~6m lines), both has the same schema but, with different row count.
Create folders .\warehouse\landing\laundering\
and move data to there.
After docker installed, run the following code that will up some containers with hive, hive metastore and a postgres (metastore db).
docker-compose up -d
If everythings works fine:
To connect dbeaver with hive, you just to configure port and ip, no needs any authentication.
CREATE DATABASE db_landing;
In order to create an external table to data use: (or ./scrips/landing.sql)
CREATE EXTERNAL TABLE IF NOT EXISTS db_landing.tbl_laundering (
`dt_event` STRING COMMENT 'Timestamp',
`from_bank` STRING COMMENT 'From Bank',
`account1` STRING COMMENT 'Account',
`to_bank` STRING COMMENT 'To Bank',
`account2` STRING COMMENT 'Account2',
`amount_received` STRING COMMENT 'Amount Received',
`receiving_currency` STRING COMMENT 'Receiving Currency',
`amount_paid` STRING COMMENT 'Amount Paid',
`payment_currency` STRING COMMENT 'Payment Currency',
`payment_format` STRING COMMENT 'Payment Format',
`is_laundering` STRING COMMENT 'Is Laundering'
)
COMMENT 'Landing Laundering Table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/opt/hive/data/warehouse/landing/laundering/'
TBLPROPERTIES ("skip.header.line.count"="1");
SELECT * FROM db_landing.tbl_laundering LIMIT 1000;