/urleng

Distributed URL Engine for ClickHouse (public service)

Primary LanguageHTML

visitor badge

Cloudflare + Deta Worker serving a decentralized, stateful, serverless URL Engine platform for ClickHouse.

About

The serverless API enables separate ClickHouse servers to access external distributed tables hosted on the cloud and share any type of data between nodes and 3rd party applications in real-time through the native URL Engine and Functions available in ClickHouse.

image

Serverless Platforms

The distributed URL engine platform is running on Cloudflare Workers and utilizing Deta Cloud as backend.

  • Cloudflare Worker NDJSON Proxy (limited to 100k req/day) *
  • Cloudflare Worker API Handler (limited to 100k req/day) *
  • Deta Micros + Base Storage (unlimited requests, max 10k/inserts/query) *

Public Access Point

  • https://urleng.com/{yoursupersecretkeygoeshere}

Usage

The following examples illustrates usage for distributed INSERT and SELECT statements

INSERT
:) INSERT INTO FUNCTION url('https://urleng.com/supersecret', JSONEachRow, 'key String, value UInt64') VALUES ('hello', 1), ('world', 2)
SELECT
:) SELECT * FROM url('https://urleng.com/supersecret', JSONEachRow);

┌─key───┬─value─┐
│ hello │     1 │
│ world │     2 │
└───────┴───────┘

2 rows in set. Elapsed: 0.126 sec. 
URL ENGINE
:) CREATE TABLE default.url_engine_distributed
   (
       `key` String,
       `value` UInt64,
   )
   ENGINE = URL('https://urleng.com/supersecret', 'JSONEachRow')
:) INSERT INTO url_engine_distributed VALUES ('hello', 1), ('world', 2)
:) SELECT * FROM url_engine_distributed

┌─key───┬─value─┐
│ hello │     1 │
│ world │     2 │
└───────┴───────┘

2 rows in set. Elapsed: 0.185 sec. 
Expiration

Items can be set to expire by including an __expires key in the object carrying a future Unix timestamp:

:) CREATE TABLE default.url_engine_expire
   (
       `key` String,
       `value` UInt64,
       `__expires` UInt64 DEFAULT toUInt64(toUnixTimestamp(now() + interval 24 hour))
   )
   ENGINE = URL('https://urleng.com/supersecret', 'JSONEachRow')
clickhouse-local

Get data into clickhouse-local with zero efforts:

clickhouse-local -q "select count() from url('https://urleng.com/supersecret', JSONEachRow)"
chdb

Get data using chdb in-memory engine:

python -m chdb "SELECT * from url('https://urleng.com/xxx', JSONEachRow, 'key String, value UInt64');" Pretty
CURL

Insert and query data using curl or any other HTTP/S GET/POST capable client.

POST ndjson
curl -s -XPOST https://urleng.com/supersecret \
     -H 'Content-Type:application/x-ndjson' --data-binary @ndjson.txt
POST json
curl -X POST https://url-engine.metrico.in/supersecret \
     -H 'Content-Type: application/json' -d '[{"key":"curl","value":1}]'
GET
curl -X GET https://urleng.com/supersecret

Notes

  • INSERTS are updates. Existing data will be replaced. Runtime limit ~1000 records per INSERT.
  • SELECT pulls the full table, pending X-Header extensions to prefilter WHERE statements.
  • Because there is no sign-up, the URL /path is essentially a password, so pick something unique.

Examples

(C) 2022 QXIP BV, for more info visit the qxip metrico repository