Data ingestion scripts and a Docker container for the Terminal GraphQL + Postgres backend.
- Install dependencies with `yarn`
- Add a `.env` file to the root of the repo with env variables structured as follows (depending on your local Postgres setup):

  ```
  TOKEN_DATABASE=token-info
  POSTGRES_HOST=localhost
  POSTGRES_PORT=5432
  POSTGRES_USER=postgres
  POSTGRES_PASSWORD=postgres
  ```

- Run the local Hasura server with `./docker-run.sh`
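For reference, the ingestion scripts need a Postgres connection built from these variables. Below is a minimal sketch of what that might look like, assuming the `pg` and `dotenv` packages; the repo's actual helper may be structured differently.

```js
// db.js - hypothetical connection helper built from the .env variables above
require('dotenv').config();
const { Pool } = require('pg');

const pool = new Pool({
  host: process.env.POSTGRES_HOST,
  port: Number(process.env.POSTGRES_PORT),
  user: process.env.POSTGRES_USER,
  password: process.env.POSTGRES_PASSWORD,
  database: process.env.TOKEN_DATABASE,
});

module.exports = pool;
```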
Included are 5 scripts:
- ingestTokens
  - run with `node ingestTokens.js /path/to/tokens.csv`
- ingestTransfers
  - run with `node ingestTransfers.js /path/to/token-transfers.csv`
- ingestAll
  - run with `node ingestAll.js`
  - adjust constants in `ingestAll.js` to suit your needs
- emptyDB
  - run with `node emptyDB.js`
  - utility to clear both the `tokens` and `token-transfers` tables (see the sketch after this list)
- benchmark
  - run with `node benchmark.js`
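As an illustration of the emptyDB utility, a truncate-based version could be as small as the following. This is a sketch, not the repo's actual implementation; it reuses the hypothetical `db.js` helper sketched in the setup section and assumes the underlying table is named `token_transfers`, as in the GraphQL queries below.

```js
// emptyDB sketch - clears both tables in one statement
const pool = require('./db'); // hypothetical pg Pool helper from the setup sketch

async function emptyDB() {
  // RESTART IDENTITY resets serial ids; CASCADE covers the foreign key
  // from token_transfers to tokens (table names assumed from the GraphQL schema)
  await pool.query('TRUNCATE token_transfers, tokens RESTART IDENTITY CASCADE');
  await pool.end();
}

emptyDB().catch(console.error);
```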
A query to return the 50 most recent token transfers, with the token symbol & name as well as the time and the amount transferred:
```graphql
query {
  token_transfers(limit: 50, order_by: {block_timestamp: desc}) {
    token {
      name
      symbol
    }
    block_timestamp
    value
  }
}
```
Benchmark: 10 requests in 6.844s (local server)
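To run the query yourself, POST it to the Hasura GraphQL endpoint. A minimal sketch, assuming Hasura's default port 8080 and no admin secret; adjust both to whatever `docker-run.sh` actually exposes.

```js
// Node 18+ ships a global fetch; on older versions use node-fetch or axios
const query = `
  query {
    token_transfers(limit: 50, order_by: {block_timestamp: desc}) {
      token { name symbol }
      block_timestamp
      value
    }
  }
`;

fetch('http://localhost:8080/v1/graphql', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query }),
})
  .then((res) => res.json())
  .then((data) => console.log(JSON.stringify(data, null, 2)))
  .catch(console.error);
```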
A query to return the sender, recipient, and amount sent on every transfer of the RALLY token (note: token names/symbols are not unique, so if you want a particular token you'll have to specify its address):
```graphql
query {
  tokens(where: {symbol: {_eq: "RALLY"}}) {
    transfers {
      from_address
      to_address
      value
    }
  }
}
```
Benchmark: 10 requests in 1.221s (local server)
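Numbers like the ones above can be produced with a simple timing loop; here is a sketch of the idea, not necessarily how `benchmark.js` measures it.

```js
// Times N sequential requests against the local Hasura endpoint (Node 18+ fetch)
async function benchmark(query, n = 10) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < n; i++) {
    await fetch('http://localhost:8080/v1/graphql', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query }),
    });
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${n} requests in ${(elapsedMs / 1000).toFixed(3)}s`);
}

benchmark(`query {
  tokens(where: {symbol: {_eq: "RALLY"}}) {
    transfers { from_address to_address value }
  }
}`).catch(console.error);
```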
I used Hasura for the GraphQL server because of its lightweight footprint. Unlike alternatives such as Prisma, Hasura is a compiler that sits in front of a Postgres database and compiles GraphQL queries into SQL. This makes Hasura fast and capable of handling complex queries while using very little memory and less CPU than alternatives. Hasura is also easy to scale both vertically and horizontally.
I used JavaScript/Node.js to write the ingestion scripts because of its ease of use and widespread adoption among web developers.
A note about the scripts: we insert the records into the database in batches of 1,000 to avoid the overhead of committing each record individually. These inserts could be run sequentially, depending on resource constraints.
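A minimal sketch of that batching pattern, assuming a `token_transfers` table with the columns used in the queries above and a multi-row `INSERT` built with `pg` placeholders; the actual scripts may do this differently.

```js
const pool = require('./db'); // hypothetical pg Pool helper from the setup sketch

const BATCH_SIZE = 1000;

// Insert transfer records in batches of 1000 to amortize per-commit overhead
async function insertTransfers(records) {
  for (let i = 0; i < records.length; i += BATCH_SIZE) {
    const batch = records.slice(i, i + BATCH_SIZE);
    // Build one multi-row INSERT: ($1, $2, $3, $4), ($5, $6, $7, $8), ...
    const placeholders = batch
      .map((_, j) => `($${j * 4 + 1}, $${j * 4 + 2}, $${j * 4 + 3}, $${j * 4 + 4})`)
      .join(', ');
    const values = batch.flatMap((r) => [
      r.from_address,
      r.to_address,
      r.value,
      r.block_timestamp,
    ]);
    await pool.query(
      `INSERT INTO token_transfers (from_address, to_address, value, block_timestamp)
       VALUES ${placeholders}`,
      values
    );
  }
}

module.exports = insertTransfers;
```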
Ingestion timings:
- Local
  - Insert ~144,000 token records: 2.99s
  - Insert ~290,000 transfer records: 10.057s
  - Insert 5,000 token records: 0.116s
  - Insert 5,000 transfer records: 0.251s
- Heroku
  - Insert 5,000 token records: 1.378s
  - Insert 5,000 transfer records: 2.211s
I've deployed a Postgres instance as well as a Hasura server on Heroku, which you can see and interact with here.
The free tier allows only 10,000 entries in the Postgres DB, so the queries are by no means fully functional; however, this deployment could easily be scaled up to accommodate more data.
If offering this as a production application, I would use the same stack (Postgres/Hasura deployed on Heroku). However, depending on how the data arrives in production, the ingestion scripts could be automated: they could be set to run every time an object is added to one of the relevant GCS buckets, using GCS triggers. The CSV data can be streamed directly from GCS to the Postgres DB (I was planning to do that here, but ran into permission issues, so I opted to upload from local CSVs). Updates to the tables/relationships can be made directly from the Hasura console.
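As a rough sketch of that automation, a background Cloud Function triggered when an object is finalized in a GCS bucket could stream the CSV straight into Postgres via `COPY`. The function name, column list, and use of `pg-copy-streams` here are assumptions for illustration, not part of this repo.

```js
// Hypothetical GCS-triggered Cloud Function (object finalize event)
const { Storage } = require('@google-cloud/storage');
const { Pool } = require('pg');
const { from: copyFrom } = require('pg-copy-streams');
const { pipeline } = require('stream/promises');

const storage = new Storage();
const pool = new Pool(); // reads PG* env vars configured on the function

exports.ingestTransfersCsv = async (file) => {
  // `file` describes the finalized object: { bucket, name, ... }
  const client = await pool.connect();
  try {
    const source = storage.bucket(file.bucket).file(file.name).createReadStream();
    // Stream the CSV straight into Postgres without buffering it locally
    const sink = client.query(
      copyFrom(
        'COPY token_transfers (from_address, to_address, value, block_timestamp) ' +
          'FROM STDIN WITH (FORMAT csv, HEADER true)'
      )
    );
    await pipeline(source, sink);
  } finally {
    client.release();
  }
};
```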