
A distributed key-value database that imitates DynamoDB querying, with SQL support, distributed joins, and Cypher graph support.

Primary language: Python


This is an experimental project.

  • You can create a basic database with a hash table and a prefix trie. See hash-db.py; it's a very small database.
  • It mirrors DynamoDB-style querying.
  • You can create a basic distributed database with consistent hashing. See client.py and server.py. Data is rebalanced across the nodes as new servers are added.
  • SQL is parsed on the server and the work is distributed to the data nodes. Please see this blog post. For how the distributed join works, see this blog post.
  • Cypher is parsed on the server and sent to a single data node for processing. A graph lives on only one data node at a time; I haven't worked out how to distribute graph processing yet.
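
To make the consistent hashing point concrete, here is a minimal sketch of a hash ring with virtual nodes. It is not the code in client.py or server.py, and it is not Michael Nielsen's implementation: the `HashRing` class, the node names, the replica count, and the choice of md5 are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a point on the ring (md5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, replicas=100):
        self.replicas = replicas
        self._points = []   # sorted ring positions
        self._owners = {}   # ring position -> node name

    def add_node(self, node):
        # Each node is placed on the ring many times to even out the load.
        for i in range(self.replicas):
            point = _hash("{}:{}".format(node, i))
            self._owners[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[index]]


ring = HashRing()
for name in ("node-1", "node-2", "node-3"):
    ring.add_node(name)

keys = ["people:{}".format(i) for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-4")
after = {key: ring.node_for(key) for key in keys}

# Only keys that now belong to the new node have to be moved.
moved = [key for key in keys if before[key] != after[key]]
print("{} of {} keys moved".format(len(moved), len(keys)))
```

Adding a node only takes keys away from existing nodes, so a rebalance has to move just the keys in `moved` rather than rehash everything.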

This project demonstrates how simple a database can be. Do not use it for serious data: everything is stored in memory only, and there is no persistence.
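
To give a sense of how small such an in-memory store can be, here is a sketch that pairs a hash table for exact lookups with a prefix trie for begins-with queries. It is not the code in hash-db.py: the `TinyDB` class and its method names are hypothetical, and the trie is a plain nested dict rather than pygtrie so that the example has no dependencies.

```python
class TinyDB:
    """Hypothetical in-memory store: a dict for exact lookups plus a
    character trie over sort keys for begins-with queries."""

    def __init__(self):
        self.items = {}   # (partition_key, sort_key) -> value
        self.tries = {}   # partition_key -> nested-dict trie of sort keys

    def put(self, partition_key, sort_key, value):
        self.items[(partition_key, sort_key)] = value
        node = self.tries.setdefault(partition_key, {})
        for char in sort_key:
            node = node.setdefault(char, {})
        node[None] = sort_key   # None marks the end of a stored sort key

    def get(self, partition_key, sort_key):
        return self.items.get((partition_key, sort_key))

    def begins_with(self, partition_key, prefix):
        # Walk down the trie along the prefix.
        node = self.tries.get(partition_key, {})
        for char in prefix:
            if char not in node:
                return []
            node = node[char]
        # Collect every stored sort key underneath the prefix node.
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for char, child in current.items():
                if char is None:
                    found.append(child)
                else:
                    stack.append(child)
        return sorted((key, self.items[(partition_key, key)]) for key in found)


db = TinyDB()
db.put("user#1", "order#2021-01", {"total": 10})
db.put("user#1", "order#2021-02", {"total": 25})
db.put("user#1", "profile", {"name": "Sam"})

orders = db.begins_with("user#1", "order#")
print(orders)
```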

See this Stack Overflow question.

Also see the Java version here

This project uses Google's pygtrie and Michael Nielsen's consistent hashing code.


Run pip install -r requirements.txt

Run ./start-all.sh to start the server with 3 data nodes. See example.py for the tests that I run as part of development.

Distributed joins

Data is distributed across the cluster, with each row living on exactly one server. I haven't gotten around to load balancing the data.

First, register a join with the server:

create join
    inner join people on people.id = items.people
    inner join products on items.search = products.name

# Assumes `import json`, `import requests`, and an `args.server` holding host:port.
print("create join sql")
statement = """create join
    inner join people on people.id = items.people
    inner join products on items.search = products.name
    """
url = "http://{}/sql".format(args.server)
response = requests.post(url, data=json.dumps({
    "sql": statement
    }))

Insert data. The join is maintained as you insert data. Data is spread out across the cluster.

The join runs on one server, which then asks the other data nodes for any missing data.

curl -H"Content-type: application/json" -X POST http://localhost:1005/sql --data-ascii '{"sql": "select products.price, people.people_name, items.search from items inner join people on items.people = people.id inner join products on items.search = products.name"}'
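
The idea of joining on one node and asking the others for missing rows can be sketched in a single process. This is only an illustration under assumptions, not the project's actual join code: the three in-memory `nodes`, the helper names `find_one` and `join_on_node`, and the sample rows are hypothetical, and the sketch assumes `people.id` and `products.name` are unique. In the real system the fallback lookup would be a network request to another data node.

```python
# Each hypothetical data node holds a subset of every table's rows.
nodes = [
    {"people": [{"id": 1, "people_name": "Sam"}],
     "items": [{"people": 2, "search": "Spanner"}],
     "products": [{"name": "Cat", "price": 100}]},
    {"people": [{"id": 2, "people_name": "Ted"}],
     "items": [{"people": 1, "search": "Cat"}],
     "products": []},
    {"people": [], "items": [],
     "products": [{"name": "Spanner", "price": 5}]},
]


def find_one(local, others, table, field, value):
    """Return the row where row[field] == value, checking the local node
    first and only then the other nodes. Assumes the field is unique."""
    for node in [local] + others:
        for row in node[table]:
            if row[field] == value:
                return row
    return None


def join_on_node(local, others):
    """Inner-join this node's items rows against people and products."""
    joined = []
    for item in local["items"]:
        person = find_one(local, others, "people", "id", item["people"])
        product = find_one(local, others, "products", "name", item["search"])
        if person is not None and product is not None:
            joined.append({
                "products.price": product["price"],
                "people.people_name": person["people_name"],
                "items.search": item["search"],
            })
    return joined


# Every node joins its own items rows; the server concatenates the results.
results = []
for index, node in enumerate(nodes):
    others = nodes[:index] + nodes[index + 1:]
    results.extend(join_on_node(node, others))
print(results)
```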

API standard

Sort key begins with value


Partition key begins with value and sort key begins with value


Sort key between hash_values


Partition key and sort key between values
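
The four query shapes named in this section can be illustrated with a plain Python filter. This is not the project's API: the `query` function, its parameter names, and the sample keys are hypothetical, and treating "between" as an inclusive string comparison is my assumption.

```python
# Hypothetical sample data: (partition_key, sort_key) pairs.
rows = [
    ("user#1", "friend#2020"),
    ("user#1", "order#2021-01"),
    ("user#1", "order#2021-02"),
    ("user#2", "order#2021-03"),
    ("team#1", "order#2021-04"),
]


def query(rows, pk=None, pk_begins="", sk_begins="", sk_between=None):
    """Filter (pk, sk) pairs; an argument left at its default matches all."""
    matches = []
    for partition_key, sort_key in rows:
        if pk is not None and partition_key != pk:
            continue
        if not partition_key.startswith(pk_begins):
            continue
        if not sort_key.startswith(sk_begins):
            continue
        if sk_between is not None:
            low, high = sk_between
            if not (low <= sort_key <= high):
                continue
        matches.append((partition_key, sort_key))
    return matches


# Sort key begins with a value.
by_sk_prefix = query(rows, sk_begins="order#")
# Partition key begins with one value and sort key with another.
by_both_prefixes = query(rows, pk_begins="user#", sk_begins="order#")
# Sort key between two values (inclusive, compared as strings).
by_sk_range = query(rows, sk_between=("order#2021-02", "order#2021-03"))
# Exact partition key and sort key between two values.
by_pk_and_range = query(
    rows, pk="user#1", sk_between=("order#2021-01", "order#2021-02"))
```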


SQL Interface

curl -H"Content-type: application/json" -X POST http://localhost:1005/sql --data-ascii '{"sql": "select * from people"}'
print("4 insert sql")
url = "http://{}/sql".format(args.server)
response = requests.post(url, data=json.dumps({
    "sql": "insert into people (people_name, age) values ('Sam', 29)"
    }))
print(url)

Cypher interface

For simplicity, we only support Cypher triples, that is, patterns of the form (node)-[:relationship]-(node), separated by commas. Taken together, the triples can produce the same output as if the Cypher pattern were written all in one line.
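
As a rough illustration of what splitting a query into triples means, here is a small regex-based sketch. It is not the project's parser: the `parse_triples` function and the `TRIPLE` pattern are hypothetical, and the sketch only handles directed `->` relationships with optional node labels.

```python
import re

TRIPLE = re.compile(
    r"\((\w+)(?::(\w+))?\)"      # (name) or (name:Label)
    r"-\[:(\w+)\]->"             # -[:RELATIONSHIP]->
    r"\((\w+)(?::(\w+))?\)"      # (name) or (name:Label)
)


def parse_triples(match_clause):
    """Split 'match a, b, c return ...' into (source, relationship, target)."""
    body = match_clause.split("match", 1)[1].split("return", 1)[0]
    triples = []
    for part in body.split(","):
        found = TRIPLE.fullmatch(part.strip())
        if found is None:
            raise ValueError("not a triple: {!r}".format(part.strip()))
        source, _, relationship, target, _ = found.groups()
        triples.append((source, relationship, target))
    return triples


query = ("match (start:Person)-[:FRIEND]->(end:Person), "
         "(start)-[:LIKES]->(post:Post), (end)-[:POSTED]->(post) "
         "return start, end, post")
triples = parse_triples(query)
print(triples)
```

Because the triples share the variable names start, end, and post, matching all three together constrains the result the same way a single chained pattern would.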

curl -H"Content-type: application/json" -X POST http://localhost:1005/cypher --data-ascii '{"key": "1", "cypher": "match (start:Person)-[:FRIEND]->(end:Person), (start)-[:LIKES]->(post:Post), (end)-[:POSTED]->(post) return start, end, post"}'
query = """match (start:Person)-[:FRIEND]->(end:Person), (start)-[:LIKES]->(post:Post), (end)-[:POSTED]->(post) return start, end, post"""
url = "http://{}/cypher".format(args.server)
response = requests.post(url, data=json.dumps({
    "key": "1",
    "cypher": query
    }))