async-tinydb

Yet Another Async Version of TinyDB

Primary language: Python. License: MIT.

What's This?

An asynchronous version of TinyDB with extended capabilities.

Almost every method is asynchronous, and it is based on TinyDB 4.7.0+.

Unlike TinyDB, which has a minimal core, Async-TinyDB is designed for maximum flexibility and performance.

Incompatible Changes

  • Asynchronous: Say goodbye to blocking IO. Don't forget to await async methods!

  • Dropped support: Only Python 3.10+ is supported.

  • ujson: Uses ujson instead of json. Some arguments aren't compatible with json [1].

  • Dev-Changes: Changes that only matter to developers (those who customise Storage, Query, etc.).

  • Miscellaneous: Differences that only matter in edge cases.

New Features

  • Event Hooks: You can now use event hooks to hook into an operation. See Event Hooks for more details.

  • Redesigned ID & Doc Class: You can replace and customise them easily.

  • DB-level Caching: This significantly improves the performance of all operations. However, it may cause dirty reads with some types of storage [2].

  • Built-in Modifier: Use Modifier to easily compress, encrypt and extend the types of your database. Of course, you can do much more than that. (See Modifier)

  • Isolation Level: Performance or ACID? It's up to you [3].

  • Atomic Write: Shipped with JSONStorage

  • Batch Search By IDs: search method now takes an extra doc_ids argument (works like an additional condition)

How to use it?

Installation

  • Minimum: pip install async-tinydb
  • Encryption: pip install async-tinydb[encryption]
  • Compression: pip install async-tinydb[compression]
  • Full: pip install async-tinydb[all]

Importing

import asynctinydb

Usage

Read the original TinyDB documentation, and insert an await in front of async methods.

Note that some code is still blocking, for example calling len() on TinyDB or Table objects.

That's it.


Documentation for Advanced Usage

Replacing ID & Document Class

NOTICE: Mixing classes in one table may cause errors!

When a table already exists in a file, Async-TinyDB won't determine its classes by itself; it is your responsibility to make sure the classes match.

ID Classes

  • IncreID: Default ID class, mimics the behaviour of the original int ID but requires far fewer IO operations.
  • UUID: Uses uuid.UUID [4].

Document Class

  • Document: Default document class, uses dict under the bonnet.

from asynctinydb import TinyDB, UUID, IncreID, Document

db = TinyDB("database.db")

# Setting ID class to `UUID`, document class to `Document`
tab = db.table("table1", document_id_class=UUID, document_class=Document)

See Customisation for more details.

Encryption

Currently only supports AES-GCM encryption.

There are two ways to use encryption:

1. Use EncryptedJSONStorage directly

from asynctinydb import EncryptedJSONStorage, TinyDB

async def main():
    db = TinyDB("db.json", key="your key goes here", storage=EncryptedJSONStorage)

2. Use Modifier class

See Encryption

Isolation Level

When operating on a TinyDB instance concurrently, race conditions may occur.

Set a higher isolation level to mitigate this problem.

db.isolevel = 1

isolevel:

  0. No isolation, best performance.
  1. Serialised (atomic) CRUD operations. Also ensures thread safety. (default)
  2. Deep-copies documents on insertion and retrieval. (CRUD) Ensures Index & Query Cache consistency.
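
To make the race concrete, the sketch below uses plain asyncio and does not import Async-TinyDB. ToyTable, its serialised flag and bump() are hypothetical stand-ins. The lock only illustrates what serialised CRUD operations protect against; it is not a description of the library's internals.

```python
import asyncio


class ToyTable:
    """Hypothetical stand-in for a table holding a single counter document."""

    def __init__(self, serialised: bool):
        self.doc = {"count": 0}
        self.serialised = serialised
        self._lock = asyncio.Lock()

    async def _read_modify_write(self) -> None:
        current = self.doc["count"]       # read
        await asyncio.sleep(0)            # simulated IO: other tasks may run here
        self.doc["count"] = current + 1   # write back a possibly stale value

    async def bump(self) -> None:
        if self.serialised:
            # Only one read-modify-write cycle may be in flight at a time.
            async with self._lock:
                await self._read_modify_write()
        else:
            await self._read_modify_write()


async def run(serialised: bool, n: int = 50) -> int:
    table = ToyTable(serialised)
    await asyncio.gather(*(table.bump() for _ in range(n)))
    return table.doc["count"]


unsafe_total = asyncio.run(run(serialised=False))  # updates are lost
safe_total = asyncio.run(run(serialised=True))     # every update is kept
print(unsafe_total, safe_total)
```

Without the lock, concurrent tasks read the same value before any of them writes, so the final count falls short of the number of updates.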

DB-level caching

DB-level caching improves performance dramatically.

However, this may cause data inconsistency between Storage and TinyDB if the file that Storage refers to is shared.

To disable it:

db = TinyDB("./path", no_dbcache=True)
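
The following sketch shows how a cached view can go stale when the underlying file is shared. It uses only the standard library; ToyDB and write_file are hypothetical stand-ins for illustration, and only the no_dbcache name is taken from the text above.

```python
import json
import os
import tempfile


class ToyDB:
    """Hypothetical stand-in that caches the parsed file after the first read."""

    def __init__(self, path: str, no_dbcache: bool = False):
        self.path = path
        self.no_dbcache = no_dbcache
        self._cache = None

    def read(self) -> dict:
        if not self.no_dbcache and self._cache is not None:
            return self._cache  # served from memory, the file is not touched
        with open(self.path, encoding="utf-8") as f:
            data = json.load(f)
        if not self.no_dbcache:
            self._cache = data
        return data


def write_file(path: str, data: dict) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shared.json")
    write_file(path, {"answer": 42})

    cached = ToyDB(path)
    uncached = ToyDB(path, no_dbcache=True)
    cached.read()
    uncached.read()

    # Another process (simulated here) rewrites the shared file.
    write_file(path, {"answer": 43})

    stale = cached.read()["answer"]    # still the value from the first read
    fresh = uncached.read()["answer"]  # re-read from disk

print(stale, fresh)
```

If nothing else writes to the file, the cached and uncached views never diverge, which is why caching is safe in the single-owner case.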

Example Code:

Simple One

import asyncio
from asynctinydb import TinyDB, Query

async def main():
    db = TinyDB('test.json')
    await db.insert({"answer": 42})
    print(await db.search(Query().answer == 42))  # >>> [{'answer': 42}] 

asyncio.run(main())

Event Hooks Example

import asyncio
from asynctinydb import TinyDB, Query, Storage

async def main():
    db = TinyDB('test.json')

    @db.storage.on.write.pre
    async def mul(ev: str, s: Storage, data: dict):
        data["_default"]["1"]['answer'] *= 2  # manipulate the data directly before it is written

    @db.storage.on.write.post
    async def _print(ev, s, anystr):
        print(anystr)  # print the JSON-dumped string

    await db.insert({"answer": 21})  # insert() will trigger both write events
    await db.close()
    # Reload
    db = TinyDB('test.json')
    print(await db.search(Query().answer == 42))  # >>> [{'answer': 42}]

asyncio.run(main())

Customise ID Class

Inherit from BaseID and implement the following methods, and then you are good to go.

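from __future__ import annotations  # allows the `-> MyID` annotation below

from typing import Any

from asynctinydb import BaseID
from asynctinydb.table import Table  # assumed import path, mirroring tinydb.table

class MyID(BaseID):
    def __init__(self, value: Any):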
        """
        You should be able to convert str into MyID instance if you want to use JSONStorage.
        """

    def __str__(self) -> str:
        """
        Optional.
        It should be implemented if you want to use JSONStorage.
        """

    def __hash__(self) -> int:
        ...

    def __eq__(self, other: object) -> bool:
        ...

    @classmethod
    def next_id(cls, table: Table, keys) -> MyID:
        """
        It should return a unique ID.
        """

    @classmethod
    def mark_existed(cls, table: Table, new_id):
        """
        Marks an ID as existing; the same ID shouldn't be generated by next_id.
        """

    @classmethod
    def clear_cache(cls, table: Table):
        """
        Clear cache of existing IDs, if such cache exists.
        """

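As a rough illustration of the methods listed above, here is a standalone sketch that does not import Async-TinyDB. HexID and its _seen cache are hypothetical, and the sketch assumes that keys is a collection of the IDs already present in the table. In real use the class would inherit from BaseID.

```python
import secrets
from typing import Any


class HexID:  # in real use: class HexID(BaseID)
    """Hypothetical ID type: a random 8-character hex string."""

    # Per-table record of IDs already generated or seen, keyed by id(table).
    _seen: dict[int, set[str]] = {}

    def __init__(self, value: Any):
        # Accept a str (as loaded from JSON) or another HexID.
        self.value = str(value)

    def __str__(self) -> str:
        return self.value

    def __repr__(self) -> str:
        return f"HexID({self.value!r})"

    def __hash__(self) -> int:
        return hash(self.value)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, HexID) and self.value == other.value

    @classmethod
    def next_id(cls, table: Any, keys) -> "HexID":
        seen = cls._seen.setdefault(id(table), set())
        while True:
            candidate = secrets.token_hex(4)
            # Assumption: `keys` holds the IDs that already exist in the table.
            if candidate not in seen and cls(candidate) not in keys:
                seen.add(candidate)
                return cls(candidate)

    @classmethod
    def mark_existed(cls, table: Any, new_id) -> None:
        cls._seen.setdefault(id(table), set()).add(str(new_id))

    @classmethod
    def clear_cache(cls, table: Any) -> None:
        cls._seen.pop(id(table), None)
```

The round trip through str matters for JSONStorage, where an ID is written as a string and has to be turned back into an ID object on load.
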
Customise Document Class

from asynctinydb import BaseDocument

class MyDoc(BaseDocument):
  """
  I am too lazy to write those necessary methods.
  """

Anyway, the BaseDocument class looks like this:

class BaseDocument(Mapping[IDVar, Any]):
    @property
    @abstractmethod
    def doc_id(self) -> IDVar:
        raise NotImplementedError()

    @doc_id.setter
    def doc_id(self, value: IDVar):
        raise NotImplementedError()

Make sure you have implemented all the methods required by BaseDocument class.
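
For a sense of what those methods amount to, here is a minimal standalone sketch built on collections.abc.Mapping rather than BaseDocument. ToyDoc is hypothetical, and its (value, doc_id) constructor is an assumption borrowed from TinyDB's own Document class, so check it against the real base class before relying on it.

```python
from collections.abc import Mapping
from typing import Any, Iterator


class ToyDoc(Mapping):  # in real use: subclass BaseDocument instead
    """Hypothetical read-only document with a settable doc_id."""

    def __init__(self, value: Mapping, doc_id: Any):
        self._data = dict(value)
        self._doc_id = doc_id

    @property
    def doc_id(self) -> Any:
        return self._doc_id

    @doc_id.setter
    def doc_id(self, value: Any) -> None:
        self._doc_id = value

    # The three methods that Mapping itself requires:
    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


doc = ToyDoc({"answer": 42}, doc_id=1)
print(doc["answer"], dict(doc), doc.doc_id)
```

Mapping supplies keys(), items(), get() and equality for free once __getitem__, __iter__ and __len__ are defined.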

Dev-Changes

  • Storage closed property: The original TinyDB won't raise exceptions when operating on a closed file. Storage classes are now required to implement the closed property [5] [6].
  • Storage data converting: The responsibility for converting the data to the correct type is transferred to the Storage [7].
  • The is_cacheable method of QueryInstance has been replaced by the cacheable property; is_cacheable will be deprecated.

Misc

  • Lazy-load: File loading & dirs creating are delayed to the first IO operation.
  • CachingMiddleWare: WRITE_CACHE_SIZE is now instance-specific.
    Example: TinyDB("test.db", storage=CachingMiddleWare(JSONStorage, 1024))
  • search accepts an optional cond and returns all docs if no arguments are provided.
  • get and contains raise ValueError instead of RuntimeError when cond and doc_id are both None.
  • LRUCache stores tuples of ids instead of lists of docs.
  • search and get treat doc_id and doc_ids as extra conditions instead of ignoring conditions when IDs are provided. That is, when both cond and doc_id(s) are passed, they return only the docs that satisfy cond and whose IDs are in doc_id(s).
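
The combined behaviour of cond and doc_ids can be pictured with a small pure-Python model. toy_search and the sample docs mapping are hypothetical and only mirror the semantics described in the last item; they are not the library's code.

```python
from typing import Any, Callable, Iterable, Optional


def toy_search(
    docs: dict,
    cond: Optional[Callable[[dict], bool]] = None,
    doc_ids: Optional[Iterable[Any]] = None,
) -> list:
    """Return docs that satisfy cond AND whose ID is in doc_ids (when given)."""
    wanted = None if doc_ids is None else set(doc_ids)
    return [
        doc
        for doc_id, doc in docs.items()
        if (wanted is None or doc_id in wanted)
        and (cond is None or cond(doc))
    ]


docs = {1: {"answer": 42}, 2: {"answer": 42}, 3: {"answer": 7}}
is_42 = lambda doc: doc["answer"] == 42

everything = toy_search(docs)                    # no arguments: all docs
by_cond = toy_search(docs, is_42)                # docs 1 and 2
both = toy_search(docs, is_42, doc_ids=[2, 3])   # only doc 2 passes both filters
print(len(everything), len(by_cond), both)
```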

Footnotes

  1. Why not orjson? Because ujson is fast enough and has more features.

  2. See DB-level caching to learn how to disable this feature if it causes dirty reads.

  3. See isolevel

  4. Currently using UUID4

  5. This is for Middleware classes to reliably determine whether the Storage is closed, so they can raise IOError

  6. An IOError should be raised when operating on closed storage.

  7. e.g. JSONStorage needs to convert the keys to str by itself.