msiemens/tinydb

Allow datetime objects in TinyDB

EmilStenstrom opened this issue · 14 comments

Since JSON is the default serialization format for TinyDB, there's the datetime problem:

>>> from tinydb import TinyDB
>>> from datetime import datetime
>>> db = TinyDB("db.json")
>>> db.insert({"date": datetime.now()})
...
TypeError: datetime.datetime(2015, 2, 21, 17, 24, 17, 828569) is not JSON serializable

Many other databases handle datetime conversion for the user, and I would very much like TinyDB to do the same (it's usually fixed by specifying a custom encoder when writing and a corresponding decoder when reading from the database).

Do you think this is a good idea?

I definitely think it's useful, but I'm not sure how best to implement it. I wouldn't make this a default, though, because that would break usages where a conversion is not expected. Instead, one could make use of TinyDB's extensibility. At the moment I can think of these options:

  • Create a custom storage (copying the source of JSONStorage) that uses a custom JSON encoder/decoder
  • Implement the encoder/decoder as a middleware. It would scan the data for datetime objects when writing and for date-like strings when reading and convert between these two
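The second option can be sketched with the standard library alone. This is a minimal illustration of the scan-and-convert idea, not TinyDB's middleware API; the names `encode_datetimes`, `decode_datetimes` and `ISO_DATETIME` are made up for the example, and it assumes naive datetimes stored as ISO 8601 strings.

```python
import json
import re
from datetime import datetime

# Matches datetime.isoformat() output for naive datetimes,
# e.g. 2015-02-21T17:24:17.828569 (the fractional part is optional).
ISO_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{6})?")


def encode_datetimes(data):
    """Return a copy of data with every datetime replaced by an ISO string."""
    if isinstance(data, datetime):
        return data.isoformat()
    if isinstance(data, dict):
        return {key: encode_datetimes(value) for key, value in data.items()}
    if isinstance(data, list):
        return [encode_datetimes(item) for item in data]
    return data


def decode_datetimes(data):
    """Return a copy of data with every date-like string turned into a datetime."""
    if isinstance(data, str) and ISO_DATETIME.fullmatch(data):
        return datetime.fromisoformat(data)
    if isinstance(data, dict):
        return {key: decode_datetimes(value) for key, value in data.items()}
    if isinstance(data, list):
        return [decode_datetimes(item) for item in data]
    return data


record = {"date": datetime(2015, 2, 21, 17, 24, 17, 828569), "tags": ["a", "b"]}
text = json.dumps(encode_datetimes(record))    # what would be written to disk
restored = decode_datetimes(json.loads(text))  # what would be handed back
```

Note that any ordinary string that happens to look like a date is converted as well, which is the drawback of scanning for date-like strings.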

Thanks for your quick reply!

Since it's not possible to insert datetime objects right now, adding an encoder wouldn't break any existing usages. There could be issues decoding, but only if the encoded format clashes with something that someone already uses. Maybe a string like "TinyDate(2015-01-01T01:01:01)" would be sufficiently unique to make sure existing applications continue working.
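A rough sketch of the tagged-string idea, using only the standard `json` module. The helper names `encode_tagged` and `decode_tagged` are hypothetical, and the example only decodes top-level values to stay short.

```python
import json
import re
from datetime import datetime

# Only strings of the exact form TinyDate(...) are treated as datetimes.
TAG = re.compile(r"TinyDate\((.+)\)")


def encode_tagged(obj):
    """json.dumps `default` hook: wrap datetimes in a TinyDate(...) tag."""
    if isinstance(obj, datetime):
        return "TinyDate({})".format(obj.isoformat())
    raise TypeError("{!r} is not JSON serializable".format(obj))


def decode_tagged(value):
    """Turn a TinyDate(...) string back into a datetime; leave the rest alone."""
    match = TAG.fullmatch(value) if isinstance(value, str) else None
    return datetime.fromisoformat(match.group(1)) if match else value


text = json.dumps({"date": datetime(2015, 1, 1, 1, 1, 1)}, default=encode_tagged)
row = {key: decode_tagged(value) for key, value in json.loads(text).items()}
```

Because of the tag, a plain string such as "2015-01-01T01:01:01" that an existing application already stores is returned unchanged.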

I think this is something that's worthwhile to have in "core" since it allows simple date fetching methods to be added to TinyDB. Adding methods like: db.search(month("timestamp") == 2) looks like a great fit for TinyDB I think, and as far as I understand you would need to control the datetime serialization to get that to work.
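To make the idea concrete, here is one way such a `month` helper might behave. It is purely hypothetical and works on a plain list of dicts rather than on TinyDB's query objects; it only illustrates why the stored values need to come back as real datetime objects.

```python
from datetime import datetime


class month:
    """Hypothetical helper: month("timestamp") == 2 builds a predicate function."""

    def __init__(self, field):
        self.field = field

    def __eq__(self, number):
        # Return a callable instead of a bool so it can be applied to each document.
        return lambda doc: doc[self.field].month == number


docs = [
    {"name": "a", "timestamp": datetime(2015, 2, 21, 17, 24, 17)},
    {"name": "b", "timestamp": datetime(2015, 3, 1, 9, 0, 0)},
]
in_february = month("timestamp") == 2
matches = [doc["name"] for doc in docs if in_february(doc)]
```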

Meanwhile, I think I will do the encoding/decoding in the application layer.

I think this is something that's worthwhile to have in "core"

Basically this is a design question. I agree that this feature will be useful but from a design perspective I would prefer this to be an extension (something like tinydb-datetime). The main reason is that IMO the core should be kept as small as possible while offering comprehensive support for extensions of all kinds.

Maybe we could add some kind of support for custom encoders/decoders to TinyDB so users can use the extension without hassle. This could look like this:

# in the extension
class DatetimeSerializer(Serializer):
    name = 'TinyDate'
    obj_class = datetime

    def encode(self, obj):
        ...  # convert the datetime object to a string

    def decode(self, s):
        ...  # parse the string back into a datetime object

# in the user's code
db.register_serializer(DatetimeSerializer)

What do you think about this option?
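For illustration, the registration idea could be wired up roughly as follows. This is a self-contained sketch and not TinyDB's implementation: the `SerializerRegistry` class, its `encode_value`/`decode_value` methods and the `{name}:` prefix format are all assumptions made for the example.

```python
from datetime import datetime


class Serializer:
    """Base class: subclasses set name/obj_class and implement encode/decode."""
    name = None
    obj_class = None

    def encode(self, obj):
        raise NotImplementedError

    def decode(self, s):
        raise NotImplementedError


class DatetimeSerializer(Serializer):
    name = "TinyDate"
    obj_class = datetime

    def encode(self, obj):
        return obj.isoformat()

    def decode(self, s):
        return datetime.fromisoformat(s)


class SerializerRegistry:
    """Hypothetical stand-in for whatever object would own register_serializer."""

    def __init__(self):
        self._serializers = {}

    def register_serializer(self, serializer_class):
        self._serializers[serializer_class.name] = serializer_class()

    def encode_value(self, value):
        # Pick the first serializer whose obj_class matches and prefix its name.
        for name, serializer in self._serializers.items():
            if isinstance(value, serializer.obj_class):
                return "{" + name + "}:" + serializer.encode(value)
        return value

    def decode_value(self, value):
        # Dispatch on the name prefix written by encode_value.
        if isinstance(value, str):
            for name, serializer in self._serializers.items():
                prefix = "{" + name + "}:"
                if value.startswith(prefix):
                    return serializer.decode(value[len(prefix):])
        return value


registry = SerializerRegistry()
registry.register_serializer(DatetimeSerializer)
encoded = registry.encode_value(datetime(2015, 1, 1, 1, 1, 1))
decoded = registry.decode_value(encoded)
```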

👍 for keeping it out of core. Currently the core is quite packed already, with quite complex object interaction here and there. Perhaps a more contrived option for storing dates would be to store them as seconds since epoch (see time.time). Then no middleware is technically required and they can still be sorted/compared even without the datetime tools.
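A small sketch of the seconds-since-epoch approach, assuming timezone-aware UTC datetimes; `to_epoch` and `from_epoch` are names invented for the example.

```python
import json
from datetime import datetime, timezone


def to_epoch(dt):
    """Convert an aware datetime to seconds since the Unix epoch."""
    return dt.timestamp()


def from_epoch(seconds):
    """Convert seconds since the Unix epoch back to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


events = [
    {"name": "later", "ts": to_epoch(datetime(2015, 2, 21, 17, 24, 17, tzinfo=timezone.utc))},
    {"name": "earlier", "ts": to_epoch(datetime(2015, 1, 1, 1, 1, 1, tzinfo=timezone.utc))},
]

text = json.dumps(events)  # plain numbers need no custom encoder
ordered = sorted(json.loads(text), key=lambda event: event["ts"])
first_seen = from_epoch(ordered[0]["ts"])
```

The stored numbers sort and compare correctly on their own; converting back to a datetime is only needed for display or calendar arithmetic.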

I fully agree that this is a design decision, and I see how you want to keep core small. I fully respect your decision to keep it that way.

My thinking on why this was a "core" feature was because of the data type. I expect a list stored in TinyDB to show up as a list. And I expected a datetime stored in TinyDB to show up as a datetime. So this really has to do with what input formats you decide to support. The reason datetime isn't defined in JSON is because it's not considered "core" enough, so I see how you might feel the same way.

That said, I would appreciate a simple way to define my own encoder/decoder. Apart from handling datetimes, I have found myself wanting to open the db.json file and inspect it visually. Adding indent=4 to the encoder would help with that too.

So I'm fine with either defining a simpler encoder interface, or simply expanding the documentation to explain how to substitute the encoder. For instance, do I miss the speedup from ujson if I define my own?

If you still want the benefits of ujson I suggest that you convert the integers/strings to datetime objects after you deserialise it and before you serialise it. Also you should consider adding a wrapper class that caches the datetime value from a string/integer on top of the core data types so you don't have to do extra work when serialising. I think there's also an object_hook for ujson is there not?
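One possible reading of the caching-wrapper suggestion, as a sketch: the class name `LazyDatetime` and its attributes are hypothetical, and the stored format is assumed to be an ISO 8601 string.

```python
import json
from datetime import datetime


class LazyDatetime:
    """Keeps the raw stored string and parses it at most once, on first access."""

    def __init__(self, raw):
        self.raw = raw       # exactly what was read from the JSON file
        self._value = None   # parsed datetime, filled in lazily

    @property
    def value(self):
        if self._value is None:
            self._value = datetime.fromisoformat(self.raw)
        return self._value


stamp = LazyDatetime("2015-02-21T17:24:17")
parsed = stamp.value                      # parsed here, then cached
text = json.dumps({"date": stamp.raw})    # serialising reuses the raw string
```

Since the raw string is kept alongside the parsed value, writing the record back requires no conversion, so a fast serializer can be used unchanged.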

So I'm fine with either defining a simpler encoder interface, or simply expanding the documentation to explain how to substitute the encoder.

I've already planned to revise the extension docs. Currently they are very sparse and custom table classes aren't even documented yet. Do you think it's worth creating a more comprehensive tutorial for custom storages/encoders?

I'll also think about custom serialization. At the moment I think it would be in line with the current extensibility features, but I feel like I should first do some cleanup on the current code/docs.

I've already planned to revise the extension docs. Currently they are very sparse and custom table classes aren't even documented yet. Do you think it's worth creating a more comprehensive tutorial for custom storages/encoders?

I actually think the docs are in very good shape. I found it easy to pick TinyDB up in an evening. That said, there are always things to improve:

  • Difference between "Basic Usage" and "Advanced usage" could go away I think. I'd rather have one reference on all the ways to access the API than having to jump back and forth between them. Maybe simplify "Basic Usage" even further and call it "Getting started", and have the advanced usage be a full reference.
  • I would add some examples to https://tinydb.readthedocs.org/en/latest/extend.html#write-a-custom-storage. I just looked at JSONStorage and it's REALLY easy to customize :) I'll just replace the loads and dumps with something else.
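As a sketch of what swapping out loads and dumps might look like, here is a stand-alone class with a read/write pair. It is loosely modelled on the idea of a storage but deliberately does not subclass anything from TinyDB; the class name `IndentedJSONStorage` and its constructor parameters are assumptions for the example.

```python
import json
import os
import tempfile


class IndentedJSONStorage:
    """Stand-alone sketch: a file store whose dumps/loads can be replaced."""

    def __init__(self, path, dumps=None, loads=None):
        self.path = path
        # Default to human-readable output; pass other callables to swap the codec.
        self.dumps = dumps or (lambda data: json.dumps(data, indent=4, sort_keys=True))
        self.loads = loads or json.loads

    def read(self):
        # A missing or empty file means there is no data yet.
        if not os.path.exists(self.path) or os.path.getsize(self.path) == 0:
            return None
        with open(self.path, encoding="utf-8") as handle:
            return self.loads(handle.read())

    def write(self, data):
        with open(self.path, "w", encoding="utf-8") as handle:
            handle.write(self.dumps(data))


path = os.path.join(tempfile.mkdtemp(), "db.json")
storage = IndentedJSONStorage(path)
storage.write({"_default": {"1": {"name": "example"}}})
with open(path, encoding="utf-8") as handle:
    on_disk = handle.read()
```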

I think there's also an object_hook for ujson is there not?

Seems not. So I will have to serialize before saving it to db and regexp on the way out.

Try simplejson: it offers speedups over the regular json library and also gives you streaming transformations on your data (object_hook).
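A short sketch of the object_hook approach. It uses the standard library's `json` module so it runs without extra packages; simplejson's `loads` accepts the same `object_hook` parameter. The `__datetime__` marker key and the function names are assumptions chosen for the example.

```python
import json
from datetime import datetime


def encode_default(obj):
    """`default` hook: represent a datetime as a one-key marker dict."""
    if isinstance(obj, datetime):
        return {"__datetime__": obj.isoformat()}
    raise TypeError("{!r} is not JSON serializable".format(obj))


def datetime_hook(dct):
    """`object_hook`: called for every decoded dict, innermost first."""
    if set(dct) == {"__datetime__"}:
        return datetime.fromisoformat(dct["__datetime__"])
    return dct


original = {"date": datetime(2015, 2, 21, 17, 24, 17, 828569), "note": "hello"}
text = json.dumps(original, default=encode_default)
restored = json.loads(text, object_hook=datetime_hook)
```

Using a marker dict rather than a bare string means the hook never has to guess whether an ordinary string was meant to be a date.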

Closing this as #50 has been merged :)

I've now released v2.3.0 which features custom serialization :)

Yay! Very nicely done!

I am interested in understanding how to make queries with datetime.
Where can we find some examples of queries or insertions?
Thanks!

Sure! There's an example of how to use datetime objects in the serialization repo: https://github.com/msiemens/tinydb-serialization. Remember to pip install tinydb_serialization before running it.