/yserial

NoSQL y_serial Python module – warehouse compressed objects with SQLite


README for yserial

Join the chat at https://gitter.im/rsvp/yserial

TL;DR single module file: yserial = serialization + persistence

In a few lines of Python code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. It serves as a highly useful "standard" NoSQL module for storing schema-less data in a database.

It is based on key/value where the conceptual key is

 filename + table_name + primary_key + timestamp + notes

and the value is any reasonable object. We generally mean Python objects, but we include support for files (binary, image, etc.) and URL content (e.g. webpages). Python objects include strings, dictionaries, lists, tuples, classes, and instances. Objects are inserted into a hierarchy: database file, table within that database, row within that table. Moreover, each object is annotated for reference by "notes" and automatically timestamped.
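To make the key/value idea concrete, here is a minimal sketch of the underlying technique using only the standard library (sqlite3, pickle, zlib). It is not the y_serial API: the function names, the column names (id, stamp, notes, payload), and the table name are assumptions chosen for illustration.

```python
import pickle
import sqlite3
import time
import zlib


def insert_object(conn, table, obj, notes):
    """Pickle, compress, and store obj with a timestamp and notes; return its row id."""
    # Hypothetical schema. The table name is interpolated into the SQL text,
    # so it must come from trusted code, never from user input.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS %s "
        "(id INTEGER PRIMARY KEY, stamp INTEGER, notes TEXT, payload BLOB)" % table
    )
    payload = zlib.compress(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))
    cur = conn.execute(
        "INSERT INTO %s (stamp, notes, payload) VALUES (?, ?, ?)" % table,
        (int(time.time()), notes, payload),
    )
    conn.commit()
    return cur.lastrowid


def fetch_object(conn, table, row_id):
    """Load the row with the given id, then decompress and unpickle it."""
    row = conn.execute(
        "SELECT payload FROM %s WHERE id = ?" % table, (row_id,)
    ).fetchone()
    return pickle.loads(zlib.decompress(row[0]))


# An in-memory database stands in for the database file in the hierarchy.
conn = sqlite3.connect(":memory:")
plan = {"agent": "007", "city": "London"}
key = insert_object(conn, "goldfinger", plan, "#plan agent007 #london")
restored = fetch_object(conn, "goldfinger", key)
```

The sketch mirrors the hierarchy described above: the connection stands for the database file, `goldfinger` for the table, and the returned row id for the primary key. Only unpickle data you trust, since pickle can execute arbitrary code on load.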

You are spared from explicitly spelling out many of the difficult protocol details (cursor/connection, SQL/DB-API implementation, serialization, compression, search algorithm, etc.), which are optimized for speed, security, and concurrency yet handled transparently. Our module is also faster than comparable approaches under PostgreSQL.

We highly recommend SQLite because it requires neither separate installation nor a server process; also, it uses single normal files (easy to back up or send), not an elaborate filesystem. Moreover, in comparison to similar applications with MySQL or PostgreSQL, SQLite is extremely fast and suits most purposes wonderfully. [The computing center at Harvard's math department asserts that yserial "provides a very reliable NoSQL interface for SQLite," see http://www.math.harvard.edu/computing/sqlite ]

The means for insertion, organization by annotation, and finally retrieval are designed to be simple to use. The notes effectively label the objects placed into the database. We can then later query the database, for example, by regex (regular expression) searching on notes, and placing the qualified objects in a dictionary. The keys of this dictionary correspond to the unique primary keys used in the database. We can thus use Python code to process the contents of this qualified dictionary, in effect, a data subset. If the objects in that dictionary are themselves dictionaries we are essentially dealing with schema-less data.
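The retrieval idea described here, a regex search on notes that yields a dictionary keyed by primary key, can be sketched with the standard library alone. This is an illustration of the technique rather than y_serial's own method: `select_dict`, the column names, and the sample data are all hypothetical.

```python
import pickle
import re
import sqlite3
import zlib


def select_dict(conn, table, pattern):
    """Return {primary_key: object} for every row whose notes match the regex."""
    regex = re.compile(pattern)
    found = {}
    # The table name must be a trusted identifier (it is interpolated into SQL).
    query = "SELECT id, notes, payload FROM %s ORDER BY id" % table
    for row_id, notes, payload in conn.execute(query):
        if regex.search(notes):
            found[row_id] = pickle.loads(zlib.decompress(payload))
    return found


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE goldfinger (id INTEGER PRIMARY KEY, notes TEXT, payload BLOB)"
)
samples = [
    ({"agent": "007", "city": "London"}, "#plan agent007 #london"),
    ({"agent": "006", "city": "Moscow"}, "#plan agent006 #moscow"),
    ({"memo": "budget"}, "#finance q3"),
]
for obj, notes in samples:
    conn.execute(
        "INSERT INTO goldfinger (notes, payload) VALUES (?, ?)",
        (notes, zlib.compress(pickle.dumps(obj))),
    )

# The qualified dictionary is a data subset that plain Python can process.
plans = select_dict(conn, "goldfinger", r"agent00[1-7]")
cities = sorted(d["city"] for d in plans.values())
```

Because the stored objects are themselves dictionaries with differing keys, the result behaves like schema-less data: no table column had to anticipate `agent`, `city`, or `memo`.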

Other useful methods are available:

  • insert any external file (via infile). This is handy for working with thousands of image files.
  • insert anything on the web by URL (via inweb).
  • insert in batches by generator (via ingenerator). This can be used to rapidly store a series of computationally intense results for quick retrieval at a later time.
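As a rough illustration of batch insertion from a generator, the sketch below streams objects into SQLite within a single transaction using the standard library. It does not reproduce the ingenerator method itself: `insert_batch`, `squares`, and the schema are assumptions made for the example.

```python
import pickle
import sqlite3
import time
import zlib


def squares(n):
    """Hypothetical stand-in for a series of computationally intense results."""
    for i in range(n):
        yield {"n": i, "square": i * i}


def insert_batch(conn, table, objects, notes):
    """Store every object from an iterable in one transaction."""
    rows = (
        (int(time.time()), notes, zlib.compress(pickle.dumps(obj)))
        for obj in objects
    )
    # "with conn" commits on success and rolls back if an exception occurs.
    with conn:
        conn.executemany(
            "INSERT INTO %s (stamp, notes, payload) VALUES (?, ?, ?)" % table,
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results "
    "(id INTEGER PRIMARY KEY, stamp INTEGER, notes TEXT, payload BLOB)"
)
insert_batch(conn, "results", squares(1000), "#squares batch1")

count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
newest = conn.execute(
    "SELECT payload FROM results ORDER BY id DESC LIMIT 1"
).fetchone()[0]
last = pickle.loads(zlib.decompress(newest))
```

Feeding a generator to `executemany` avoids building the whole batch in memory, and a single commit for the batch is typically much faster than committing each row separately.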

Installation: simply a single file

The latest development version of the module is y_serial_dev.py, whereas recent stable versions can be found under the release directory. There are no dependencies, other than standard issue Python modules.

     $ curl -kLO  https://git.io/y_serial_dev.py

REQUIREMENT: Python version 2.x where x is 5 or greater. Copy or symlink the y_serial module to where your Python can find it.

Documentation

The module includes the tutorial documentation within itself. And the source code contains verbose comments for developers. Our wiki has some useful tips.

But first check out the ten-minute HOWTO tutorial at http://nbviewer.ipython.org/urls/git.io/yserial-HOWTO.ipynb

Contributing to yserial repository

Details are covered in CONTRIBUTING.md (which should appear when making a pull request). All feedback is welcome.

For real-time discussions, please go to:

Testing

Tests are contained within the module itself. The default database file db0 assigned in class Base presumes Linux top directory /tmp (change to suit your system) -- yserial is designed to operate cross-platform including Windows.

     import y_serial_dev as y_serial
     y_serial.tester()
     #        ^for the principal class Main
     #        testfarm is for the beta version, not yet in Main.
     #   Flip the DEBUG variable for verbose results. 

Memorable current links

Brief development history

Thanks so much to all who participated in this project over these long years. We truly appreciate your wonderful collaboration in developing our code. Acknowledgements