
Middle ground between `dict` and a class for research-time data representation.


mict -- middle-ground between dict and a class


Provides MATLAB-struct-like dot notation for setting and getting items in a dict (dictionary), plus a handful of interactivity tools.

mict is intended to be a middle ground between a dict and a full-fledged class/object pattern for structured data storage.

It does a bit more than a basic dict, but does not attempt to supersede pandas, numpy, xarray or other advanced data storage tools.

WARNINGS

This is a research- and exploration-time helper. NOT INTENDED FOR PRODUCTION CODE. It contains some quirks and promotes poor practices that will generally lower the quality of your production code. You have been warned.


Installation

Try installing with pip, grabbing it straight from GitHub:

python -m pip install -U git+https://github.com/jerzydziewierz/mict.git#egg=mict

Note: the #egg=mict suffix provides better compatibility if you want to use mict as a dependency of your own package.

To uninstall, the symmetric command is:

python -m pip uninstall mict

Otherwise, for an editable (development-mode) local installation, clone the repo to your favourite folder and then run, from inside the root folder of the repo (where setup.py is):

python -m pip install -e .

This package intends to use PEP 517 and its implementation in setuptools (see "setuptools Quickstart" in the setuptools 54.2.0 documentation).

Demo

Online demo: available on Binder.

Local demo: please see demo.ipynb, a Jupyter notebook.

Basic usage

Always remember that mict inherits from dict, so all operations that are valid for a dict are also valid for a mict. On top of that, new operations are available.

from mict import mict
q=mict()
q.first = 'Hello world!'
q

key    value
first  Hello world!

The first strength of mict is the ease of adding, removing and altering the contents, and the nice visualisation of them.

You can add new keys easily, and then access them using dot-notation:

q.second = 'not'
q.third = 3
q

key     value
first   Hello world!
second  not
third   3
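To build some intuition for how `q.second = 'not'` ends up as an ordinary dict key, here is a minimal sketch of a dict subclass with dot access. `DotDict` is a hypothetical name and this is not mict's actual source. It only mimics the behaviour described in this README: attribute writes become keys, and missing attributes read as None.

```python
class DotDict(dict):
    """Hypothetical minimal dict with dot access (not mict's implementation)."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # Let dunder lookups (e.g. from pickle or copy) fail the normal way.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.get(name)  # missing keys read as None

    def __setattr__(self, name, value):
        self[name] = value  # attribute writes are stored as dict keys

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


q = DotDict()
q.first = "Hello world!"
q.second = "not"
q.third = 3

print(q.third)         # 3, via dot notation
print(q["second"])     # not, via ordinary dict indexing
print(list(q.keys()))  # ['first', 'second', 'third']
```

Because the values live in the dict itself rather than in the instance `__dict__`, every ordinary dict method keeps working unchanged.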

The visualiser function in mict is called a reprstyler.

A new mict instance comes with reprstyler_html already set to reprstyler_basic_html, but you can, and should, make your own reprstylers -- see below for how to do that.

Convenience save and load functions using pickle

Warning: Since the pickle save/load also loads executable functions (including the reprstyler), all the precautions that apply to pickle apply here. Loading untrusted files is potentially unsafe. See https://github.com/yk/patch-torch-save and the explanation of how it works: https://www.youtube.com/watch?v=2ethDz9KnLk

The methods (instance).to_pickle() and (class).from_pickle() work as you would expect:

q.to_pickle('demo.pickle')
True
q1=mict.from_pickle('demo.pickle')
q1

key     value
first   Hello world!
second  not
third   3
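For readers who want to see what such helpers typically amount to, here is a sketch that assumes to_pickle/from_pickle are thin wrappers around the standard `pickle.dump`/`pickle.load`. The standalone functions below are hypothetical stand-ins, not mict's code, and they operate on a plain dict so the example runs without mict installed.

```python
import os
import pickle
import tempfile


def to_pickle(obj, filename):
    """Hypothetical helper: pickle `obj` into `filename`, returning True like the demo above."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f)
    return True


def from_pickle(filename):
    """Hypothetical helper: unpickle `filename`. Only load files you trust!"""
    with open(filename, "rb") as f:
        return pickle.load(f)


data = {"first": "Hello world!", "second": "not", "third": 3}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pickle")
    saved = to_pickle(data, path)
    restored = from_pickle(path)

print(saved, restored == data)  # True True
```

The restored object is a new, equal copy rather than the same object. The security warning above applies in full to `pickle.load`.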

Casting to pure dict

Under the hood, mict is really a dict with some extra handlers, so casting it to a plain dict shows everything it stores, including the reprstyler:

dict(q)
{'reprstyler_html': <function mict.reprstylers_generic.reprstyler_basic_html(subject=None)>,
 'first': 'Hello world!',
 'second': 'not',
 'third': 3}
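Note that casting with dict(...) builds a new top-level dict from the same key/value pairs, so the copy is shallow: nested objects are shared, not copied. The sketch below uses plain dicts as stand-ins for micts, on the assumption that mict behaves like dict here because it inherits from it. copy.deepcopy is the standard way to get a fully independent copy.

```python
import copy

inner = {"value": 3}
outer = {"name": "example", "x": inner}

shallow = dict(outer)        # new top-level dict, same nested objects
deep = copy.deepcopy(outer)  # fully independent copy

inner["value"] = 99          # mutate the shared nested dict

print(shallow["x"]["value"])  # 99: the shallow copy sees the change
print(deep["x"]["value"])     # 3: the deep copy does not
print(shallow is outer)       # False: the top level is a new object
```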

Customizing the reprstyler

This is the real reason why I developed mict, and a primary value proposition. mict provides a quick and simple way to customize how the contents of the dictionary are visualized:

from mict import mict
def baldstyler(subject):
    return f'baldness score: {subject.bald:0.3f}'

q=mict(zap=None,bald=6.28,hidden_value1=45,reprstyler_html=None,reprstyler=baldstyler)

q
baldness score: 6.280

As with everything in mict, the reprstyler can be changed after the mict itself has been initialized (I know that purists will really begin to spin in their graves now!):

def new_reprstyler(subject=None):
    txt = 'keys:  '
    for key in subject.keys():
        if key == 'reprstyler' or key == 'reprstyler_html':  # do not list reprstyler
            continue
        txt = f"{txt} {key}={subject[key]};"
    return txt

q.reprstyler = new_reprstyler
q
keys:   zap=None; bald=6.28; hidden_value1=45;
def baldstyler_html(subject):
    return f'<h4>✅high baldness score: {subject.bald:0.3f}</h4>' if subject.bald>5 else f'<h4>❌not bald enough: {subject.bald:0.3f}</h4>'
q.reprstyler_html = baldstyler_html
q
✅high baldness score: 6.280
q.bald = 3.1
q
❌not bald enough: 3.100

An even quicker way to make a reprstyler

You can use an anonymous lambda function to make a reprstyler quickly.

Note that reprstyler_html will display any HTML you give it, including images, videos, sound or JavaScript.

q=mict(fill='blue',r=15)
q.reprstyler_html=lambda self:f'<svg width="200" height="100" ><circle cx="50" cy="50" r="{self.r}" fill="{self.fill}" /></svg>'
q

[rendered output: a blue circle of radius 15]

q.r=32
q.fill='red'
q

[rendered output: a red circle of radius 32]

# Remove `reprstyler_html` and fall back to the default `dict` repr, while preserving other functionality.
q.pop('reprstyler_html', None)  # the default argument avoids a KeyError if the key is already gone
q

{'fill': 'red', 'r': 32}
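Because reprstyler_html output is inserted into the notebook as raw HTML, string values that might contain `<`, `>` or `&` are safer to escape first. Here is a hedged sketch using the standard `html.escape`. The styler name is hypothetical, and it accepts any mapping, so it runs without mict.

```python
import html


def escaped_styler(subject):
    """Hypothetical reprstyler_html: render each key/value pair as escaped HTML table rows."""
    rows = "".join(
        f"<tr><td>{html.escape(str(key))}</td><td>{html.escape(str(value))}</td></tr>"
        for key, value in subject.items()
        if not str(key).startswith("reprstyler")  # skip the styler entries themselves
    )
    return f"<table><tr><th>key</th><th>value</th></tr>{rows}</table>"


out = escaped_styler({"fill": "red", "note": "<script>alert(1)</script>"})
print(out)
```

With mict, it would be assigned like the other stylers, e.g. `q.reprstyler_html = escaped_styler`.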

If you want one styler for Jupyter and a different one for the console, you can (optionally) set reprstyler in addition to reprstyler_html. reprstyler_html is only used when _repr_html_ is called, which Jupyter tries first; reprstyler is the default used by the console-only IPython.
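IPython's display machinery looks for a `_repr_html_` method (which Jupyter prefers) and otherwise uses `__repr__` for plain text. The sketch below shows one way a dict subclass could route both hooks through stored styler functions. `StyledDict` is hypothetical and not mict's source; it only assumes the key names used in this README.

```python
class StyledDict(dict):
    """Hypothetical dict whose text and HTML reprs are delegated to stored functions."""

    def __repr__(self):
        styler = self.get("reprstyler")
        return styler(self) if styler is not None else dict.__repr__(self)

    def _repr_html_(self):
        styler = self.get("reprstyler_html")
        # Returning None tells IPython to skip HTML and fall back to the text repr.
        return styler(self) if styler is not None else None


d = StyledDict(bald=6.28)
d["reprstyler"] = lambda s: f"baldness score: {s['bald']:0.3f}"
d["reprstyler_html"] = lambda s: f"<h4>baldness score: {s['bald']:0.3f}</h4>"

print(repr(d))          # baldness score: 6.280  (console / terminal IPython)
print(d._repr_html_())  # <h4>baldness score: 6.280</h4>  (Jupyter)
```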


Other output formats

If you really feel like it, this could be extended to Markdown, PNG, SVG and other output formats supported by Jupyter.
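IPython's rich-display protocol also recognises methods such as `_repr_markdown_`, `_repr_svg_`, `_repr_png_` and `_repr_latex_`, each returning data in that format. Here is a hedged sketch of a Markdown variant on a hypothetical dict subclass. How mict would expose this, if at all, is an open assumption.

```python
class MarkdownDict(dict):
    """Hypothetical dict that Jupyter can render as a Markdown table."""

    def _repr_markdown_(self):
        lines = ["| key | value |", "| --- | --- |"]
        lines += [f"| {key} | {value} |" for key, value in self.items()]
        return "\n".join(lines)


m = MarkdownDict(fill="red", r=32)
print(m._repr_markdown_())
```

The other formats follow the same pattern. For example, `_repr_svg_` would return an SVG string like the one built in the lambda example.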

You can still see the classic dict repr function (lists all keys/values) using

super(mict,q).__repr__()

Advanced uses

Show the shape of big variables, instead of their content

Note: for an even lovelier way of displaying tensors, see Lovely-Tensors -- I might merge that (or parts of it) into here at some point.

Much of the actual research code will use numpy arrays or long lists. These are typically unwieldy to just display as-is.

mict provides a default reprstyler, reprstyler_basic_html, which displays only the shape of numpy/tensorflow/pytorch/jax arrays instead of their contents. You will often find that this is what you want displayed, rather than the usual dandruff.

from mict import mict
import numpy
q=mict(small_array=numpy.array([1,2,3,4,5]),reprstyler_html=None) # reprstyler_html defaults to `reprstyler_basic_html`; disable it here.
q
{'small_array': array([1, 2, 3, 4, 5]), 'reprstyler_html': None}
from mict import mict
from mict import reprstyler_basic_html
import numpy

q.large_array=numpy.random.random((150,150))
q.reprstyler_html = reprstyler_basic_html  # set back to the provided function
q

key          value
small_array  np.array(shape=(5,))
large_array  np.array(shape=(150, 150))
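The shape-instead-of-contents idea can be sketched with duck typing. Array types from numpy, PyTorch, JAX and TensorFlow all expose a `.shape` attribute. `summarise` and its `max_len` threshold are hypothetical, the output format differs from reprstyler_basic_html, and a small stand-in class replaces a real array so the sketch runs without numpy.

```python
def summarise(value, max_len=10):
    """Hypothetical helper: describe big values by shape/length instead of content."""
    shape = getattr(value, "shape", None)
    if shape is not None:  # array-like: report the shape only
        return f"{type(value).__name__}(shape={tuple(shape)})"
    if isinstance(value, (list, tuple)) and len(value) > max_len:
        return f"{type(value).__name__}(len={len(value)})"  # long sequence: report the length
    return repr(value)  # small values are shown as-is


class FakeArray:
    """Stand-in for a large array, so this sketch runs without numpy."""
    shape = (150, 150)


for name, value in {"small": [1, 2, 3], "long": list(range(1000)), "large_array": FakeArray()}.items():
    print(name, summarise(value))
# small [1, 2, 3]
# long list(len=1000)
# large_array FakeArray(shape=(150, 150))
```

With a real numpy array, the type name would print as `ndarray`.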

Capture locals from inside the function and return them in a dict / mict

When developing a research function, you will often want to capture all the locals inside it for debug purposes.

Only once the implementation stabilizes will you want to prune the result and keep only the useful return values.

from mict import mict
# define a function
def do_maths(x=1,y=2):
    a=x+y
    b=a*x
    c=b*y
    result = mict.from_locals()  # magic! puts `x`,`y`,`a`,`b`,`c` into `result`. 
    result.pop('b') # optionally, remove 'b' from the result
    return result

# execute that function
demo_result = do_maths(x=4,y=2)

# review function's locals
demo_result

key  value
x    4
y    2
a    6
c    48
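A common way to implement this kind of "magic" in CPython is to read the caller's frame. The sketch below assumes from_locals works along these lines, which is not confirmed here. `capture_locals` is a hypothetical helper built on `inspect.currentframe().f_back.f_locals`, and `inspect.currentframe()` may return None on Python implementations without frame support.

```python
import inspect


def capture_locals():
    """Hypothetical helper: return a snapshot of the *caller's* local variables."""
    frame = inspect.currentframe().f_back  # the frame of whoever called us
    try:
        return dict(frame.f_locals)  # copy, so later changes in the caller don't leak in
    finally:
        del frame  # avoid keeping a reference cycle to the frame alive


def do_maths(x=1, y=2):
    a = x + y
    b = a * x
    c = b * y
    result = capture_locals()  # x, y, a, b, c (result itself is not assigned yet)
    result.pop("b")            # optionally, remove 'b'
    return result


print(do_maths(x=4, y=2))  # {'x': 4, 'y': 2, 'a': 6, 'c': 48}
```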

Nested dictionary use and special attributes

Note that name and type have a special meaning for the default visualizer, reprstyler_basic_html: they are shown as a header above the table.

Moreover, reprstyler_basic_html will try to obtain HTML from inner (nested) micts and display it in a table.

Note that reprstyler_basic_html is merely another function from the mict module, and can be replaced with any other reprstyler. If reprstyler_html is set to None, the default dict.__repr__() is used.

from mict import mict
x=mict(type='x-coordinate', value=3)
y=mict(type='y-coordinate', value=5)
point=mict(name='example', x=x,y=y)
x
Type: x-coordinate;
key    value
value  3
y
Type: y-coordinate;
key    value
value  5
point
Name: example;
key  value
x    Type: x-coordinate;
     key    value
     value  3
y    Type: y-coordinate;
     key    value
     value  5
# you can access nested dictionaries
print(point.x.value)
print(point.x.type)
3
x-coordinate
# `mict` inherits all methods from `dict`
for key in point.keys():
    print(key)
name
x
y
reprstyler_html
point.pop('name')
point.pop('x')
Type: x-coordinate;
keyvalue
value 3
point

key  value
y    Type: y-coordinate;
     key    value
     value  5
# display what actual HTML is generated by the reprstyler
v=point._repr_html_()
v


'<br/><table><tr><th>key</th><th>value</th></tr><tr><td>y</td><td><em>Type:</em> y-coordinate; <br/><table><tr><th>key</th><th>value</th></tr><tr><td>value</td> <td> 5</td> </tr>  </table></td> </tr>  </table>'


# reset the reprstyler to not generate anything.
point.reprstyler_html=None
point
{'y': {'type': 'y-coordinate', 'value': 5, 'reprstyler_html': <function reprstyler_basic_html at 0x0000017980D26310>}, 'reprstyler_html': None}


# point is still both `mict` and `dict` -- `mict` inherits from `dict`.
isinstance(point,mict)
True
isinstance(point,dict)
True
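The nested rendering described at the start of this section (a name/type header, with inner dicts drawn as tables inside table cells) can be sketched as a short recursive function. `nested_html` is hypothetical and only imitates the layout of the generated HTML shown above. It is not the code of reprstyler_basic_html, and it uses plain dicts so it runs without mict.

```python
import html


def nested_html(subject):
    """Hypothetical recursive renderer: nested dicts become nested <table>s."""
    header = ""
    if "name" in subject:
        header += f"<em>Name:</em> {html.escape(str(subject['name']))}; "
    if "type" in subject:
        header += f"<em>Type:</em> {html.escape(str(subject['type']))}; "
    rows = []
    for key, value in subject.items():
        if key in ("name", "type") or str(key).startswith("reprstyler"):
            continue  # already shown in the header, or internal
        # Recurse into nested dicts; escape everything else as text.
        cell = nested_html(value) if isinstance(value, dict) else html.escape(str(value))
        rows.append(f"<tr><td>{html.escape(str(key))}</td><td>{cell}</td></tr>")
    return f"{header}<table><tr><th>key</th><th>value</th></tr>{''.join(rows)}</table>"


point = {"name": "example", "x": {"type": "x-coordinate", "value": 3}}
print(nested_html(point))
```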

More examples

from mict import mict
import numpy
import math
q=mict(title="some title",subtitle="some subtitle",interesting_integer = 3,interesting_float = math.tau,  big_array=numpy.random.random((200,250)))
q

key                  value
title                some title
subtitle             some subtitle
interesting_integer  3
interesting_float    6.283185307179586
big_array            np.array(shape=(200, 250))
def custom_html_styler(self):
    out = f'<h1>{self.title}</h1>'
    out = f'{out}<h2>{self.subtitle}</h2>'
    out = f'{out}<p>interesting integer:{self.interesting_integer:04d}</p>'
    out = f'{out}<p>interesting float: {self.interesting_float:0.{self.interesting_integer}f}</p>'
    out = f'{out}<p>some stats: {self.big_array.std()=:0.4f}</p>'
    out = f'{out}<hr/>'
    return out

q.reprstyler_html = custom_html_styler
q

some title

some subtitle

interesting integer:0003

interesting float:6.283

some stats: self.big_array.std()=0.2897
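The styler above relies on three standard f-string features. The first is a nested format spec, where the precision comes from a variable. The second is a zero-padded integer. The third is the self-documenting `=` specifier (Python 3.8+), which prints the expression text, then `=`, then the formatted value. Here they are in isolation, with made-up values:

```python
import math

interesting_integer = 3
values = [1.0, 2.0, 3.0]
mean = sum(values) / len(values)

# Nested format spec: the precision itself comes from a variable.
print(f"{math.tau:0.{interesting_integer}f}")  # 6.283
# Zero-padded integer, four digits wide.
print(f"{interesting_integer:04d}")            # 0003
# Self-documenting expression (Python 3.8+).
print(f"{mean=:0.4f}")                         # mean=2.0000
```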


Gotchas

  • mict does not raise an error when you try to access an undefined field. Instead, it issues a warning and returns None. I bet that opinions will be divided on this behaviour.
  • Dict keys that contain a dot cannot be accessed in dot-notation mode.

For example:

q=mict()
q['a.b']=3
q.a.b  # UserWarning (no key "a" in this dictionary), then AttributeError, because q.a returns None
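Both gotchas can be reproduced with a small stand-in. `WarnDict` is hypothetical and only assumes the warn-and-return-None behaviour described in this README. Bracket notation still reaches dotted keys, because to a dict `'a.b'` is just a string.

```python
import warnings


class WarnDict(dict):
    """Hypothetical dict: missing attributes warn and return None (mimicking mict's gotcha)."""

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)  # leave special-method lookups alone
        if name not in self:
            warnings.warn(f"no key {name!r} in this dictionary", UserWarning, stacklevel=2)
        return self.get(name)


q = WarnDict()
q["a.b"] = 3

print(q["a.b"])  # 3: bracket notation handles dotted keys fine

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        q.a.b  # q.a warns and returns None, then None.b raises AttributeError
    except AttributeError as err:
        print("AttributeError:", err)

print(len(caught), caught[0].category.__name__)  # 1 UserWarning
```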

To-Dos

  • Add testing support functionality
    • are two micts nearly equal? How equal are they?
  • Add set operations
  • Decide what to do when the requested element is not in the mict
    • Currently, when the requested element is not in the mict, it issues a warning and returns None. This might not suit the taste of many people.
  • More examples of typical usage
    • usage in the context of pandas, numpy/jax, etc.

Attributions

Happily copy-pasted from https://stackoverflow.com/questions/2352181/how-to-use-a-dot-to-access-members-of-dictionary, and modified only slightly.

Then, extended a bit.

Then, a bit more, with optional reprstyler.

See the source code for self.__repr__().

Related packages

see also:

https://pypi.org/project/python-box/

License

Modded MIT License, Copyright (c) 2015-2022 George "Dr Jerzy Dziewierz" Rey. See LICENSE file.

Accolades

[image]