Scalpl provides a lightweight wrapper that helps you to operate
on nested dictionaries seamlessly through the built-in dict
API, by using dot-separated string keys.
It's not a drop-in replacement for your dictionnaries, just syntactic
sugar to avoid this['annoying']['kind']['of']['things']
and
prefer['a.different.approach']
.
No conversion cost, a thin computation overhead: that's Scalpl in a nutshell.
There are a lot of good libraries to operate on nested dictionaries, such as Addict or Box , but if you give Scalpl a try, you will find it:
- 🚀 Powerful as the standard dict API
- ⚡ Lightweight
- 👌 Well tested
Scalpl is a Python3 library that you can install via pip
pip3 install scalpl
Scalpl provides a simple class named Cut that wraps around your dictionary
and handles operations on nested dict
and that can cut accross list
item.
This wrapper strictly follows the standard dict
API, which
means you can operate seamlessly on dict
,
collections.defaultdict
or collections.OrderedDict
by using their methods
with dot-separated keys.
Let's see what it looks like with an example ! 👇
from scalpl import Cut
data = {
'pokemon': [
{
'name': 'Bulbasaur',
'type': ['Grass', 'Poison'],
'category': 'Seed',
'ability': 'Overgrow'
},
{
'name': 'Charmander',
'type': 'Fire',
'category': 'Lizard',
'ability': 'Blaze',
},
{
'name': 'Squirtle',
'type': 'Water',
'category': 'Tiny Turtle',
'ability': 'Torrent',
}
],
'trainers': [
{
'name': 'Ash',
'hometown': 'Pallet Town'
}
]
}
# Just wrap your data, and you're ready to go deeper !
proxy = Cut(data)
You can use the built-in dict
API to access its values.
proxy['pokemon[0].name']
# 'Bulbasaur'
proxy.get('pokemon[1].sex', 'Unknown')
# 'Unknown'
'trainers[0].hometown' in proxy
# True
By default, Scalpl uses dot as a key separator, but you are free to use a different character that better suits your needs.
# You just have to provide one when you wrap your data.
proxy = Cut(data, sep='->')
# Yarrr!
proxy['pokemon[0]->name']
You can also easily create or update any key/value pair.
proxy['pokemon[1].weaknesses'] = ['Ground', 'Rock', 'Water']
proxy['pokemon[1].weaknesses']
# ['Ground', 'Rock', 'Water']
proxy.update({
'trainers[0].region': 'Kanto',
})
Following its purpose in the standard API, the setdefault method allows you to create any missing dictionary when you try to access a nested key.
proxy.setdefault('pokemon[2].moves.Scratch.power', 40)
# 40
And it is still possible to iterate over your data.
proxy.items()
# [('pokemon', [...]), ('trainers', [...])]
proxy.keys()
# ['pokemon', 'trainers']
proxy.values()
# [[...], [...]]
By the way, if you have to operate on a list of dictionaries, the
Cut.all
method is what you are looking for.
# Let's teach these pokemon some sick moves !
for pokemon in proxy.all('pokemon'):
pokemon.setdefault('moves.Scratch.power', 40)
Also, you can remove a specific or an arbitrary key/value pair.
proxy.pop('pokemon[0].category')
# 'Seed'
proxy.popitem()
# ('trainers', [...])
del proxy['pokemon[1].type']
Because Scalpl is only a wrapper around your data, it means you can get it back at will without any conversion cost. If you use an external API that operates on dictionary, it will just work.
import json
json.dumps(proxy.data)
# "{'pokemon': [...]}"
Finally, you can retrieve a shallow copy of the inner dictionary or remove all keys.
shallow_copy = proxy.copy()
proxy.clear()
This humble benchmark is an attempt to give you an overview of the performance
of Scalpl compared to Addict,
Box and the built-in dict
.
It will summarize the number of operations per second that each library is able to perform on a portion of the JSON dump of the Python subreddit main page.
You can run this benchmark on your machine with the following command:
python3 ./benchmarks/performance_comparison.py
Here are the results obtained on an AMD 5950x with Python 3.10.2.
Addict 2.4.0:
instantiate:-------- 773,292 ops per second. get:---------------- 1,197,604,715 ops per second. get through list:--- 647,249,139 ops per second. set:---------------- 130,434,783 ops per second. set through list:--- 91,240,875 ops per second.
Box 6.0.1:
instantiate:--------- 382,330 ops per second. get:----------------- 57,853,629 ops per second. get through list:---- 25,509,119 ops per second. set:----------------- 29,902,815 ops per second. set through list:---- 18,625,442 ops per second.
Scalpl latest:
instantiate:-------- 367,421,940 ops per second. get:---------------- 71,882,113 ops per second. get through list:--- 37,406,484 ops per second. set:---------------- 68,681,318 ops per second. set through list:--- 40,647,652 ops per second.
dict:
instantiate:--------- 612,870,255 ops per second. get:----------------- 1,675,977,835 ops per second. get through list:---- 746,268,647 ops per second. set:----------------- 1,388,888,605 ops per second. set through list :--- 798,934,707 ops per second.
As a conclusion and despite being an order of magniture slower than the built-in
dict
, Scalpl is faster than Box and Addict by an order of magnitude for any operations.
Besides, the gap increase in favor of Scalpl when wrapping large dictionaries.
Keeping in mind that this benchmark may vary depending on your use-case, it is very unlikely that Scalpl will become a bottleneck of your application.
- What if my keys contain dots ?
If your keys contain a lot of dots, you should use an other key separator when wrapping your data:
proxy = Cut(data, sep='->') proxy['computer->network->127.0.0.1']
Otherwise, split your key in two part:
proxy = Cut(data) proxy['computer.network']['127.0.0.1']
What if my keys contain spaces ?:
proxy = Cut(data) proxy['it works perfectly'] = 'fine'
Contributions are welcomed and anyone can feel free to submit a patch, report a bug or ask for a feature. Please open an issue first in order to encourage and keep tracks of potential discussions ✍️
Scalpl is released into the Public Domain. 🎉
Ps: If we meet some day, and you think this small stuff worths it, you can give me a beer, a coffee or a high-five in return: I would be really happy to share a moment with you ! 🍻