JSON node crawler; traverse JSON trees; visit each node; yield results.
pip install jsoncrawl
JSON node visitor.
Yields results of a user provided process_node funct.
- The process node function accepts Nodes; Nodes are 3-D named tuples:
- The node's path; a list of JSON object keys/indexes.
- The node's value.
- The node's JSON data type (i.e. object, array, string, number, etc.)
- Options:
- Option to crawl through JSON objects.
- Option to crawl through JSON arrays.
- Option to use wildcard characters to substitute for index numbers.
def node_visitor(d, process_node, objects=True, arrays=False, element_ch=None):
"""Call process_node funct for every node in tree and yield results."""
- d (obj):
- Data to traverse (the tree)
- process_node (funct):
- Accepts a Node as an argument. Example: lambda node: '.'.join(map(str, node.keys))
- objects (bool):
- Visit the items in an object?
- arrays (bool):
- Visit the elements in arrays? Automatically set if element_ch.
- element_char (str):
- Replaces sequence index numbers with this character when set; if not visit_arrays then ignore this option.
first_node = Node(keys=[], val=d, dtype=get_type(d))
to_crawl = deque([first_node])
while to_crawl:
node = to_crawl.popleft()
yield process_node(node)
children = get_children(node, element_ch, objects, arrays)
to_crawl.extend(children)
"""Print a list of unique keys in a JSON document.""
from jsoncrawl import node_visitor
def process_node(node):
return '.'.join(map(str, node.keys))
data = {'key1': 'test1', {'key2': 'test2'}}
for unique_key in set(node_visitor(data, process_node, element_ch='*')):
print(unique_key)