Other: Also use persistent RDFlib store for output graphs
Once a graph has been built, it may be useful to also import the resulting .owl file into an RDFlib persistent store. A persistent store lets the graph be accessed through RDFlib without loading the entire structure into memory, which may be advantageous when working with large graphs. Below is a sample implementation that uses Berkeley DB as the persistent backend. RDFlib has built-in support for this particular backend. Note that Berkeley DB was originally developed by Sleepycat Software, hence the use of "Sleepycat" as the backend name when creating the Graph object.
import rdflib

# The persistent store requires an identifier
graph_id = rdflib.URIRef(identifier)
# Create the graph object backed by the "Sleepycat" Berkeley DB store
graph = rdflib.Graph("Sleepycat", identifier=graph_id)
# Open the store at 'uri', creating it if it doesn't exist
graph.open(uri, create=True)
# Parse the graph at 'graph_path', typically XML formatted
# This could take many hours if the graph is large
graph.parse(graph_path)
# Close the graph to free resources. Mostly unnecessary due
# to the small overhead of the on-disk store
graph.close()
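One caveat for newer installs: as far as I know, RDFLib 6.0 renamed this store plugin from "Sleepycat" to "BerkeleyDB" (backed by the `berkeleydb` package instead of `bsddb3`). Below is a minimal sketch for picking whichever name the installed version registers. Both `pick_store_name` and `registered_store_names` are hypothetical helpers, not part of RDFLib. A registered name only means the plugin exists, not that the underlying Berkeley DB bindings are installed.

```python
def pick_store_name(registered, preferred=("BerkeleyDB", "Sleepycat")):
    """Return the first preferred store name found in REGISTERED, else None."""
    registered = set(registered)
    for name in preferred:
        if name in registered:
            return name
    return None


def registered_store_names():
    """List the store plugin names known to the installed RDFLib."""
    # Imported lazily so pick_store_name stays usable without RDFLib.
    from rdflib import plugin
    from rdflib.store import Store

    return [p.name for p in plugin.plugins(kind=Store)]
```

The result of `pick_store_name(registered_store_names())` could then be passed as the first argument to `rdflib.Graph` in place of the hard-coded "Sleepycat" string.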
Alternatively, the following code wraps the open, parse, and close steps in a context manager, allowing the graph to be managed inside of a with block for convenience:
from contextlib import contextmanager

import rdflib


@contextmanager
def open_persistent_graph(uri, identifier, graph_path=None):
    """Provides a context manager for working with an OWL graph while also
    automatically closing it afterward. URI is the location of the
    graph store directory and IDENTIFIER is the name of the graph
    within that store. Optional argument GRAPH_PATH specifies an
    appropriately formatted RDF file to import when opening the graph.
    """
    # Only force create if a path is provided
    create_graph = bool(graph_path)
    # Create and open the on-disk store before entering the try block,
    # so the finally clause never runs against an unopened graph
    graph_id = rdflib.URIRef(identifier)
    graph = rdflib.Graph("Sleepycat", identifier=graph_id)
    graph.open(uri, create=create_graph)
    try:
        # Parse the file at GRAPH_PATH if set
        if graph_path:
            graph.parse(graph_path)
        yield graph
    finally:
        graph.close()
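Once the store has been populated, later sessions can read from it without parsing the .owl file again. A rough sketch of that read-only path follows. `opened` and `count_triples` are hypothetical names used for illustration, and the graph object is passed in so the open/close handling can be tried with any object that provides `open()` and `close()`.

```python
from contextlib import contextmanager


@contextmanager
def opened(graph, uri, create=False):
    """Open GRAPH's store at URI, yield the graph, and always close it."""
    graph.open(uri, create=create)
    try:
        yield graph
    finally:
        graph.close()


def count_triples(uri, identifier, store="Sleepycat"):
    """Return the number of triples in an existing on-disk graph."""
    # Imported lazily; assumes RDFLib and its Berkeley DB bindings are installed.
    import rdflib

    graph = rdflib.Graph(store, identifier=rdflib.URIRef(identifier))
    with opened(graph, uri) as g:
        # len() on an RDFLib graph gives its triple count
        return len(g)
```

Because `create` defaults to `False` here, the sketch only makes sense against a store directory that was already built by an earlier import.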
Thanks so much @zmaas! This is great. I plan to leave this issue open until we can address it during the rebuild next month. If that's OK with you, I will circle back to you when we reach the re-implementation stage.