Performance issues with lazy loading
mkrech10 opened this issue · 11 comments
I seem to be having performance issues when pulling in a list of objects (Frameworks) where each object has children, grandchildren, etc. The hierarchy doesn't go too deep at the moment: roughly 8-10 levels, with maybe 50 Frameworks.
It is taking a long time for just the list of parent frameworks to be retrieved. I know that proxy objects are created for each linked object so Empire knows where/how to retrieve the object when needed.
Is there a configuration to help speed this up? Could I be using it wrong?
Each retrieval would be a round trip to the server to get each object. You can specify that they should be lazily loaded via your annotations which might help with the perceived performance.
Sorry, I should have said that in the initial comment. We are currently using fetch = FetchType.LAZY.
How long does it take to retrieve the list of parent objects? What database are you using?
We are currently using Stardog. There are almost 1000 parents. I ran a quick comparison test: five find-alls using Empire and five using a manual SPARQL query. With Empire we got:
191,925ms
169,967ms
192,158ms
191,791ms
183,041ms
When running with just SPARQL we got:
312ms
187ms
187ms
177ms
179ms
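To put the two sets of timings side by side, here is a minimal sketch that averages them and computes the ratio. It assumes the figures are milliseconds and that the comma in the Empire numbers is a thousands separator; the class and method names are made up for illustration.

```java
import java.util.Arrays;

final class TimingComparison {
    // Timings copied from the comment above, read as milliseconds
    // (assumption: "," is a thousands separator, so 191,925ms is about 192 s).
    static final long[] EMPIRE_MS = {191_925, 169_967, 192_158, 191_791, 183_041};
    static final long[] SPARQL_MS = {312, 187, 187, 177, 179};

    // Arithmetic mean of the samples; NaN if the array is empty.
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double empire = mean(EMPIRE_MS);
        double sparql = mean(SPARQL_MS);
        System.out.printf("Empire mean: %.1f ms%n", empire);
        System.out.printf("SPARQL mean: %.1f ms%n", sparql);
        System.out.printf("Empire / SPARQL: about %.0fx%n", empire / sparql);
    }
}
```

Under that reading, the Empire runs average roughly 186 seconds against roughly 0.2 seconds for plain SPARQL, a gap of almost three orders of magnitude.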
Here is the find-all method that used Empire:
List frameworks = []
Query findAllFrameworks = entityManager.createQuery("where { ?result rdf:type <$UnifiedFrameworkOntology.BASE_URI#Framework> }")
findAllFrameworks.setHint(RdfQuery.HINT_ENTITY_CLASS, Framework)
List results = findAllFrameworks.resultList
results.each {
    frameworks.add((Framework) it)
}
If you attach VisualVM's profiler to the process where you're going through Empire, what are the hotspots?
Let me get that set up and I'll get back to you.
I cannot get VisualVM to connect to my application server currently. I am working on it. I have the VisualVM plugin for IntelliJ, yet I keep getting the errors below.
UPDATE:
I was looking in the wrong %tmp% directory. I went to c:\users\AppData\Local\Temp and was able to find the hsperfdata_UserName directory and remove it.
I have attached my .nps file from VisualVM for reference, if you'd like to look at it.
The bulk of the time is spent in com.clarkparsia.empire.annotation.RdfGenerator.fromRdf() and in com.clarkparsia.empire.annotation.RdfGenerator.determineClass().
The initial call to get all the frameworks returns rather quickly. Performance only suffers when we iterate through the result list to build a List of Frameworks. Each framework object is placed into the list, but it appears that the proxy objects for each of its children are created at this point.
The properties of a Framework are id, name, and a list of TaskItem. All we want findAllFrameworks to return is the id and name. We do not need any of the TaskItems. And when we iterate through the returned result set, we are not asking for any specific properties, just frameworkList.add(framework from iterator).
fromRdf and determineClass are the hotspots because both of them round-trip to the database and would be called often when the list of Framework objects is built.
I'm not sure why it would eagerly load the task items for all 1000 frameworks. Are all the relevant properties annotated w/ lazy fetch type?
You could always create a parent interface that has just the id and name of the Framework and you could load the frameworks as those.
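A rough sketch of what that parent-interface shape could look like, in plain Java with no Empire dependency. All type names here (FrameworkSummary, SimpleFrameworkSummary, SummaryListing) are hypothetical, TaskItem is simplified to String, and the Empire mapping annotations are deliberately left out. The idea would presumably be to pass the summary type as the entity class hint in the find-all query instead of Framework, but whether Empire maps it as hoped is an assumption to verify.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parent interface: only the fields the listing needs.
interface FrameworkSummary {
    String getId();
    String getName();
}

// Hypothetical full type: adds the expensive children on top of the summary.
interface Framework extends FrameworkSummary {
    List<String> getTaskItems(); // TaskItem simplified to String for this sketch
}

// Plain holder standing in for whatever object a mapper would create
// when asked for the summary type; it has no child field at all.
final class SimpleFrameworkSummary implements FrameworkSummary {
    private final String id;
    private final String name;

    SimpleFrameworkSummary(String id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override public String getId() { return id; }
    @Override public String getName() { return name; }
}

final class SummaryListing {
    // Turns {id, name} rows into summaries; there is no child data to resolve.
    static List<FrameworkSummary> fromRows(String[][] rows) {
        List<FrameworkSummary> summaries = new ArrayList<>();
        for (String[] row : rows) {
            summaries.add(new SimpleFrameworkSummary(row[0], row[1]));
        }
        return summaries;
    }

    public static void main(String[] args) {
        String[][] rows = {{"fw-1", "First"}, {"fw-2", "Second"}};
        for (FrameworkSummary s : fromRows(rows)) {
            System.out.println(s.getId() + ": " + s.getName());
        }
    }
}
```

Because the summary type exposes no TaskItem property, code that lists frameworks through it cannot trigger child loading by accident; the full Framework type would only be requested when the children are actually needed.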
The issue we were having was not directly related to eager/lazy loading. Our code is written in Groovy, and in Groovy we had to declare all related objects as 'private' rather than leaving them at the default visibility. This gave us the behavior we expected when setting the fetch type to lazy.