mhgrove/Empire

Performance issues with lazy loading

mkrech10 opened this issue · 11 comments

I seem to be having performance issues when pulling in a list of objects (Frameworks), each of which has children, grandchildren, etc. The hierarchy is not very deep at the moment: perhaps 8-10 levels, with maybe 50 Frameworks.
It is taking a long time for just the list of parent frameworks to be retrieved. I know that proxy objects are created for each linked object so Empire knows where/how to retrieve the object when needed.

Is there a configuration to help speed this up? Could I be using it wrong?

Each retrieval would be a round trip to the server to get each object. You can specify that they should be lazily loaded via your annotations which might help with the perceived performance.

Sorry, I should have said that in the initial comment. We are currently using fetch = FetchType.LAZY

How long does it take to retrieve the list of parent objects? What database are you using?

We are currently using Stardog. There are almost 1000 parents. I ran a quick comparison test: five find-all calls using Empire and five using a manual SPARQL query. When running with Empire we got:
191,925ms
169,967ms
192,158ms
191,791ms
183,041ms

When running with just sparql we got:
312ms
187ms
187ms
177ms
179ms
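
To put the two sets of timings side by side, here is a minimal Java sketch that averages the five runs of each approach. The numbers are the ones reported above; the class and method names are made up for illustration.

```java
// Timings copied from the five runs of each approach reported above, in milliseconds.
final class FindAllTimings {
    static final long[] EMPIRE_MS = {191_925, 169_967, 192_158, 191_791, 183_041};
    static final long[] SPARQL_MS = {312, 187, 187, 177, 179};

    // Arithmetic mean of a set of samples.
    static double mean(long[] samples) {
        long total = 0;
        for (long sample : samples) {
            total += sample;
        }
        return (double) total / samples.length;
    }

    public static void main(String[] args) {
        double empire = mean(EMPIRE_MS);
        double sparql = mean(SPARQL_MS);
        System.out.printf("Empire mean: %.1f ms%n", empire);
        System.out.printf("SPARQL mean: %.1f ms%n", sparql);
        System.out.printf("Ratio:       %.0fx%n", empire / sparql);
    }
}
```

On these figures the Empire path averages about 185.8 seconds per call against roughly 0.2 seconds for the hand-written SPARQL query, a gap of close to 900x. If the parent count is close to 1000, that works out to somewhere around 0.19 seconds per parent.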

Here is the find-all method that used Empire:

// Collect every resource typed as Framework into a typed list.
List<Framework> frameworks = []
Query findAllFrameworks = entityManager.createQuery("where { ?result rdf:type <$UnifiedFrameworkOntology.BASE_URI#Framework> }")
// Tell Empire which entity class each result should be mapped to.
findAllFrameworks.setHint(RdfQuery.HINT_ENTITY_CLASS, Framework)
List results = findAllFrameworks.resultList
results.each {
    frameworks.add((Framework) it)
}

If you attach VisualVM's profiler to the process where you're going through Empire, what are the hotspots?

Let me get that set up and I'll get back to you.

I cannot get VisualVM to connect to my application server currently. I am working on it. I have the VisualVM plugin for IntelliJ, yet I keep getting the errors below.

image

image

UPDATE:
I was looking in the wrong %tmp% directory. I went to c:\users\AppData\Local\Temp and was able to find the hsperfdata_UserName directory and remove it.

I have added my nps file from VisualVM for reference if you'd like.

The bulk of the time is spent in com.clarkparsia.empire.annotation.RdfGenerator.fromRdf() and in com.clarkparsia.empire.annotation.RdfGenerator.determineClass().

The initial call to get all the frameworks completes quickly. It is only iterating through the result list to build a List of Frameworks that kills our performance. Each framework object is placed into the list, but it appears as though each child's proxy object is created at this point.

The properties of a Framework are id, name, and list of TaskItem. All we want in the findAllFrameworks is to return the id and name. We do not need any of the TaskItems. And when we are iterating through the returned result set, we are not asking for any specific properties, just frameworkList.add(framework from iterator).

FindAllFrameworks.nps.txt

fromRdf and determineClass are the hotspots because both of them round-trip to the database and would be called often when the list of Framework objects is built.
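
As a rough illustration of why per-object round trips add up, here is a minimal Java sketch that counts calls against a hypothetical in-memory stand-in for the database. Nothing here is Empire's API: CountingStore, describe and the figure of two lookups per parent (one for each hotspot method) are assumptions made purely for the example.

```java
// Hypothetical stand-in for a remote store: it does no real work and only counts round trips.
final class CountingStore {
    int roundTrips = 0;

    // One query that returns every parent id at once.
    String[] findAllIds(int count) {
        roundTrips++;
        String[] ids = new String[count];
        for (int i = 0; i < count; i++) {
            ids[i] = "framework-" + i;
        }
        return ids;
    }

    // One lookup per call, standing in for a per-instance fetch.
    String describe(String id) {
        roundTrips++;
        return id + ":described";
    }
}

final class RoundTripDemo {
    // Materialise every parent one lookup at a time and return the total number of round trips.
    static int perObjectRoundTrips(int parents, int lookupsPerParent) {
        CountingStore store = new CountingStore();
        for (String id : store.findAllIds(parents)) {
            for (int i = 0; i < lookupsPerParent; i++) {
                store.describe(id);
            }
        }
        return store.roundTrips;
    }

    public static void main(String[] args) {
        // Assumed workload: 1000 parents, two lookups each, plus the initial list query.
        System.out.println(perObjectRoundTrips(1000, 2));
    }
}
```

With 1000 parents and two lookups each, the count comes to 2001 round trips against a single one for the list query alone, which is the kind of gap that makes a fast query feel slow once the results are iterated.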

I'm not sure why it would eagerly load the task items for all 1000 frameworks. Are all the relevant properties annotated with the lazy fetch type?

You could always create a parent interface that has just the id and name of the Framework and you could load the frameworks as those.
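
A minimal Java sketch of that idea follows. FrameworkSummary, the reduced Framework class and SummaryDemo are hypothetical names, the TaskItem children are cut down to plain ids, and the Empire annotations and the way Empire would be told to instantiate the narrower type are left out, since the thread does not spell them out.

```java
// Hypothetical narrow view: only the two fields the list needs.
interface FrameworkSummary {
    String getId();
    String getName();
}

// Hypothetical full entity; the TaskItem children are reduced to ids to keep the sketch short.
final class Framework implements FrameworkSummary {
    private final String id;
    private final String name;
    private final String[] taskItemIds;

    Framework(String id, String name, String... taskItemIds) {
        this.id = id;
        this.name = name;
        this.taskItemIds = taskItemIds.clone();
    }

    @Override public String getId() { return id; }
    @Override public String getName() { return name; }

    String[] getTaskItemIds() { return taskItemIds.clone(); }
}

final class SummaryDemo {
    // Code that only holds a FrameworkSummary has no way to reach the task items.
    static String label(FrameworkSummary summary) {
        return summary.getId() + " (" + summary.getName() + ")";
    }

    public static void main(String[] args) {
        FrameworkSummary summary = new Framework("fw-1", "Example", "task-1", "task-2");
        System.out.println(label(summary));
    }
}
```

The design point is that listing code written against the narrow interface never asks for children, so there is nothing in it to trigger their loading.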

The issue we were having was not related to eager/lazy loading directly. Our code is written in Groovy. When using Groovy, we had to declare the fields for all related objects as 'private' rather than leaving them with the default access. This then gave us the behavior we expected when setting the fetch type to lazy.
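
For readers less familiar with Groovy: a field declared without an access modifier is treated as a property, so the compiler generates a private backing field plus a public getter and setter, whereas a field declared private gets no generated accessors. The Java sketch below shows roughly what the two shapes look like and uses reflection to count the accessors. The class names are hypothetical, and the thread does not say which part of Empire's mapping is sensitive to the difference.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Roughly the shape Groovy produces for a field declared with no modifier: a property.
final class PropertyStyle {
    private Object child;

    public Object getChild() { return child; }
    public void setChild(Object child) { this.child = child; }
}

// Roughly the shape Groovy produces for a field declared private: a bare field, no accessors.
final class FieldStyle {
    private Object child;
}

final class AccessorCount {
    // Count public get/set methods declared directly on the given class.
    static int accessors(Class<?> type) {
        int count = 0;
        for (Method method : type.getDeclaredMethods()) {
            String name = method.getName();
            boolean looksLikeAccessor = name.startsWith("get") || name.startsWith("set");
            if (Modifier.isPublic(method.getModifiers()) && looksLikeAccessor) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("property style: " + accessors(PropertyStyle.class));
        System.out.println("field style:    " + accessors(FieldStyle.class));
    }
}
```

Any framework that inspects a class through its getters and setters will see the related object in the first shape and not in the second.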

@mhgrove this issue can probably be closed. @mkrech10 is on my team, and we agreed that the other issues I submitted today probably cover what we were seeing with more specificity.