hg/sample perf issues with larger schemas
coopernurse opened this issue · 4 comments
coopernurse commented
Hi there,
Thanks for writing herbert. The test.check integration is really nice.
While using this today I noticed some perf issues in the sample
function. Performance appears to degrade exponentially with the number of fields: with a 20-field schema, sample
never completes (at least on my MacBook). Here are some examples:
;; 8 fields
(def schema8
  '[{:heatingSystem? str,
     :propertyCity? str,
     :mlsListingId? str,
     :listingKey {:externalAppId int, :listingId str},
     :yearBuilt? str,
     :propertyCountry? str,
     :imageUrls? (seq (* str)),
     :userId str}])
;; 15 fields
(def schema15
  '[{:heatingSystem? str,
     :propertyCity? str,
     :mlsListingId? str,
     :listingKey {:externalAppId int, :listingId str},
     :yearBuilt? str,
     :propertyCountry? str,
     :listDate? str,
     :county? str,
     :attic? str,
     :secondAgentName? str,
     :diningRoom? str,
     :baths? str,
     :secondAgentPhone1? str,
     :imageUrls? (seq (* str)),
     :userId str}])
;; 20 fields
(def schema20
  '[{:heatingSystem? str,
     :monthlyHOAFees? str,
     :remarks? str,
     :roofType? str,
     :listingSource? str,
     :propertyCity? str,
     :bedrooms? str,
     :mlsListingId? str,
     :listingKey {:externalAppId int, :listingId str},
     :yearBuilt? str,
     :propertyCountry? str,
     :listDate? str,
     :county? str,
     :attic? str,
     :secondAgentName? str,
     :diningRoom? str,
     :baths? str,
     :secondAgentPhone1? str,
     :imageUrls? (seq (* str)),
     :userId str}])
Then when run in the REPL:
user=> (time (def x (hg/sample schema8)))
"Elapsed time: 4.203 msecs"
user=> (time (def x (hg/sample schema15)))
"Elapsed time: 543.325 msecs"
;; never finishes - crushes CPU
user=> (time (def x (hg/sample schema20)))
miner commented
I'll have to investigate. It seems that the optional keys are causing the slowdown.
miner commented
It appears to be a combinatorial explosion in mk-literal-hash-map. I'll have to rewrite that.
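To illustrate why optional keys could blow up like this, here is a rough sketch. It assumes (the thread does not show the code for mk-literal-hash-map) that the old generator considered every combination of present/absent optional keys, which gives 2^n alternatives for n optional keys. The helper names optional-key? and subset-count are hypothetical and not part of herbert.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: herbert marks a key as optional with a trailing "?".
(defn optional-key? [k]
  (string/ends-with? (name k) "?"))

;; Hypothetical helper: number of present/absent combinations of the
;; optional keys in a schema map, i.e. 2^n for n optional keys.
;; This models an ASSUMED naive enumeration, not herbert's actual code.
(defn subset-count [schema-map]
  (let [n (count (filter optional-key? (keys schema-map)))]
    (bit-shift-left 1 n)))

;; Two optional keys, one required key.
(subset-count '{:a? str, :b? str, :c int})
;; => 4

;; Optional-key counts for schema8, schema15 and schema20 above: 6, 13, 18.
(mapv #(bit-shift-left 1 %) [6 13 18])
;; => [64 8192 262144]
```

Under that assumption, going from schema8 to schema15 multiplies the work by 2^7 = 128, which is close to the observed slowdown (543.325 ms vs 4.203 ms, roughly 129x). This is only consistent with the reported timings, not proof of the cause.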
miner commented
Please try version 0.6.5, just released on Clojars.
coopernurse commented
Thank you sir! Looking much better.