Solution for the Arena Code Challenge.
Either from the command line:
lein run <filename> <threshold>
I didn't get fancy with the cli processing, to make things easy the code expects the filename of a file in the resource path:
lein run Artist_lists_small.txt 50
Alternatively jacking-in to the repl:
;; base-file is included as a def for convenience and is bound to the provided file
(run-search base-file 50)
Output goes to standard out to be redirected into a file if desired.
The threshold value can be adjusted down to around 10 for reasonable performance with the current data set. If the set were larger or a lower threshold is going to be provided a different approach might be necessary.
The approach is pretty straightforward. The only "optimization" I made was to remove artists that appear on fewer lists than the threshold since they can't meet the critera.
My initial quick cut at the solution just used a for
list comprehension to get
the cross product of all artists. That worked but required a lot more
post-processing cleanup and made the code more verbose than necessary. I
switched that to a recursive function.
I did choose to use uuids as the set identifiers, which may slow things down slightly when calculating the intersection of sets of lists.
Typically my process is repl driven iterative development. For non-trivial functions I turn those repl calls into a test once I'm satisfied. This went quickly enough that the process of writing tests seemed like overkill.
Since it's a code test I did include a few tests just to show I know how it's done.
Tests can either be run from the cli as normal:
lein test