Hash Code 2019

Problem statement

The problem statement is included as problem statement.pdf.

Insights

Some things I found out about the datasets:

B has 80k pictures and ~840k unique tags
D has 90k pictures and 220 unique tags
E has 80k pictures and 500 unique tags
B: the interest factor between any two pictures is either 0 or 3 (-> at most 80,000 * 3 = 240,000 points)
B only has horizontal pictures
E only has vertical pictures
There is no tag that occurs only once in any dataset

A key idea used to speed up things for B and C is that if you have a slide and you are looking for the next one, you should only consider slides that have at least 1 tag in common. If they don't, then the interest factor will always be 0. The pictures that share a tag are called mates in the code.

Code Structure

There are 3 main files: Main, Strategies and Helper.

Helper: contains functions to generate data structures, calculate the score between two slides, the score of an entire slideshow, ...
Strategies: contains functions to a) merge vertical pictures into slides and b) 'sort'/arrange the slides
Main: contains the functions the read the input, write the output, to 'solve' once and finally a function that solves in an infinite loop (nice to run overnight).

Score

In the extended round, we got a total score of 1,048,874.

A - Example: 2
B - Lovely landscapes: 206,655
C - Memorable moments: 1,754
D - Pet pictures: 433,937
E - Shiny selfies: 406,526

Running the code

Just run Main.java, no external libraries required.

Improvements

Most of the heavy lifting would greatly benefit from being multithreaded
Feel free to open an issue/pull request with your ideas!

Zip

You can use ./zip to easily generate a zip file of the source code. Just a nice tool to have during the contest :) This is a simple bash script, so it should work on UNIX (Linux and Mac). For Windows, you'll want to install The Windows Subsystem for Linux.

JurriaanD/Hash-Code-2019