ilyakava/sup

A junction tree inspired approximation idea (PIT)

Opened this issue · 0 comments

Here is an algorithm that generates triplets in a better way:

for simplicity, start with K=1, N=10, f(n)=n, g(q,w)=w/q.

  1. construct an undirected graph
  • connect people who do not share a group membership, and have not met in the last K weeks
  • weight the edges with max(f(n), N) where n is the number of weeks since last meeting
  1. Form 3-cliques on the existing graph (group all groups of 3 nodes that are connected to each other)
  • use the old edge weights to store a value on each 3-clique node. Since 3 connected nodes a,b,c form 1 3-clique node, store the sums of the 3 edge weights on the 3-clique node, call this "meeting quality" q
  • the new edge weights will be the number of overlapping people between 3-cliques (1,2, or 3, no edge for 0)
  1. Prune this 3-clique graph until it is totally disconnected.
  • on each iteration remove the node with the highest total edge weights w
  • In the case of a tie between 3-clique nodes 1 and 2, compare g(q_1, w_1) and g(q_2, w_2)

The time dominating step is 2. It can be implemented in a straightforward way to be O(V^3) (V is vertices in the graph), which will be fast for companies < 200 (orders of magnitude faster than the current method too). The easy way is to construct an adjacency matrix representation of the graph, iterate through each entry (node1, node2), and for each nonzero entry check V more entries (node2,node3). Clever use of an adjacency list representation can speed this up if necessary since this graph is sparse.

Step 3 is O(V^2), and is easiest when the column sums of the adjacency matrix are stored separately.
Step 1 is O(V^2) too.

Huge speedups can be gotten by tweaking K which will make the graph more sparse as it increases. f(n) should be an increasing function of n and can communicate what a quality meeting looks like, N is for simplicity and not giving new members too much priority in creating meetings. g(q,w) should be increasing in w and decreasing in q, and can too communicate what a quality meeting looks like by prioritizing some meetings happening over others.

out

Here is an example showing the tradeoff that occurs. Imagine a situation where g hasn't met with e and c in C weeks (red edges) and all the other blue edges are K. At a certain point, we have to prioritize one meeting ceg happening over two meetings abc and def (eg. g is a new member of the company and all others are the founding group) and not just prune solely based on max "sum of edge weights" (in which ceg=2 > 1 for others). This tradeoff is decided by the function g, and to a limited extent f as well, and instead g(q,2) and g(q',1) are compared, where q=2f(C)+f(K) and q'=3f(K) in this example.

p.s. This is an idea from late 2015, but wanted to get it up here since it is only becoming less likely that I will implement it as my studies move far away from these kind of problems, and @cfinsness told me sup is strugglin lately. Hopefully someone out there is interested in graph algorithms and has free time next hackathon, @dblock any suggestions? @mzikherman , @wrgoldstein ?