soundcloud/spark-pagerank

Implement missing graph modification functions

joshdevins opened this issue · 1 comment

For every requirement of the graph structure, there should be a check/validation function as well as a function that corrects any graph violating that requirement. Examples:

  /**
   * Removes self-referencing edges, that is, edges whose source and
   * destination are the same vertex. Any vertices left with no edges (in or
   * out) remain in the graph.
   */
  def removeSelfReferences(graph: Graph): Graph

  /**
   * Removes any vertices that have no in or out edges.
   */
  def removeDisconnectedVertices(graph: Graph): Graph
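
To make the check/fix pairing concrete, here is a minimal in-memory sketch. It is not the library's implementation: `SimpleGraph`, `Edge`, `hasSelfReferences`, and `hasDisconnectedVertices` are hypothetical names invented for illustration, and the real `Graph` is Spark-backed rather than built on plain Scala collections.

```scala
object GraphFixes {
  // Hypothetical stand-ins for the library's types (assumption, not the real API).
  final case class Edge(src: Long, dst: Long)
  final case class SimpleGraph(vertices: Set[Long], edges: Seq[Edge])

  /** Check: true if any edge has the same source and destination. */
  def hasSelfReferences(graph: SimpleGraph): Boolean =
    graph.edges.exists(e => e.src == e.dst)

  /** Fix: drops self-referencing edges; the vertex set is left untouched. */
  def removeSelfReferences(graph: SimpleGraph): SimpleGraph =
    graph.copy(edges = graph.edges.filter(e => e.src != e.dst))

  /** Check: true if any vertex appears in no edge at all. */
  def hasDisconnectedVertices(graph: SimpleGraph): Boolean =
    !graph.vertices.subsetOf(connected(graph))

  /** Fix: keeps only vertices that appear in at least one edge. */
  def removeDisconnectedVertices(graph: SimpleGraph): SimpleGraph =
    graph.copy(vertices = graph.vertices.intersect(connected(graph)))

  // Vertices touched by at least one edge. A vertex whose only edge is a
  // self-loop counts as connected here, so the order of the fixes matters.
  private def connected(graph: SimpleGraph): Set[Long] =
    graph.edges.flatMap(e => Seq(e.src, e.dst)).toSet
}
```

In this sketch, removing self-references can create newly disconnected vertices, so a caller who wants both properties would run removeSelfReferences first and removeDisconnectedVertices second.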

Just adding removeSelfReferences. We have never typically used the other function, and nothing in PageRank requires it. PageRank allows disconnected vertices; they simply receive only the teleport probability.
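
As a rough illustration of the teleport-only point, the sketch below applies one textbook PageRank update to a tiny graph with an isolated vertex. It is an assumption-laden toy, not this library's code: `TeleportOnly`, `Edge`, and `step` are hypothetical names, and the update deliberately ignores any redistribution of mass from dangling vertices, which a real implementation may handle differently.

```scala
object TeleportOnly {
  // Hypothetical edge type for illustration only.
  final case class Edge(src: Int, dst: Int)

  /** One textbook update: rank(v) = (1 - d) / n + d * sum(rank(u) / outDegree(u)). */
  def step(
      vertices: Set[Int],
      edges: Seq[Edge],
      ranks: Map[Int, Double],
      damping: Double): Map[Int, Double] = {
    val n = vertices.size
    val outDegree = edges.groupBy(_.src).map { case (v, es) => v -> es.size }
    val incoming = edges.groupBy(_.dst)
    vertices.map { v =>
      // Rank flowing in along edges; empty for a vertex with no in-edges.
      val linked = incoming
        .getOrElse(v, Seq.empty)
        .map(e => ranks(e.src) / outDegree(e.src))
        .sum
      v -> ((1.0 - damping) / n + damping * linked)
    }.toMap
  }
}
```

With vertices 1, 2, 3, edges 1 -> 2 and 2 -> 1, and a damping factor d, vertex 3 has no in-edges, so its sum is empty and its rank after the step is just the teleport term (1 - d) / 3.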