/Interface

Primary LanguagePython

Graph-centric Interfaces

Conduct transformations on bipartite graphs for subgraph sampling.

Graph transformations : bipartite graph -> bipartite graph

  • Easy to understand without deep understanding of the underlying system implementation
  • Naturally support for node-wise, layer-wise sampling and graph-wise sampling
  • Graph sampled can be directly used for the downstream tasks

Structures

Bipartite Graph

Each graph is a bipartite graph;

Each g has SrcNodes, DstNodes, Edges;

Store data with NodeData, EdgeData dictionaries. For heterogeneous graph, we use SrcNodeData and DstNodeData instead of NodeData;

  • Transformations (Bipartite Graph -> Bipartite Graph)

    • NodeFilter(NodeSetName, FilterFunc)
      • Filter nodes in NodeSetName, which can be SrcNode, DstNode or AllNode
    • EdgeFilter(FilterFunc)
      • Filter edges by function return value [bool]
    • NeighborFilter(NodeSetName, FilterFunc)
      • Filter neighbors of each node in the NodeSetName set, which can be SrcNode or DstNode
  • Properties (Bipartite Graph -> Nodes or (Nodes, Nodes))

    • Nodes SrcNodes(Unique : bool);

    • Nodes DstNodes(Unique : bool)

    • Nodes AllNodes()

      • For heterogeneous graph, it will raise an error. [Undefined]
    • Tuple(Nodes, Nodes) Edges()

      • Return (g.SrcNodes(False), g.DstNodes(False))
    # case study for Bipartite Graph Properties
    
    g = CreateGraph('coo', [[1,1,3], [1,4,4]], FirstIsSrc=True)
    # g.SrcNodes(unique=True) = [1,3]
    # g.SrcNodes(unique=False) = [1,1,3]
    # g.DstNodes(unique=True) = [1,4]
    # g.DstNodes(uniuqe=False) = [1,4,4]
    # g.AllNodes() = [1,4,3]
    # g.Edges() = ([1,1,3], [1,4,4])
  • Other methods

    • g.num_nodes()
    • g.num_edges()
    • g.degrees(NodeSetName)
      • NodeSetName is SrcNode or DstNode or AllNodes
      • AllNodes for heterogeneous graph is undefined

Set

Append writes only. It should be only used in Graph Transformations.

Method

  • Void insert(Nodes node)
  • Nodes iterate()
# case study for Set
ret = Set()
ret.insert([1,2,1]) # ret = [1,2]
ret.insert([4,2,3]) # ret = [1,2,4,3]
# ret.iterate() = [1,2,4,3]

HashTable

All keys have the same type and all values have the same type.

Method:

  • Void update(key, value, op)
    • op is in ['add', 'assign', 'min', 'max']
  • Void delete(key)
  • Tuple(Keys, Values) iterate()
# case study for HashTable
Dir = HashTable()
Dir.update([1,2,3], [4,5,4], op='add') 
# key = [1,2,3], value = [4,5,4]; for key not in HashTable, we just do 'assign'
Dir.update([1,2,3], [1,2,3], op='add') 
# key = [1,2,3], value = [5,7,7]
Dir.update([1,1,1], [5,4,2], op='min') 
# key = [1,2,3], value = [2,7,7]
Dir.delete([2,3]) 
# key = [1], value = [2]
# Dir.iterate() = ([1], [2])

Transformations

NodeFilter(NodeSetName, FilterFunc)

  • Filter nodes in NodeSetName, which can be SrcNode, DstNode or AllNode

  • FilterFunc is a built in function, which can be Fn.in(), Fn.notin(), Fn.random()

    Fn.in(Nodes node_set) 
    # it NodeSetName[i] is in node_set, ret[i] will be True, otherwise False.
    Fn.notin(Nodes node_set)
    # The opposite effect of Fn.in
    Fn.random(num_pick, bias=None, replace=False)
    # Do random sampling on NodeSetName
    Fn.flagged(Bool[] mask)
    # Filter NodeSetName[i] according to mask[i]

EdgeFilter(FilterFunc)

  • Filter edges by function return value [bool].

  • FilterFunc is a function that returns Bool[], and the return value should be the same length as g.num_edges().

  • Users can define their own functions.

  • We also provide some built in functions: Fn.Edge.in, Fn.Edge.notin, Fn.Edge.flagged.

    Fn.Edge.in(Source, Target)
    # if Source[i] is in Target, ret[i] will be True, otherwise it will be False.
    ## Case 1
    ## Fn.Edge.in([1,2,3,4], [2,4,6]), it will return [F, T, F, T]
    ## Case 2
    ## Fn.Edge.in(([1,2,4,4], [1,2,2,4]), ([2,4], [2,4])), it will return [F, T, F, T]
    ## (1,1) is not in Target, (2, 2) is in Target, (4, 2) is not in Target, (4, 4) is in Target.
    
    Fn.Edge.notin(Source, Target)
    # The opposite effect of Fn.Edge.in
    
    Fn.Edge.flagged(Bool[] mask) 
    # len(mask) should be equal to g.num_edges(). 
    # If mask[i] is True, edge[i] will be kept, otherwise edge[i] will be deleted.
    
    # Case
    ## use built in function
    g.EdgeFilter(Fn.Edge.notin(g.EdgeData['eids'], reverse_ids)
    ## use udf function
    udf = lambda src, dst, T : T[src] < T[dst]
    T = g.NodeData['T']
    g.EdgeFilter(udf(g.SrcNodes(uniuqe=False), g.DstNodes(uniuqe=False), T))

NeighborFilter(NodeSetName, FilterFunc)

  • Filter neighbors of each node in the NodeSetName set, which can be SrcNode or DstNode

  • FilterFunc is a built in funciton, which can be Fn.random, Fn.unique

    Fn.random(num_picks, bias=None, replace=False, keep_dim=False) 
    # to sample the neighbors of NodeSetName
    # If keep_dim = True, it will keep the shape of g and replace the invalid node with -1
    ## Case
    g = CreateGraph('coo', [[1,1,1,1], [1,2,3,4]])
    subg = g.NeighFilter('SrcNode', Fn.random(2, keep_dim=True))
    ## subg is [[1,1,1,1], [1,2,-1,-1]
    subg = g.NeighFilter('SrcNode', Fn.random(2, keep_dim=False))
    ## subg is [[1,1], [1,2]]
    
    Fn.unique(keep_dim=False) 
    # to dedup the neighbors of NodeSetName. Try to creata a simple graph of NodeSetName.

Other Functions

  • CreateGraph

    CreateGraph(format : str, Tensor[], FirstIsSrc=True, heterGraph=False) -> Graph
    # format is in ['coo', 'csr', 'csc']
  • SpMM

    SpMM(NodeSetName, root_data : Tensor, neigh_data : Tensor, udf) -> Tensor
    # has the same semantics as SpMM in DGL, but only for scalars. root_data, neigh_data are 1-D.
  • SDDMM

    SDDMM(NodeSetName, root_data : Tesnor, neigh_data : Tensor, edge_data : Tensor, udf) -> Tensor
    # has the same semantics as SDDMM in DGL, but only for scalars. root_data, neigh_data, edge_data are 1-D.
  • Any operations supported by PyTorch.

Common Opeartions

# Extract a subgraph with a subset of seed nodes
# Return a subgrah which contains only the seeds and their neighbors.
g.NodeFilter('DstNode', Fn.in(seeds))
# Sample in-neighborhood of a subgraph
subg.NeighborFilter('DstNode', Fn.random(fanout))
# Biased Sampling
Subg.NeighborFilter('DstNode', Fn.random(25, bias=subg.DstData('w'))
# Layerwise Sampling
Subg.NodeFilter('SrcNode', Fn.random(fanout))
# in_subgraph
Subg.NodeFilter('AllNode', Fn.in(nodes))                  
# Remove edges
## Case 1
g.EdgeFilter(Fn.Edge.notin(g.EdgeData['eid'], reverse_ids))
## Case 2
g.EdgeFilter(Fn.Edge.notin(g.Edges(), neg.Edges()))                    
## Case 3
Udf = lambda src, dst, T : T[src] > T[dst]
g.EdgeFilter(Udf(g.SrcNodes(unique=False), g.DstNodes(unique=False), g.NodeData['T'])
# Get graph properties
g.SrcNodes(unique=False)
g.DstNodes(unique=True)
g.Edges()

Known Issues

  • SEAL could not be expressed efficiently