Contextgraph

Code for using the CoCon data set, a data set capturing the combined use of research artifacts, contextualized through academic publication text.

Usage

NetworkX

The methods described below can be imported from contextgraph.util.graph.
You’ll also want to import networkx as nx in your code.

load_full_graph()

Returns the full graph, containing all node and edge types.

parameters
  • shallow (bool, default: False)
    • True: all node and edge features except for type are discarded.
    • False: nodes and edges have features according to the data available on Papers with Code.
  • directed (bool, default: True)
    • True: return the graph as an nx.DiGraph (for an overview of edge directions see _load_edge_tuples in contextgraph/util/graph.py).
    • False: return the graph as an nx.Graph.
  • with_contexts (bool, default: False)
    • True: also load “context” nodes (each appearance of a used entity in a paper results in a context node connected as entity--part_of->context--part_of->paper). Be aware that this will result in a lot of additional nodes.
    • False: don’t load context nodes.
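
A minimal usage sketch (assuming the package is installed and the data has been prepared as described in the Preprocessing section below):

import networkx as nx
from contextgraph.util.graph import load_full_graph

# full graph with all node and edge features, as a directed graph
G = load_full_graph(shallow=False, directed=True, with_contexts=False)
print(nx.number_of_nodes(G), nx.number_of_edges(G))
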
load_entity_combi_graph()

Loads a graph containing only the entity nodes, connected by edges that represent their co-occurrence papers, built according to the scheme parameter described below. (NOTE: to access edge attributes later, G.edges(data=True) has to be used!)

parameters
  • scheme ({'weight', 'sequence'}, default: 'sequence')
    • 'sequence': edges have two properties, (1) linker_sequence (a list of the co-occurrence paper IDs) and (2) interaction_sequence (a list of integers to be understood as “time steps”; each integer is the month in which a co-occurrence paper was published, where the month in which the very first co-occurrence paper in the returned graph was published counts as month 0).
    • 'weight': edges have a weight attribute that denotes the number of co-occurrence papers that exist between the two entities.
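
For example, to load the graph with the default 'sequence' scheme and inspect edge attributes (a sketch, assuming the data has been preprocessed):

from contextgraph.util.graph import load_entity_combi_graph

G = load_entity_combi_graph(scheme='sequence')

# edge attributes are only exposed when data=True is passed
for u, v, attrs in G.edges(data=True):
    print(u, v, attrs['linker_sequence'], attrs['interaction_sequence'])
    break
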
graph schema
  • node features
    • tasks
      • id (str)
      • type (str)
      • name (str)
      • description (str)
      • categories (list)
    • method
      • url (str)
      • name (str)
      • full_name (str)
      • description (str)
      • paper (dict)
      • introduced_year (int)
      • source_url (str)
      • source_title (str)
      • code_snippet_url (str)
      • num_papers (int)
      • id (str)
      • type (str)
    • model
      • id (str)
      • type (str)
      • name (str)
      • using_paper_titles (list)
      • evaluations (list)
    • dataset
      • url (str)
      • name (str)
      • full_name (str)
      • homepage (str)
      • description (str)
      • paper (dict)
      • introduced_date (str)
      • warning (NoneType)
      • modalities (list)
      • languages (list)
      • num_papers (int)
      • data_loaders (list)
      • id (str)
      • type (str)
      • year (int)
      • month (int)
      • day (int)
      • variant_surface_forms (list)
  • edge features
    • scheme = weight
      • “weight”
    • scheme = sequence
      • “interaction_sequence”
      • “linker_sequence”
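
To get an overview of the node types listed above, something like the following can be used (a sketch; G is a graph returned by one of the loaders above):

from collections import Counter

# count nodes per entity type via the 'type' node feature from the schema
type_counts = Counter(attrs.get('type') for _, attrs in G.nodes(data=True))
print(type_counts)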

PyTorch Geometric

The methods described below can be imported from contextgraph.util.torch.

load_entity_combi_graph()

Loads a graph containing only the entity nodes, connected by edges that represent their co-occurrence papers. Edge weights represent the number of co-occurrence papers.

graph schema
  • node features
    • id (ordinal) (0..<num_nodes>)
    • type (ordinal) (dataset: 0, method: 1, model: 2, task: 3)
    • description (transformer based embedding)
  • edge features
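
A minimal loading sketch (assuming the return value is a torch_geometric.data.Data object, which the schema above does not state explicitly):

from contextgraph.util.torch import load_entity_combi_graph

data = load_entity_combi_graph()
# inspect the resulting object (attribute names follow PyTorch Geometric conventions)
print(data)
print(data.num_nodes, data.num_edges)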

Preprocessing

  • Set paths in contextgraph/config.py
  • Run $ python3 preprocess.py
  • Run $ python3 precomp_descr_embs.py
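
The variable names below are hypothetical placeholders; the actual names defined in contextgraph/config.py may differ.

# hypothetical sketch of contextgraph/config.py; variable names are placeholders
pwc_data_dir = '/path/to/paperswithcode/dump'       # raw input data (hypothetical name)
preprocessed_data_dir = '/path/to/graph/output'     # preprocessing output (hypothetical name)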

Cite as

@inproceedings{Saier2023cocon,
  author    = {Saier, Tarek and Dong, Youxiang and F\"{a}rber, Michael},
  title     = {{CoCon: A Data Set on Combined Contextualized Research Artifact Use}},
  booktitle = {2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL)},
  year      = {2023},
  pages     = {47--50},
  month     = jun,
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
  doi       = {10.1109/JCDL57899.2023.00016}
}