Contextgraph

Code for using the CoCon data set, a data set capturing the combined use of research artifacts, contextualized through academic publication text.

Data on Zenodo
Code (contextgraph/)
Paper (author copy)

Usage

NetworkX

The methods described below can be imported from contextgraph.util.graph.
You’ll also want to import networkx as nx in your code.

load_full_graph()

Returns the full graph, containing all node and edge types.

parameter	values	default	explanation
shallow	bool	False	True: all node and edge features except for type will be discarded.
‌	‌	‌	False: nodes and edges have features according to the data available on Papers with Code.
directed	bool	True	True: Return the graph as an nx.DiGraph (for an overview of directions of edges see `_load_edge_tuples` in `contextgraph/util/graph.py`).
‌	‌	‌	False: Return the graph as an nx.Graph.
with_contexts	bool	False	True: Also load “context” nodes (each appearance of a used entity in a paper results in a context node connected as entity--part_of->context--part_of->paper). Be aware that this will result in a lot of additional nodes.
‌	‌	‌	False: Don’t load context nodes.
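
The entity--part_of->context--part_of->paper structure described for with_contexts can be pictured with a small hand-built graph. This is only a sketch: the node IDs, the "type" attribute values and the helper papers_using() are made up for illustration and are not part of contextgraph; in practice the graph would come from load_full_graph(with_contexts=True).

```python
import networkx as nx

# Toy stand-in for a directed graph that includes context nodes.
# All node IDs and attribute values below are hypothetical.
G = nx.DiGraph()
G.add_node("dataset/d1", type="dataset")
G.add_node("context/c1", type="context")
G.add_node("paper/p1", type="paper")
G.add_edge("dataset/d1", "context/c1", type="part_of")
G.add_edge("context/c1", "paper/p1", type="part_of")


def papers_using(graph, entity):
    """Follow entity -> context -> paper along outgoing edges."""
    papers = set()
    for context in graph.successors(entity):
        if graph.nodes[context].get("type") != "context":
            continue
        for paper in graph.successors(context):
            if graph.nodes[paper].get("type") == "paper":
                papers.add(paper)
    return papers


print(sorted(papers_using(G, "dataset/d1")))  # ['paper/p1']
```

With directed=False the same data would be an nx.Graph, so successors() would not be available and neighbors() would have to be used instead.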

load_entity_combi_graph()

Loads a graph containing only the entity nodes, connected by edges that represent the papers in which the two entities co-occur. The edge attributes depend on the scheme parameter described below. (NOTE: to access edge attributes later, G.edges(data=True) has to be used!)

parameter	values	default	explanation
scheme	{'weight', 'sequence'}	'sequence'	'sequence': edges have two properties: (1) `linker_sequence` (a list of the co-occurrence paper IDs) and (2) `interaction_sequence` (a list of integers to be understood as “time steps”). Each integer is the `<month>` in which a co-occurrence paper was published, counted from month 0, the month in which the very first co-occurrence paper within the returned graph was published.
‌	‌	‌	'weight': edges have a weight attribute that denotes the number of co-occurrence papers that exist between the two entities.
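
To illustrate the two schemes and the G.edges(data=True) note, here is a minimal sketch on a hand-built graph. The node IDs, paper IDs and the helper to_weight_scheme() are hypothetical, and the use of an undirected nx.Graph is an assumption; in practice the graph would come from load_entity_combi_graph(). The helper also assumes that the weight of an edge equals the length of its linker_sequence, which follows from the descriptions above but has not been checked against the library.

```python
import networkx as nx

# Toy stand-in for a graph loaded with scheme='sequence'.
# All IDs and values are made up for illustration.
G = nx.Graph()
G.add_edge("method/m1", "dataset/d1",
           linker_sequence=["paper/1", "paper/2", "paper/3"],
           interaction_sequence=[0, 0, 4])
G.add_edge("method/m1", "task/t1",
           linker_sequence=["paper/4"],
           interaction_sequence=[2])

# data=True yields (u, v, attribute_dict) triples instead of (u, v) pairs.
for u, v, attrs in G.edges(data=True):
    print(u, v, attrs["linker_sequence"], attrs["interaction_sequence"])


def to_weight_scheme(graph):
    """Derive a 'weight'-style graph by counting co-occurrence papers per edge."""
    weighted = nx.Graph()
    weighted.add_nodes_from(graph.nodes(data=True))
    for u, v, attrs in graph.edges(data=True):
        weighted.add_edge(u, v, weight=len(attrs["linker_sequence"]))
    return weighted


W = to_weight_scheme(G)
```

Individual edge attributes can also be read via W["method/m1"]["dataset/d1"]["weight"].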

graph schema

node features
- task
  - id (str)
  - type (str)
  - name (str)
  - description (str)
  - categories (list)
- method
  - url (str)
  - name (str)
  - full_name (str)
  - description (str)
  - paper (dict)
  - introduced_year (int)
  - source_url (str)
  - source_title (str)
  - code_snippet_url (str)
  - num_papers (int)
  - id (str)
  - type (str)
- model
  - id (str)
  - type (str)
  - name (str)
  - using_paper_titles (list)
  - evaluations (list)
- dataset
  - url (str)
  - name (str)
  - full_name (str)
  - homepage (str)
  - description (str)
  - paper (dict)
  - introduced_date (str)
  - warning (NoneType)
  - modalities (list)
  - languages (list)
  - num_papers (int)
  - data_loaders (list)
  - id (str)
  - type (str)
  - year (int)
  - month (int)
  - day (int)
  - variant_surface_forms (list)
edge features
- scheme = weight
  - “weight”
- scheme = sequence
  - “interaction_sequence”
  - “linker_sequence”
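
The month-based “time steps” stored in interaction_sequence can be reproduced from publication dates as sketched below. The function names and the (year, month) input format are assumptions made for this example; the preprocessing code may derive the values differently.

```python
def month_index(year, month):
    """Map a calendar month to a running month count (month is 1..12)."""
    return year * 12 + (month - 1)


def to_time_steps(edge_dates, origin):
    """Convert (year, month) pairs to month offsets relative to origin.

    edge_dates: publication (year, month) of each co-occurrence paper on one edge
    origin:     (year, month) of the earliest co-occurrence paper in the whole graph
    """
    base = month_index(*origin)
    return [month_index(year, month) - base for year, month in edge_dates]


# Earliest co-occurrence paper in the (hypothetical) graph: March 2015.
steps = to_time_steps([(2015, 3), (2015, 11), (2017, 1)], origin=(2015, 3))
print(steps)  # [0, 8, 22]
```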

PyTorch Geometric

The methods described below can be imported from contextgraph.util.torch.

load_entity_combi_graph()

Loads a graph containing only the entity nodes, connected by edges that represent their co-occurrence papers. Edge weights give the number of co-occurrence papers.

graph schema

node features
- id (ordinal) (0..<num_nodes>)
- type (ordinal) (dataset: 0, method: 1, model: 2, task: 3)
- description (transformer based embedding)
edge features
- “weight” (=number of combined use papers, see load_entity_combi_graph() scheme parameter)
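
The ordinal node features listed above can be thought of as plain integer encodings. The sketch below uses the type mapping given in the schema; the helper encode_nodes(), the example node IDs and the ordering of node indices are assumptions for illustration and do not reflect how contextgraph.util.torch builds its tensors.

```python
# Mapping as listed in the node feature schema.
TYPE_TO_ORDINAL = {"dataset": 0, "method": 1, "model": 2, "task": 3}


def encode_nodes(nodes):
    """Assign consecutive integer IDs and ordinal type codes.

    nodes: list of (original_id, type_name) pairs
    """
    id_to_index = {orig_id: idx for idx, (orig_id, _) in enumerate(nodes)}
    type_codes = [TYPE_TO_ORDINAL[type_name] for _, type_name in nodes]
    return id_to_index, type_codes


# Hypothetical entity nodes.
nodes = [("dataset/d1", "dataset"), ("method/m1", "method"), ("task/t1", "task")]
id_to_index, type_codes = encode_nodes(nodes)
print(id_to_index)
print(type_codes)  # [0, 1, 3]
```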

Preprocessing

Set paths in contextgraph/config.py
Run $ python3 preprocess.py
Run $ python3 precomp_descr_embs.py

Cite as

@inproceedings{Saier2023cocon,
  author    = {Saier, Tarek and Dong, Youxiang and F\"{a}rber, Michael},
  title     = {{CoCon: A Data Set on Combined Contextualized Research Artifact Use}},
  booktitle = {2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL)},
  year      = {2023},
  pages     = {47--50},
  month     = jun,
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
  doi       = {10.1109/JCDL57899.2023.00016}
}
