Code for using the CoCon data set, a data set capturing the combined use of research artifacts, contextualized through academic publication text.
- Data on Zenodo
- Code (contextgraph/)
- Paper (author copy)
The methods described below can be imported from contextgraph.util.graph. You’ll also want to import networkx as nx in your code.
Returns the full graph, including all node and edge types.
| parameter | values | default | explanation |
|---|---|---|---|
| shallow | bool | False | True: all node and edge features except for type will be discarded. |
| | | | False: nodes and edges have features according to the data available on Papers with Code. |
| directed | bool | True | True: return the graph as an nx.DiGraph (for an overview of edge directions see _load_edge_tuples in contextgraph/util/graph.py). |
| | | | False: return the graph as an nx.Graph. |
| with_contexts | bool | False | True: also load “context” nodes (each appearance of a used entity in a paper results in a context node connected as entity--part_of->context--part_of->paper). Be aware that this will result in a lot of additional nodes. |
| | | | False: don’t load context nodes. |
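The context node structure described in the table can be illustrated on a toy graph. The following is a minimal sketch that builds a tiny nx.DiGraph by hand instead of calling the repository's loader. All node IDs, the "paper" and "context" type values, and the "type" edge attribute are assumptions made for illustration; only the entity--part_of->context--part_of->paper shape is taken from the description above.

```python
import networkx as nx

# Toy stand-in for a graph loaded with directed=True and with_contexts=True.
# Node IDs and type values are hypothetical, not taken from the data set.
G = nx.DiGraph()
G.add_node("dataset/toy", type="dataset")
G.add_node("method/toy", type="method")
G.add_node("paper/toy", type="paper")
G.add_node("context/0", type="context")
G.add_node("context/1", type="context")

# entity --part_of-> context --part_of-> paper
G.add_edge("dataset/toy", "context/0", type="part_of")
G.add_edge("method/toy", "context/1", type="part_of")
G.add_edge("context/0", "paper/toy", type="part_of")
G.add_edge("context/1", "paper/toy", type="part_of")


def entities_used_in(graph, paper_id):
    """Return the entities that reach `paper_id` through a context node."""
    entities = set()
    for context in graph.predecessors(paper_id):
        if graph.nodes[context].get("type") != "context":
            continue
        entities.update(graph.predecessors(context))
    return sorted(entities)


used = entities_used_in(G, "paper/toy")

# An undirected view, comparable to what directed=False is described to return.
undirected = G.to_undirected()
```

Because every use of an entity adds its own context node, a traversal like this touches one extra hop per use, which is why loading contexts increases the node count considerably.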
Loads a graph containing only the entity nodes, connected by edges that represent their co-occurrence papers according to the scheme described below. (NOTE: to access edge attributes later, G.edges(data=True) has to be used!)
| parameter | values | default | explanation |
|---|---|---|---|
| scheme | {'weight', 'sequence'} | 'sequence' | 'sequence': edges have two properties: (1) linker_sequence, a list of the co-occurrence paper IDs, and (2) interaction_sequence, a list of integers to be understood as “time steps”. Each integer is the month in which a co-occurrence paper was published, counted from month 0, which is the month in which the very first co-occurrence paper within the returned graph was published. |
| | | | 'weight': edges have a weight attribute that denotes the number of co-occurrence papers that exist between the two entities. |
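To make the two schemes concrete, the sketch below derives linker_sequence, interaction_sequence, and weight values from a handful of made-up co-occurrence records in plain Python. It is not the repository's implementation: the record layout, the entity and paper IDs, and the chronological ordering of the sequences are assumptions. Only the attribute names and the "month 0" convention come from the table above.

```python
from collections import defaultdict

# Hypothetical co-occurrence records: (entity_a, entity_b, paper_id, year, month).
records = [
    ("dataset/a", "method/b", "paper/1", 2019, 11),
    ("dataset/a", "method/b", "paper/2", 2020, 2),
    ("dataset/a", "model/c", "paper/3", 2020, 1),
]


def month_index(year, month):
    """Map a calendar month to a running month count."""
    return year * 12 + (month - 1)


# Month 0 is the month of the earliest co-occurrence paper in the graph.
first = min(month_index(y, m) for _, _, _, y, m in records)

edges = defaultdict(lambda: {"linker_sequence": [], "interaction_sequence": []})
for a, b, paper, year, month in sorted(records, key=lambda r: (r[3], r[4])):
    key = tuple(sorted((a, b)))  # undirected entity pair
    edges[key]["linker_sequence"].append(paper)
    edges[key]["interaction_sequence"].append(month_index(year, month) - first)

# Under the 'weight' scheme the same edge would carry the paper count instead.
weights = {key: len(attrs["linker_sequence"]) for key, attrs in edges.items()}
```

Here the pair (dataset/a, method/b) co-occurs in November 2019 and February 2020, so its time steps are 0 and 3, and its weight under the other scheme would be 2.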
graph schema
- node features
  - tasks
    - id (str)
    - type (str)
    - name (str)
    - description (str)
    - categories (list)
  - method
    - url (str)
    - name (str)
    - full_name (str)
    - description (str)
    - paper (dict)
    - introduced_year (int)
    - source_url (str)
    - source_title (str)
    - code_snippet_url (str)
    - num_papers (int)
    - id (str)
    - type (str)
  - model
    - id (str)
    - type (str)
    - name (str)
    - using_paper_titles (list)
    - evaluations (list)
  - dataset
    - url (str)
    - name (str)
    - full_name (str)
    - homepage (str)
    - description (str)
    - paper (dict)
    - introduced_date (str)
    - warning (NoneType)
    - modalities (list)
    - languages (list)
    - num_papers (int)
    - data_loaders (list)
    - id (str)
    - type (str)
    - year (int)
    - month (int)
    - day (int)
    - variant_surface_forms (list)
- tasks
- edge features
The methods described below can be imported from contextgraph.util.torch.
Loads a graph containing only the entity nodes, connected by edges that represent their co-occurrence papers. Edge weights give the number of co-occurrence papers.
graph schema
- node features
  - id (ordinal) (0..<num_nodes>)
  - type (ordinal) (dataset: 0, method: 1, model: 2, task: 3)
  - description (transformer based embedding)
- edge features
  - “weight” (=number of combined use papers, see load_entity_combi_graph() scheme parameter)
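The ordinal encoding in this schema can be sketched in plain Python. The example below is an illustration, not the repository's code: the entity IDs, the order in which ordinals are assigned, and the use of plain lists instead of tensors are all assumptions. Only the type-to-ordinal mapping and the meaning of the weight are taken from the schema above.

```python
# Type ordinals as listed in the schema above.
TYPE_TO_ORDINAL = {"dataset": 0, "method": 1, "model": 2, "task": 3}

# Hypothetical entity nodes (string ID, type) and weighted co-occurrence edges.
nodes = [("method/b", "method"), ("dataset/a", "dataset"), ("task/d", "task")]
weighted_edges = [("dataset/a", "method/b", 2), ("method/b", "task/d", 1)]

# Assign each string ID an ordinal, here simply in order of appearance.
id_to_ordinal = {node_id: i for i, (node_id, _) in enumerate(nodes)}
node_types = [TYPE_TO_ORDINAL[node_type] for _, node_type in nodes]

# Edge list in a [sources, targets] layout plus a parallel weight list.
edge_index = [
    [id_to_ordinal[a] for a, _, _ in weighted_edges],
    [id_to_ordinal[b] for _, b, _ in weighted_edges],
]
edge_weight = [w for _, _, w in weighted_edges]
```

Keeping id_to_ordinal around makes it possible to map model outputs back to the original entity IDs.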
- Set paths in contextgraph/config.py
- Run $ python3 preprocess.py
- Run $ python3 precomp_descr_embs.py
@inproceedings{Saier2023cocon,
author = {Saier, Tarek and Dong, Youxiang and F\"{a}rber, Michael},
title = {{CoCon: A Data Set on Combined Contextualized Research Artifact Use}},
booktitle = {2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL)},
year = {2023},
pages = {47--50},
month = jun,
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
doi = {10.1109/JCDL57899.2023.00016}
}