
Usage, novelty, and metadata of code repositories.

Table of Contents

Introduction to DM2-ND repositories (Data Mining towards Decision Making Laboratory [page] at the University of Notre Dame).

Director: Dr. Meng Jiang [page]

Chapter 1. Graph Machine Learning

  • 1.1. Learning Graph Dynamics
    • 1.1.1. CalendarGNN [lab repo]
      • Paper: Calendar Graph Neural Networks for Modeling Time Structures in Spatiotemporal User Behaviors (KDD 2020) [download]
      • Leading author: Daheng Wang (dwang8@nd.edu)
      • Usage: Given user behavior data (e.g., reading behaviors on a news app), can we learn user representations that preserve spatiotemporal patterns across multiple periodicities (e.g., hourly, weekly, and weekday patterns)? The representations can be used for demographic prediction (sex, age, etc.) and recommendation.
      • Novelty: It leverages the Calendar System as a knowledge graph to enhance graph neural networks on temporal graph data of user behaviors.
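The calendar-structured pooling idea can be illustrated with a minimal sketch. Everything below (shapes, the mean-pooling, the two calendar keys) is illustrative, not the paper's actual implementation, which uses neural pooling within a GNN:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative session embeddings with calendar timestamps (hour of day, weekday).
n_sessions, dim = 200, 16
sessions = rng.normal(size=(n_sessions, dim))
hours = rng.integers(0, 24, size=n_sessions)
weekdays = rng.integers(0, 7, size=n_sessions)

def pool_by(key, n_buckets):
    """Mean-pool session embeddings into calendar buckets, then pool the buckets.
    A stand-in for the learned pooling CalendarGNN performs over calendar units."""
    buckets = np.zeros((n_buckets, dim))
    for b in range(n_buckets):
        mask = key == b
        if mask.any():
            buckets[b] = sessions[mask].mean(axis=0)
    return buckets.mean(axis=0)

# Concatenate hourly and weekday pattern embeddings into one user representation.
user_repr = np.concatenate([pool_by(hours, 24), pool_by(weekdays, 7)])
print(user_repr.shape)  # -> (32,)
```

The resulting vector preserves separate hourly and weekday views of the same behavior stream, which is the intuition behind using the calendar system as structure.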
    • 1.1.2. CoEvoGNN [lab repo]
      • Paper: Learning Attribute-Structure Co-Evolutions in Dynamic Graphs (KDD-DLG 2020 Best Paper) [download]
      • Leading author: Daheng Wang (dwang8@nd.edu)
      • Usage: It learns node embeddings for forecasting changes in node attributes and the birth and death of links over time.
      • Novelty: It is a novel framework for modeling a dynamic attributed graph sequence. It preserves the impact of earlier graphs on the current graph by generating embeddings through the sequence, and uses a temporal self-attention mechanism to model long-range dependencies in the evolutionary process. Moreover, it optimizes model parameters jointly on two dynamic tasks, attribute inference and link prediction over time, so the model can capture the co-evolutionary patterns of attribute change and link formation.
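The joint two-task objective can be sketched in a few lines. This is a toy illustration of the idea (random data, a linear decoder, an arbitrary trade-off weight), not CoEvoGNN's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10, 8
Z = rng.normal(size=(n, d))               # node embeddings generated at time t
X_next = rng.normal(size=(n, d))          # observed node attributes at time t+1
A_next = rng.integers(0, 2, size=(n, n))  # observed links at time t+1

W = 0.1 * rng.normal(size=(d, d))         # hypothetical attribute decoder

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Task 1: attribute inference loss (squared error of decoded attributes).
attr_loss = np.mean((Z @ W - X_next) ** 2)

# Task 2: link prediction loss (binary cross-entropy on edge probabilities).
P = sigmoid(Z @ Z.T)
eps = 1e-9
link_loss = -np.mean(A_next * np.log(P + eps) + (1 - A_next) * np.log(1 - P + eps))

joint_loss = attr_loss + 0.5 * link_loss  # 0.5 is an arbitrary trade-off weight
print(float(joint_loss))
```

Minimizing such a joint loss is what lets one set of embeddings serve both attribute forecasting and link forecasting.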
    • 1.1.3. CatchTartan [lab repo]
      • Paper: CatchTartan: Representing and Summarizing Dynamic Multicontextual Behaviors (KDD 2016) [download]
      • Leading author: Meng Jiang (mjiang2@nd.edu)
  • 1.2. Learning Node Complementarity
    • 1.2.1. LearnSuc [lab repo]
      • Paper: Multi-Type Itemset Embedding for Learning Behavior Success (KDD 2018) [download]
      • Leading author: Daheng Wang (dwang8@nd.edu)
      • Usage: Can we learn representations of the elements in a set that preserve their complementarity?
      • Novelty: It proposes a novel representation learning method for sets of items of different types.
    • 1.2.2. TUBE [lab repo]
      • Paper: TUBE: Embedding Behavior Outcomes for Predicting Success (KDD 2019) [download]
      • Leading author: Daheng Wang (dwang8@nd.edu)
      • Usage: Can we learn the complementarity among researchers for effective teaming?
      • Novelty: It proposes a novel measurement of complementarity to replace similarity in representation learning frameworks.
  • 1.3. Graph Anomaly Detection
    • 1.3.1. GAL [lab repo] [src repo]
      • Paper: Error-Bounded Graph Anomaly Loss for GNNs (CIKM 2020) [download]
      • Leading author: Tong Zhao (tzhao2@nd.edu)
      • Usage: Can we learn node representations in an unsupervised way on bipartite graphs for the task of graph anomaly detection?
      • Novelty: This model uses unsupervised graph anomaly detection algorithms to produce pseudo labels that supervise the training of graph neural networks.
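The pseudo-labeling pipeline can be sketched as below. The degree z-score detector and the thresholds are illustrative stand-ins for the off-the-shelf unsupervised detectors the paper plugs in:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative bipartite adjacency: 50 users x 20 items.
A = (rng.random((50, 20)) < 0.1).astype(float)
A[:5] = (rng.random((5, 20)) < 0.8)  # a dense block mimicking anomalous users

# Step 1: an unsupervised detector scores nodes (here: a simple degree z-score).
deg = A.sum(axis=1)
score = (deg - deg.mean()) / (deg.std() + 1e-9)

# Step 2: top-scoring nodes become positive pseudo labels; these labels can
# then drive a supervised loss when training a GNN, with no human annotation.
pseudo_labels = (score > 2.0).astype(int)
print(int(pseudo_labels.sum()))
```

The error-bounded loss in the paper additionally accounts for the fact that such pseudo labels are noisy.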
    • 1.3.2. AOO [lab repo]
      • Paper: Actionable Objective Optimization for Suspicious Behavior Detection on Large Bipartite Graphs (BigData 2018) [download]
      • Leading author: Tong Zhao (tzhao2@nd.edu)
      • Usage: Given "who-reviews-what" data on e-commerce platforms, can we deliver an automated solution to accurately suspend fake reviewers and/or bully buyers?
      • Novelty: This model learns to measure the suspiciousness of nodes by simultaneously minimizing the loss (e.g., false reviews) and maximizing the profit (e.g., sales).
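The loss-vs-profit trade-off can be sketched as a threshold search over a suspiciousness score. All quantities below are synthetic and illustrative; the paper optimizes a richer actionable objective on large bipartite graphs:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative per-reviewer quantities (not the paper's features):
suspiciousness = rng.random(1000)          # model's suspiciousness score in [0, 1]
fake = rng.random(1000) < suspiciousness   # higher score -> more likely fake
sales = rng.random(1000) * 10              # profit each reviewer contributes

def objective(threshold):
    """Actionable objective: profit kept from retained genuine reviewers
    minus loss incurred from fake reviewers that slip under the threshold."""
    keep = suspiciousness < threshold
    profit = sales[keep & ~fake].sum()
    loss = sales[keep & fake].sum()
    return profit - loss

# Suspend everyone above the threshold that maximizes profit minus loss.
best = max(np.linspace(0.05, 0.95, 19), key=objective)
print(round(float(best), 2))
```

Choosing the action threshold by directly optimizing this business objective, rather than a detection accuracy proxy, is the sense in which the objective is "actionable".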
    • 1.3.3. LockInfer [lab repo]
      • Paper: Inferring Strange Behavior from Connectivity Pattern in Social Networks (PAKDD 2014) [download]
      • Paper: Inferring Lockstep Behavior from Connectivity Pattern in Large Graphs (KAIS 2016) [download]
      • Leading author: Meng Jiang (mjiang2@nd.edu)
    • 1.3.4. CatchSync [lab repo]
      • Paper: CatchSync: Catching Synchronized Behavior in Large Directed Graphs (KDD 2014) [download]
      • Paper: Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach (TKDD 2016) [download]
      • Leading author: Meng Jiang (mjiang2@nd.edu)
    • 1.3.5. CrossSpot [lab repo]
      • Paper: A General Suspiciousness Metric for Dense Blocks in Multimodal Data (ICDM 2015) [download]
      • Paper: Spotting Suspicious Behaviors in Multimodal Data: A General Metric and Algorithms (TKDE 2016) [download]
      • Leading author: Meng Jiang (mjiang2@nd.edu)
  • 1.4. Graph Data Augmentation
    • 1.4.1. GAug [lab repo] [src repo]
      • Paper: Data Augmentation for Graph Neural Networks (AAAI 2021) [download]
      • Leading author: Tong Zhao (tzhao2@nd.edu)
      • Usage: How can we perform graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node classification?
      • Novelty: It shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in a given graph structure, and introduces a novel framework that leverages these insights to improve performance on GNN-based node classification via edge manipulation.
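The edge-manipulation step can be sketched as follows. Here the "edge predictor" is faked directly from class labels to mimic a trained class-homophilic predictor; in the paper it is a graph auto-encoder trained on the graph itself, and the 0.9/0.1/0.5 values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
labels = rng.integers(0, 2, size=n)           # hypothetical class labels
A = (rng.random((n, n)) < 0.1).astype(float)  # illustrative random graph
np.fill_diagonal(A, 0)

# Stand-in edge predictor: high probability on intra-class pairs, low otherwise.
P = np.where(labels[:, None] == labels[None, :], 0.9, 0.1)

# Augment: add the most probable missing edges, drop the least probable existing ones.
A_aug = A.copy()
A_aug[(P > 0.5) & (A == 0)] = 1.0
A_aug[(P < 0.5) & (A == 1)] = 0.0
np.fill_diagonal(A_aug, 0)

intra_edges = lambda M: M[labels[:, None] == labels[None, :]].sum()
print(intra_edges(A_aug) >= intra_edges(A))  # -> True: intra-class edges promoted
```

Training the GNN on the augmented graph (or an interpolation of original and augmented graphs) is what improves node classification.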

Chapter 2. Information Extraction (IE)

  • 2.1. Scientific Named Entity Recognition (SciNER)
    • 2.1.1. TriTrain [lab repo] [src repo]
      • Paper: Tri-Train: Automatic Pre-Fine Tuning between Pre-Training and Fine-Tuning for SciNER (EMNLP 2020) [download]
      • Leading author: Qingkai Zeng (qzeng@nd.edu)
      • Usage: Given a sentence "FDA has approved remdesivir for the treatment of COVID-19 in certain situations", can we detect "FDA: Organization", "remdesivir: Drug", and "COVID-19: Disease"? This framework performs NER in scientific domains.
      • Novelty: It introduces a "pre-fine-tuning" step between pre-training and fine-tuning to quickly and effectively adapt NER models to new scientific domains.
  • 2.2. Scientific Knowledge Graph Construction (SciKG)
    • 2.2.1. SciKG [lab repo]
      • Paper: The Role of "Condition": A Novel Scientific Knowledge Graph Representation and Construction Model (KDD 2019) [download]
      • Leading author: Tianwen Jiang (tjiang2@nd.edu)
      • Usage: It proposes a novel representation of SciKG and delivers a model to build the SciKG.
      • Novelty: Conditions play an essential role in scientific observations, hypotheses, and statements. Unfortunately, existing scientific knowledge graphs (SciKGs) represent factual knowledge as a flat relational network of concepts, the same as general-domain KGs, without considering the conditions under which the facts are valid, which loses important context for inference and exploration. This work considers the conditions of factual claims.
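A minimal sketch of the fact-with-conditions representation the paper argues for. The class and field names, and the example claim, are illustrative, not the paper's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Tuple3:
    """A (subject, predicate, object) tuple, used for both facts and conditions."""
    subject: str
    predicate: str
    obj: str

@dataclass
class ConditionalFact:
    """A factual tuple paired with the condition tuples under which it holds,
    rather than a flat, context-free relational triple."""
    fact: Tuple3
    conditions: list = field(default_factory=list)

# Illustrative claim: the fact only holds under the attached condition.
claim = ConditionalFact(
    fact=Tuple3("drug X", "reduces", "symptom Y"),
    conditions=[Tuple3("dosage", "is", "low")],
)
print(claim.fact.predicate, len(claim.conditions))
```

Keeping conditions attached to each fact is what preserves the context that a flat SciKG loses.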
    • 2.2.2. MIMO_CFE [lab repo] [src repo]
      • Paper: Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text (EMNLP 2019) [download]
      • Leading author: Tianwen Jiang (tjiang2@nd.edu)
  • 2.3. IE from Tabular Data
    • 2.3.1. Tablepedia [lab repo]
      • Paper: Experimental Evidence Extraction in Data Science with Hybrid Table Features and Ensemble Learning (WWW 2020) [download]
      • Leading author: Wenhao Yu (wyu1@nd.edu)
      • Usage: Can we discover knowledge from tables of experimental results in the data science literature? It offers a new dataset and a tool.
      • Novelty: It extracts experimental evidence from data science papers in PDF format and builds the first experimental database for related research.
    • 2.3.2. TCN
      • Paper: TCN: Table Convolutional Network for Web Table Interpretation (WWW 2021) [download]
      • Leading author: Daheng Wang (dwang8@nd.edu)
      • Usage: Can we extract information from semi-structured webpages to provide valuable long-tail facts for augmenting knowledge graphs?
      • Novelty: It is a novel relational table representation learning approach that considers both intra- and inter-table contextual information.
  • 2.4. Temporal Fact IE
    • 2.4.1. MetaPAD [lab repo]
      • Paper: Meta Pattern-driven Attribute Discovery from Massive Text Corpora (KDD 2017) [download]
      • Leading author: Meng Jiang (mjiang2@nd.edu)
    • 2.4.2. TFWIN [lab repo]
      • Paper: A Novel Unsupervised Approach for Precise Temporal Slot Filling from Incomplete and Noisy Temporal Contexts (WWW 2019) [download]
      • Leading author: Xueying Wang (xwang41@nd.edu)
      • Usage: Can AI read news articles and then fill in temporal slots such as (vicente_fox, per:is_president_of, __, [ __ , __ ]) as (entity, attribute, value, [beginTime, endTime])? The first slot is the value of a specific attribute (e.g., country's president) for an entity (e.g., the person "vicente_fox"). Here the value should be a country's name. The second and third slots are the beginning and ending time points of the attribute value being valid.
      • Novelty: It is an unsupervised approach with two modules that mutually enhance each other: one estimates the reliability of fact extractors conditioned on their temporal contexts; the other estimates fact trustworthiness based on extractor reliability. The iterative learning process reduces noise in the extractions.
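The mutual-enhancement loop resembles classic truth-discovery iteration, which can be sketched as below. The data, accuracies, and agreement-based reliability update are illustrative, not TFWIN's exact estimators (which also condition on temporal context):

```python
import numpy as np

rng = np.random.default_rng(5)

# claims[i, j] = True if extractor j asserts candidate fact i (synthetic data).
n_facts, n_extractors = 40, 6
truth = rng.random(n_facts) < 0.5
accuracy = np.array([0.9, 0.9, 0.8, 0.6, 0.55, 0.5])  # hidden extractor quality
claims = np.array([[(rng.random() < a) == t for a in accuracy] for t in truth])

reliability = np.full(n_extractors, 0.5)
for _ in range(10):
    # Fact trustworthiness: reliability-weighted vote over extractors.
    trust = claims @ reliability / reliability.sum()
    # Extractor reliability: agreement with the current trust estimates.
    reliability = (claims * trust[:, None]
                   + (~claims) * (1 - trust[:, None])).mean(axis=0)

print(np.round(reliability, 2))
```

After a few iterations, reliable extractors dominate the vote, which in turn sharpens the trust estimates: the mutual enhancement the paper describes.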
  • 2.5. Intent Detection
    • 2.5.1. ReferInt [lab repo]
      • Paper: Identifying Referential Intention with Heterogeneous Contexts (WWW 2020) [download]
      • Leading author: Wenhao Yu (wyu1@nd.edu)
      • Usage: Citing, quoting, and forwarding-and-commenting behaviors are widely seen in academia, news media, and social media. This work identifies the referential intention that motivates the act of using the referred (e.g., cited, quoted, or retweeted) source and content to support a claim.
      • Novelty: It is a novel neural framework with Interactive Hierarchical Attention (IHA) to identify the intention of referential behavior by properly aggregating the heterogeneous contexts, including referred content (e.g., a cited paper), local context (e.g., the sentence citing the paper), neighboring context (e.g., the former and latter sentences), and network context (e.g., the academic network of authors, affiliations, and keywords).

Chapter 3. Natural Language Generation (NLG)

  • 3.1. Methodologies
    • 3.1.1. KENLG-Reading [lab repo] [src repo]
      • Paper: A Survey of Knowledge-enhanced Text Generation (arXiv) [download]
      • Leading author: Wenhao Yu (wyu1@nd.edu)
      • Usage: It offers a long reading list that complements the survey paper. Related literature from 2020-2021 has been added and discussed.
      • Novelty: Enhancing NLG with knowledge is a very popular research direction, but no comprehensive survey of it existed before.
  • 3.2. Question Answering
    • 3.2.1. CrossVAE [lab repo]
      • Paper: Crossing Variational Autoencoders for Answer Retrieval (ACL 2020) [download]
      • Leading author: Wenhao Yu (wyu1@nd.edu)
      • Usage: Given a question and a set of answer candidates, can we accurately retrieve the best answer?
      • Novelty: This model crosses two variational autoencoders, one generating the answer from the question and the other generating the question from the answer, to better capture question and answer semantics.
    • 3.2.2. TransTQA [lab repo]
      • Paper: A Technical Question Answering System with Transfer Learning (EMNLP 2020) [download]
      • Leading author: Wenhao Yu (wyu1@nd.edu)
      • Usage: It is a novel system that offers automatic responses by retrieving appropriate answers based on similar questions that were correctly answered in the past.
      • Novelty: It is built upon a siamese ALBERT network, which enables it to respond quickly and accurately. It adopts a standard deep transfer learning strategy to improve its support for multiple technical domains.
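The siamese retrieval idea can be sketched with a shared encoder and similarity search. The bag-of-words encoder below is a deterministic stand-in for the shared ALBERT encoder; the questions and answers are made up:

```python
import numpy as np

past_questions = [
    "how to restart the ssh daemon on linux",
    "how to resize a logical volume",
    "why does my kernel panic on boot",
]
answers = ["run systemctl restart sshd", "use lvextend then resize2fs", "check the boot logs"]

# Deterministic vocabulary built from past questions (stand-in for the encoder).
vocab = {w: i for i, w in enumerate(
    sorted({w for q in past_questions for w in q.lower().split()}))}

def encode(text):
    """Shared ('siamese') encoder: normalized bag-of-words vector."""
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

# Retrieval: answer the new question with the answer of the most similar past one.
query = "restart ssh daemon linux"
sims = [float(encode(q) @ encode(query)) for q in past_questions]
best = int(np.argmax(sims))
print(answers[best])  # -> "run systemctl restart sshd"
```

Because both sides go through the same encoder, past questions can be pre-encoded once, which is what makes this style of retrieval fast at serving time.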