MatchZoo


MatchZoo is a toolkit for text matching. It was developed to facilitate the designing, comparing and sharing of deep text matching models.

Overview

The architecture of the MatchZoo toolkit is depicted in the figure below.

[Figure: architecture of the MatchZoo toolkit]

The toolkit has three major modules: data preparation, model construction, and training and evaluation. These modules are organized as a pipeline of data flow.

Data Preparation

The data preparation module converts datasets from different text matching tasks into a unified format that serves as the input of deep matching models. Users provide datasets that contain pairs of texts along with their labels, and the module produces the following files.

  • Word Dictionary: records the mapping from each word to a unique identifier called wid. Words that are too frequent (e.g. stopwords), too rare or noisy (e.g. fax numbers) can be filtered out by predefined rules.
  • Corpus File: records the mapping from each text to a unique identifier called tid, along with the sequence of word identifiers contained in that text. Note that each text is truncated or padded to a fixed length customized by users.
  • Relation File: stores the relationship between two texts, each line containing a pair of tids and the corresponding label.
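
The three outputs above can be illustrated with a small sketch. It is not the toolkit's own code: the `build_files` helper, whitespace tokenization, the `T0`, `T1`, ... tid scheme and the use of id 0 for padding are all assumptions made for the example.

```python
from collections import Counter


def build_files(pairs, max_len=5, min_freq=1, stopwords=()):
    """Hypothetical helper: turn (text1, text2, label) triples into the
    three structures described above. Not MatchZoo's actual code."""
    # Count word frequencies over every text in the dataset.
    freq = Counter(w for t1, t2, _ in pairs for t in (t1, t2) for w in t.split())

    # Word dictionary: word -> wid. Id 0 is reserved for padding (an assumption).
    word_dict = {}
    for word in sorted(freq):
        if freq[word] >= min_freq and word not in stopwords:
            word_dict[word] = len(word_dict) + 1

    corpus = {}      # tid -> fixed-length list of wids
    text_ids = {}    # text -> tid, so a repeated text keeps one identifier
    relations = []   # (tid1, tid2, label)

    def add_text(text):
        if text not in text_ids:
            tid = "T%d" % len(text_ids)
            text_ids[text] = tid
            # Words filtered out of the dictionary are dropped from the text.
            wids = [word_dict[w] for w in text.split() if w in word_dict]
            wids = wids[:max_len]                  # truncate long texts
            wids += [0] * (max_len - len(wids))    # pad short texts
            corpus[tid] = wids
        return text_ids[text]

    for t1, t2, label in pairs:
        relations.append((add_text(t1), add_text(t2), label))
    return word_dict, corpus, relations


pairs = [
    ("how to train a model", "training a deep model", 1),
    ("how to train a model", "the weather is nice", 0),
]
word_dict, corpus, relations = build_files(
    pairs, max_len=4, stopwords={"a", "the", "to"}
)
print(word_dict)
print(corpus)
print(relations)
```

Because the first text appears in both pairs, it is stored once in the corpus and referenced twice in the relations.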

Model Construction

In the model construction module, we employ the Keras library to help users build deep matching models layer by layer. The Keras library provides a set of common layers widely used in neural models, such as convolutional, pooling and dense layers. To further facilitate the construction of deep text matching models, we extend the Keras library with layer interfaces specifically designed for text matching.

Moreover, the toolkit implements two families of representative deep text matching models: representation-focused models and interaction-focused models [1].
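
The difference between the two families can be sketched in plain Python, without Keras. Everything here is an illustrative assumption: the toy 2-d embeddings, the function names, and the fixed aggregation rules that stand in for layers a real model would learn.

```python
import math

# Toy 2-d word embeddings; the values are made up for illustration.
EMB = {
    "cheap": [1.0, 0.0],
    "flights": [0.0, 1.0],
    "low": [0.9, 0.1],
    "cost": [0.8, 0.2],
    "airfare": [0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def representation_score(text1, text2):
    """Representation-focused: encode each text on its own, then compare.
    The encoder here is just the mean of the word vectors."""
    def encode(words):
        vecs = [EMB[w] for w in words]
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    return cosine(encode(text1), encode(text2))


def interaction_matrix(text1, text2):
    """Interaction-focused: compare the texts word by word first."""
    return [[cosine(EMB[w1], EMB[w2]) for w2 in text2] for w1 in text1]


def interaction_score(text1, text2):
    """Stand-in aggregation: average each word's best match in the other text.
    A real model would learn patterns over the matrix instead."""
    matrix = interaction_matrix(text1, text2)
    return sum(max(row) for row in matrix) / len(matrix)


query = ["cheap", "flights"]
doc = ["low", "cost", "airfare"]
print(representation_score(query, doc))
print(interaction_matrix(query, doc))
print(interaction_score(query, doc))
```

In the first function the two texts never see each other until the final comparison, while in the second the word-level comparison comes first and the aggregation comes afterwards.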

Training and Evaluation

For learning the deep matching models, the toolkit provides a variety of objective functions for regression, classification and ranking. For example, the ranking-related objective functions include several well-known pointwise, pairwise and listwise losses. Users can choose different objective functions for optimization in the training phase. Once a model has been trained, the toolkit can be used to produce a matching score, predict a matching label, or rank target texts (e.g., documents) against an input text.
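
As a rough illustration of how pointwise and pairwise objectives differ, the sketch below uses a squared loss and a hinge loss. These two losses, the function names and the margin of 1.0 are assumptions chosen for the example, not necessarily the objectives shipped with the toolkit.

```python
def pointwise_squared_loss(score, label):
    """Pointwise: each (text pair, label) is penalized independently."""
    return (score - label) ** 2


def pairwise_hinge_loss(pos_score, neg_score, margin=1.0):
    """Pairwise: a relevant text should outscore a non-relevant one
    by at least `margin`; otherwise the shortfall is the loss."""
    return max(0.0, margin - (pos_score - neg_score))


def rank_by_score(scored_texts):
    """Rank candidate texts for one input text, highest score first."""
    ordered = sorted(scored_texts, key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ordered]


# The relevant text already wins by more than the margin: no loss.
print(pairwise_hinge_loss(2.0, 0.5))
# The relevant text scores lower than the non-relevant one: positive loss.
print(pairwise_hinge_loss(0.2, 0.5))
# Scores produced by a trained model can be used directly for ranking.
print(rank_by_score([("d1", 0.2), ("d2", 0.9), ("d3", 0.5)]))
```

A listwise loss would instead look at the whole candidate list for an input text at once.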

Models

  1. DRMM
  2. MatchPyramid
  3. ARC-I
  4. DSSM
  5. CDSSM

Performance

Usage

python main.py --phase train --model_file ./models/drmm.config

Environment

  • Python 2.7+
  • TensorFlow 1.2
  • Keras 2.0.5

Model Detail

  1. DRMM

This model is an implementation of "A Deep Relevance Matching Model for Ad-hoc Retrieval".

  • model file: models/drmm.py
  • config file: models/drmm.config

  2. MatchPyramid

This model is an implementation of "Text Matching as Image Recognition".

  • model file: models/matchpyramid.py
  • config file: models/matchpyramid.config

  3. ARC-I

This model is an implementation of "Convolutional Neural Network Architectures for Matching Natural Language Sentences".

  • model file: models/arci.py
  • config file: models/arci.config

  4. DSSM

This model is an implementation of "Learning Deep Structured Semantic Models for Web Search using Clickthrough Data".

  • model file: models/dssm.py
  • config file: models/dssm.config

  5. CDSSM

Under development.

  6. ARC-II

Under development.

  7. Match-SRNN

Under development.

Acknowledgements

The following people contributed to the development of the MatchZoo project:

  • Yixing Fan
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Google Scholar
  • Liang Pang
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Google Scholar
  • Jianpeng Hou
    • Institute of Computing Technology, Chinese Academy of Sciences
  • Jiafeng Guo
    • Institute of Computing Technology, Chinese Academy of Sciences
    • HomePage
  • Yanyan Lan
    • Institute of Computing Technology, Chinese Academy of Sciences
    • HomePage
  • Jun Xu
    • Institute of Computing Technology, Chinese Academy of Sciences
    • HomePage
  • Xueqi Cheng
    • Institute of Computing Technology, Chinese Academy of Sciences
    • HomePage