PyTextGCN

A re-implementation of TextGCN by Yao et al. (Graph Convolutional Networks for Text Classification). The text-to-graph transformation is implemented in Cython, which keeps graph construction fast. Graph handling and the GCN itself are built on the pytorch-geometric library.
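For orientation, the following is a minimal pure-Python sketch of the graph recipe described by Yao et al., not this project's Cython code. Documents and words become nodes. Document-word edges are weighted by TF-IDF, word-word edges by positive PMI over sliding windows (the paper uses a window size of 20), and every node gets a self-loop of weight 1. The function name `build_textgcn_edges`, whitespace tokenization and the plain `tf * log(N / df)` TF-IDF variant are assumptions made for illustration; the Text2GraphTransformer may tokenize and weight differently.

```python
import math
from collections import Counter
from itertools import combinations


def build_textgcn_edges(docs, window_size=20):
    """Hypothetical helper: return ({(u, v): weight}, vocab).

    Nodes 0..n_docs-1 are documents, nodes n_docs.. are words (sorted).
    Each undirected edge is stored once with u <= v.
    """
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokens
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    word_id = {w: n_docs + i for i, w in enumerate(vocab)}
    edges = {}

    # Document-word edges: raw term count * log(N / df); zero weights are skipped.
    df = Counter(w for toks in tokenized for w in set(toks))
    for d, toks in enumerate(tokenized):
        for w, count in Counter(toks).items():
            weight = count * math.log(n_docs / df[w])
            if weight > 0:
                edges[(d, word_id[w])] = weight

    # Sliding windows; a document shorter than the window is one window.
    windows = []
    for toks in tokenized:
        if len(toks) <= window_size:
            windows.append(toks)
        else:
            windows.extend(toks[i:i + window_size]
                           for i in range(len(toks) - window_size + 1))

    # Count in how many windows each word and each word pair occurs.
    word_count, pair_count = Counter(), Counter()
    for win in windows:
        uniq = sorted(set(win))
        word_count.update(uniq)
        pair_count.update(combinations(uniq, 2))

    # Word-word edges: PMI = log(p(a, b) / (p(a) p(b))), only positive values kept.
    n_windows = len(windows)
    for (a, b), n_ab in pair_count.items():
        pmi = math.log(n_ab * n_windows / (word_count[a] * word_count[b]))
        if pmi > 0:
            edges[(word_id[a], word_id[b])] = pmi

    # Self-loops with weight 1 for every document and word node.
    for node in range(n_docs + len(vocab)):
        edges[(node, node)] = 1.0
    return edges, vocab
```

For example, `build_textgcn_edges(["a b", "a c"])` links document 0 to word "b" and document 1 to word "c" with weight log 2, while "a" gets no document edges because it occurs in every document.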

Requirements

This project was built with:

Python 3.8.5
Cython 0.29.21
CUDA 10.2 (optional for GPU support)
scikit-learn 0.23.2
pytorch 1.7.0
torch-geometric 1.6.3
gcc 9.3.0
nltk 3.5
scipy 1.5.2

The Text2Graph module, at least, should also work with other versions of these libraries.
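To see how an environment differs from the versions listed above, a small check along these lines can help. It is a sketch using the standard `importlib.metadata` module (Python 3.8+); the PyPI distribution names in `TESTED_VERSIONS` are assumptions and may need adjusting for your installation. The helper name `compare_versions` is hypothetical.

```python
from importlib import metadata

# Versions this project was built with; distribution names are assumed PyPI names.
TESTED_VERSIONS = {
    "Cython": "0.29.21",
    "scikit-learn": "0.23.2",
    "torch": "1.7.0",
    "torch-geometric": "1.6.3",
    "nltk": "3.5",
    "scipy": "1.5.2",
}


def compare_versions(expected=TESTED_VERSIONS):
    """Map each distribution to (installed version or None, tested version)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed under this distribution name
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted) in compare_versions().items():
        note = "" if installed == wanted else "  (differs from tested version)"
        print(f"{name}: installed={installed}, tested={wanted}{note}")
```

A differing version is not necessarily a problem; it only marks where to look first if something breaks.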

Installation

From the project root, compile the Cython extension with:

cd textgcn/lib/clib && python setup.py build_ext --inplace
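After the build, the compiled extension modules should sit next to their sources in textgcn/lib/clib. The sketch below lists them using the platform's extension suffixes from `importlib.machinery.EXTENSION_SUFFIXES` (for example `.so` on Linux, `.pyd` on Windows), so it does not assume any particular module name. The helper `find_compiled_extensions` is hypothetical and not part of this project.

```python
import importlib.machinery
from pathlib import Path


def find_compiled_extensions(directory="textgcn/lib/clib"):
    """Return the compiled extension files found directly in `directory`."""
    path = Path(directory)
    if not path.is_dir():
        return []  # wrong working directory or missing checkout
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return sorted(p for p in path.iterdir()
                  if p.is_file() and p.name.endswith(suffixes))


if __name__ == "__main__":
    # Run from the project root; an empty list suggests the build did not run or failed.
    print(find_compiled_extensions())
```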

Usage

To compute a graph, pass a list of strings X (each string holds the text of one document), a list of labels y and a list of test indices test_idx:

from textgcn import Text2GraphTransformer

t2g = Text2GraphTransformer()

graph = t2g.fit_transform(X, y, test_idx=test_idx)
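If your corpus has no predefined split, the test indices can be drawn at random. The sketch below prepares toy X, y and test_idx inputs in the shape described above; the helper `random_test_indices`, the 25% test fraction and the example documents are illustrative assumptions, not part of this project.

```python
import random


def random_test_indices(n_docs, test_fraction=0.2, seed=0):
    """Hypothetical helper: reproducible random subset of document indices."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    n_test = round(n_docs * test_fraction)
    return sorted(rng.sample(range(n_docs), n_test))


# Toy corpus: one string per document and one label per document.
X = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock prices rose sharply today",
    "markets fell after the announcement",
]
y = ["animals", "animals", "finance", "finance"]
test_idx = random_test_indices(len(X), test_fraction=0.25, seed=42)
```

These three objects can then be passed to t2g.fit_transform(X, y, test_idx=test_idx) as shown above.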

The resulting graph is a torch_geometric.data.Data object and can be processed by any network built on torch-geometric. For details on the parameters of Text2GraphTransformer and on the contents of the resulting Data object, consult the documentation in the source files.
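To illustrate what a GCN layer does with such a graph, here is a dependency-free sketch of the symmetric normalized propagation from Kipf and Welling that layers such as torch_geometric.nn.GCNConv build on: A_hat = D^-1/2 A D^-1/2, followed by A_hat @ H. It assumes an undirected weighted edge dictionary with self-loops already included, as in TextGCN; the learnable weight matrix and nonlinearity are omitted. The names `normalize_adjacency` and `propagate` are hypothetical.

```python
import math


def normalize_adjacency(edges, n_nodes):
    """Return D^-1/2 A D^-1/2 as {(u, v): weight}, with both directions stored.

    `edges` maps (u, v) -> weight, each undirected edge stored once.
    Self-loops are assumed present, so every degree is positive.
    """
    full = {}
    for (u, v), w in edges.items():
        full[(u, v)] = w
        full[(v, u)] = w  # symmetric adjacency; a self-loop is simply set twice
    degree = [0.0] * n_nodes
    for (u, _), w in full.items():
        degree[u] += w
    return {(u, v): w / math.sqrt(degree[u] * degree[v])
            for (u, v), w in full.items()}


def propagate(norm_adj, features):
    """One propagation step: A_hat @ features, with one feature row per node."""
    n_feats = len(features[0])
    out = [[0.0] * n_feats for _ in features]
    for (u, v), w in norm_adj.items():
        for k in range(n_feats):
            out[u][k] += w * features[v][k]
    return out
```

For two nodes joined by one edge plus self-loops, every normalized weight is 0.5, so propagating the features [[1.0], [3.0]] gives [[2.0], [2.0]]: each node ends up with the average of itself and its neighbour.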

Documentation

The documentation currently resides in the source files.

How to reproduce our experiments