A continual learning framework for natural language processing.

CoLL - Continual Language Learning

A collection of extensions and data loaders for continual language learning in PyTorch. CoLL provides popular continual language learning benchmarks and is compatible with both Avalanche and Sequoia.

Features

  • Application: Unified interfaces for typical continual-language-learning applications, including text classification, text generation, and sequence labelling, enabling easy benchmarking across multiple problems and reproducible comparison.
  • Learning Paradigm: Simulation of fully supervised, semi-supervised, unsupervised, and self-supervised learning paradigms.
  • Continual Setting: Built-in support for typical continual learning settings, e.g., instance-incremental, class-incremental, task-incremental, and domain-incremental learning.
  • Backbone Model: Support for various pretrained language models (HuggingFace Transformers) and extension modules (e.g., Adapters) for continual learning.
  • Metrics: Unified metrics for fair and systematic comparison.
  • Baselines: Built-in implementations and helper functions for popular methods, with default arguments taken from the literature.

Note: This is still very much a Work-In-Progress! Please feel free to share your wisdom.
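To make the metrics bullet concrete: the standard "average accuracy" metric in continual learning is the mean accuracy over all tasks, measured after training on the final task. A minimal sketch of that computation (the function name and matrix layout are illustrative assumptions, not CoLL's actual API):

```python
def average_accuracy(acc_matrix):
    """Average accuracy after the final task.

    acc_matrix[i][j] is the accuracy on task j measured after
    training on task i (rows = training stages, columns = tasks).
    """
    final_row = acc_matrix[-1]  # accuracies after the last task
    return sum(final_row) / len(final_row)

# Example: 3 tasks; accuracy on earlier tasks degrades over time
# (catastrophic forgetting).
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.75, 0.88],
]
print(average_accuracy(acc))  # ~0.777
```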

Installation

You can install CoLL either with Python's package manager pip or from source. To avoid conflicts with your existing Python setup, we recommend working in a virtual environment. To install virtualenv and create an environment:

pip install --upgrade virtualenv
virtualenv venv
source venv/bin/activate

Requirements

  • Python 3.6 or above
  • PyTorch 1.4 or above
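A quick way to verify both requirements at runtime (a convenience sketch, not part of CoLL):

```python
import sys

# Check the Python interpreter version.
assert sys.version_info >= (3, 6), "CoLL requires Python 3.6+"

# Check the PyTorch version, if PyTorch is installed at all.
try:
    import torch
    major, minor = (int(x) for x in torch.__version__.split(".")[:2])
    assert (major, minor) >= (1, 4), "CoLL requires PyTorch 1.4+"
except ImportError:
    print("PyTorch not installed; run: pip install torch")
```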

Using pip

pip install coll

From Source

git clone https://github.com/wutong8023/CoLL.git
cd CoLL
python setup.py install

Example

from coll.backbone import PLMClassifier
from coll.environment import Environment, CreEnv
from coll.environment.dataset import FewRel
from coll.environment.paradigm import SemiSuper # LimitedSuper, FullSuper, UnSuper, InteractSuper
from coll.environment.setting import TaskIL # ClassIL, InstanceIL, DomainIL
from coll.method import ER # EWC, LwF, LAMOL, MbPA++, etc.
from coll.utils.metrics import acc_a # average accuracy
from coll.utils.buffer_memory import ReservoirMemory
from coll.utils.train import Trainer
from coll.utils.eval import Evaluater

# 1. define continual learning environment
# customize continual learning environment
data = FewRel()
paradigm = SemiSuper()
setting = TaskIL(split_by="clustering")
cl_env = Environment(data, paradigm, setting)

# or load predefined environment
cl_env = CreEnv()

# 2. define backbone model
backbone = PLMClassifier()

# 3. define continual learning strategy
memory = ReservoirMemory(size=500, extend=False)
cl_method = ER(memory)

# 4. train
Trainer.train(backbone, cl_env, cl_method)

# 5. evaluate
results = Evaluater.evaluate(backbone, cl_env, cl_method, acc_a)

print(results.summary())
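The ER strategy above replays examples drawn from a fixed-size buffer. The name ReservoirMemory suggests classic reservoir sampling, which keeps every example seen so far in the buffer with equal probability; a self-contained sketch of that idea (not CoLL's actual implementation):

```python
import random

class ReservoirBuffer:
    """Fixed-size buffer filled by reservoir sampling (Algorithm R):
    after n items have been seen, each one has probability size/n
    of being in the buffer."""

    def __init__(self, size):
        self.size = size
        self.items = []
        self.n_seen = 0

    def add(self, item):
        self.n_seen += 1
        if len(self.items) < self.size:
            # Buffer not full yet: always keep the item.
            self.items.append(item)
        else:
            # Replace a random slot with probability size/n_seen.
            j = random.randrange(self.n_seen)
            if j < self.size:
                self.items[j] = item

    def sample(self, k):
        """Draw a replay batch of up to k stored examples."""
        return random.sample(self.items, min(k, len(self.items)))

buf = ReservoirBuffer(size=500)
for i in range(10_000):
    buf.add(i)
print(len(buf.items))  # 500
```

Because replacement probability shrinks as more examples stream in, the buffer stays an (approximately) uniform sample over the whole stream, which is why reservoir buffers are a common default for experience replay in continual learning.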

Framework Structure

coll
├── coll
│   ├── evaluation
│   └── utils
└── examples