
PyTorch DataLoader for seq2seq

Primary language: Python. License: MIT.

DataLoader for Seq2seq

An efficient data loader for text datasets, built on torch.utils.data.Dataset, a custom collate_fn, and torch.utils.data.DataLoader.
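As a rough illustration of how these three pieces fit together, here is a minimal sketch. It is not the repository's code: `PairDataset`, `pad_batch`, the pad id of 0, and the toy id sequences are all hypothetical, and it assumes sentences have already been converted to lists of word ids.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class PairDataset(Dataset):
    """Hypothetical dataset of (source, target) sequences given as word ids."""

    def __init__(self, src_seqs, trg_seqs):
        assert len(src_seqs) == len(trg_seqs)
        self.src_seqs = src_seqs
        self.trg_seqs = trg_seqs

    def __len__(self):
        return len(self.src_seqs)

    def __getitem__(self, index):
        # Return one variable-length pair; padding is deferred to collate_fn.
        src = torch.tensor(self.src_seqs[index], dtype=torch.long)
        trg = torch.tensor(self.trg_seqs[index], dtype=torch.long)
        return src, trg


def pad_batch(seqs, pad_id=0):
    """Right-pad 1-D tensors to the longest length in the batch (pad_id is an assumption)."""
    lengths = [len(s) for s in seqs]
    padded = torch.full((len(seqs), max(lengths)), pad_id, dtype=torch.long)
    for i, seq in enumerate(seqs):
        padded[i, :lengths[i]] = seq
    return padded, lengths


def collate_fn(batch):
    """Merge a list of (src, trg) pairs into padded tensors plus their true lengths."""
    # Sort by source length, longest first (convenient for packed RNN inputs).
    batch = sorted(batch, key=lambda pair: len(pair[0]), reverse=True)
    src_seqs, trg_seqs = zip(*batch)
    src_padded, src_lengths = pad_batch(src_seqs)
    trg_padded, trg_lengths = pad_batch(trg_seqs)
    return src_padded, src_lengths, trg_padded, trg_lengths


# Toy data: ids are made up for illustration only.
dataset = PairDataset(
    src_seqs=[[1, 5, 6, 2], [1, 7, 2], [1, 8, 9, 10, 2]],
    trg_seqs=[[1, 11, 2], [1, 12, 13, 2], [1, 14, 2]],
)
loader = DataLoader(dataset, batch_size=3, shuffle=False, collate_fn=collate_fn)
src, src_len, trg, trg_len = next(iter(loader))
```

Padding inside `collate_fn` rather than in the dataset means each batch is only padded to its own longest sequence, which is usually where the efficiency gain comes from.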


Prerequisites


Usage

1. Clone the repository

$ git clone https://github.com/yunjey/seq2seq-dataloader.git
$ cd seq2seq-dataloader

2. Download the nltk tokenizer

$ pip install nltk
$ python
>>> import nltk
>>> nltk.download('punkt')

3. Build word2id dictionary

$ python build_vocab.py

4. Check DataLoader

For usage, please see example.ipynb.
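
For readers unfamiliar with what a word2id dictionary (step 3) looks like, the sketch below shows one common way to build and use one. It is an assumption-laden illustration, not the contents of build_vocab.py: the function names, the `min_count` threshold, and the special tokens `<pad>`, `<start>`, `<end>`, `<unk>` are hypothetical. It tokenizes with `str.split` so that it runs without downloads; a tokenizer such as `nltk.word_tokenize` could be passed in instead.

```python
from collections import Counter

# Hypothetical special tokens; the repository may use different ones.
SPECIALS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_word2id(sentences, min_count=1, tokenize=str.split):
    """Map each sufficiently frequent word to an integer id."""
    counter = Counter()
    for sentence in sentences:
        counter.update(tokenize(sentence.lower()))

    word2id = {token: i for i, token in enumerate(SPECIALS)}
    # Most frequent words first; ties broken alphabetically for determinism.
    for word, count in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count and word not in word2id:
            word2id[word] = len(word2id)
    return word2id


def encode(sentence, word2id, tokenize=str.split):
    """Convert a sentence to ids, wrapping it in <start>/<end> and mapping rare words to <unk>."""
    unk = word2id["<unk>"]
    ids = [word2id.get(token, unk) for token in tokenize(sentence.lower())]
    return [word2id["<start>"]] + ids + [word2id["<end>"]]


word2id = build_word2id(["the cat sat", "the dog sat down"], min_count=2)
ids = encode("the cat sat", word2id)
```

Here "cat" appears only once, so with `min_count=2` it falls below the threshold and is encoded as `<unk>`.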