text

Data loaders and abstractions for text and NLP

This is a temporary repo for hack week: Data APIs for NLP.

Get started

  • install HuggingFace datasets. We copied it into this repo to get a jump start; eventually, we will build our own.

pip install -e stl_text/dataframes/datasets

  • install the PyTorch and torchtext nightlies, as some of the tasks depend on prototype work in the torchtext library.

to install the CPU version on Linux:

pip install --pre torch torchtext -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html;

to install the CUDA 10.1 version on Linux:

pip install --pre torch torchtext -f https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html;

More detailed instructions are available here.

  • install this package

pip install -e .

  • run an example

python examples/hf_dataset_quick_tour.py
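
If the example fails on an import, a quick check of which packages Python can actually find may help narrow things down. The following is a minimal sketch, not part of this repo: the helper `find_missing` is hypothetical, and the module names are assumptions (`datasets`, `torch`, and `torchtext` are the usual import names for the packages installed above; `stl_text` is inferred from the path `stl_text/dataframes/datasets` and may not match the real package name).

```python
import importlib.util

# Module names are assumptions: "datasets", "torch" and "torchtext" are the
# usual import names of the packages installed above; "stl_text" is inferred
# from the path stl_text/dataframes/datasets and may not match the real name.
REQUIRED_MODULES = ["datasets", "torch", "torchtext", "stl_text"]


def find_missing(modules):
    """Return the names in `modules` that Python cannot locate for import."""
    missing = []
    for name in modules:
        try:
            # find_spec locates a module without executing it.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only confirms that each module can be located; it does not verify that a nightly build is installed or that the versions are compatible with each other.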