columbia-applied-data-science/rosetta

Document Dependency on NLTK


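The README lists some dependencies but omits NLTK. Without NLTK installed, I cannot import Rosetta at all (see the traceback below). Even `import rosetta.parallel` fails, because `rosetta/__init__.py` imports `rosetta.text.api`, which eventually runs `import nltk` in `rosetta/text/text_processors.py`. Is there any way to load Rosetta without installing NLTK? I really just wanted to look at the parallel API. If not, the dependency should be documented.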

Thanks!

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-03c43361a895> in <module>()
----> 1 import rosetta.parallel

/usr/local/lib/python3.5/dist-packages/rosetta/__init__.py in <module>()
----> 1 from rosetta.text.api import *

/usr/local/lib/python3.5/dist-packages/rosetta/text/api.py in <module>()
----> 1 from rosetta.text.streamers import TextFileStreamer
      2 
      3 from rosetta.text.text_processors import \
      4     TokenizerBasic, MakeTokenizer, SFileFilter, VWFormatter
      5 

/usr/local/lib/python3.5/dist-packages/rosetta/text/streamers.py in <module>()
     15 from .. import common
     16 from ..common import lazyprop, smart_open, DocIDError
---> 17 from . import filefilter, text_processors
     18 
     19 

/usr/local/lib/python3.5/dist-packages/rosetta/text/text_processors.py in <module>()
     22 import math
     23 
---> 24 import nltk
     25 import numpy as np
     26 import pandas as pd

ImportError: No module named 'nltk'