
Error: Could not find or load main class edu.stanford.nlp.process.PTBTokenizer

TianlinZhang668 opened this issue · 8 comments

I ran make_datafiles.py, but it fails with this error:
Preparing to tokenize /home/ztl/Downloads/cnn_stories/cnn/stories to cnn_stories_tokenized...
Making list of files to tokenize...
Tokenizing 92579 files in /home/ztl/Downloads/cnn_stories/cnn/stories and saving in cnn_stories_tokenized...
Error: Could not find or load main class edu.stanford.nlp.process.PTBTokenizer
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.process.PTBTokenizer
Stanford CoreNLP Tokenizer has finished.
Traceback (most recent call last):

However, I can run echo "Please tokenize this text." | java edu.stanford.nlp.process.PTBTokenizer from the root directory without problems.
I don't know how to deal with this. Thanks a lot!

I am running corenlp-3.9.2.jar.

You need stanford-corenlp-3.7.0.jar. See this: https://github.com/abisee/cnn-dailymail#2-download-stanford-corenlp
Please read the README.md file.
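One possible reading of the symptom, which is an assumption and not confirmed in this thread: the echo test works in the interactive shell, but the java process started by make_datafiles.py does not see the jar. For example, CLASSPATH may have been set without being exported, or set in a different shell. Below is a minimal Python sketch for checking this before tokenizing. find_corenlp_jar, build_tokenizer_command and run_tokenizer are hypothetical helpers, not part of make_datafiles.py, and the jar-name pattern is assumed. Passing the jar with java's standard -cp option avoids relying on the inherited environment.

```python
import os
import subprocess
from typing import List, Optional

TOKENIZER_CLASS = "edu.stanford.nlp.process.PTBTokenizer"


def find_corenlp_jar(classpath: Optional[str] = None) -> Optional[str]:
    """Return the first CLASSPATH entry that looks like a CoreNLP jar, or None.

    Assumption (hypothetical helper): the jar name starts with
    "stanford-corenlp-" and ends with ".jar", e.g. stanford-corenlp-3.7.0.jar.
    The separate models jar is skipped.
    """
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    for entry in classpath.split(os.pathsep):
        name = os.path.basename(entry)
        if (name.startswith("stanford-corenlp-") and name.endswith(".jar")
                and "models" not in name):
            return entry
    return None


def build_tokenizer_command(jar: str, extra_args: List[str]) -> List[str]:
    """Build a java command that names the jar explicitly with -cp,
    so it no longer depends on CLASSPATH reaching the subprocess."""
    return ["java", "-cp", jar, TOKENIZER_CLASS, *extra_args]


def run_tokenizer(extra_args: List[str]) -> None:
    """Hypothetical wrapper: fail early if no CoreNLP jar is found."""
    jar = find_corenlp_jar()
    if jar is None:
        raise RuntimeError(
            "No stanford-corenlp-*.jar found on CLASSPATH; export CLASSPATH "
            "in the same shell that runs the script")
    # check=True raises CalledProcessError if java exits with a non-zero
    # status, instead of carrying on as if tokenization had succeeded.
    subprocess.run(build_tokenizer_command(jar, extra_args), check=True)
```

The check=True choice matters here. The log above shows the script reporting "Stanford CoreNLP Tokenizer has finished." even though java could not load the class.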

Successfully finished tokenizing /home/ztl/Downloads/cnn_stories/cnn/stories to cnn_stories_tokenized.

Making bin file for URLs listed in url_lists/all_test.txt...
Traceback (most recent call last):
  File "make_datafiles.py", line 239, in <module>
    write_to_bin(all_test_urls, os.path.join(finished_files_dir, "test.bin"))
  File "make_datafiles.py", line 154, in write_to_bin
    url_hashes = get_url_hashes(url_list)
  File "make_datafiles.py", line 106, in get_url_hashes
    return [hashhex(url) for url in url_list]
  File "make_datafiles.py", line 106, in <listcomp>
    return [hashhex(url) for url in url_list]
  File "make_datafiles.py", line 101, in hashhex
TypeError: Unicode-objects must be encoded before hashing

I now have the tokenized files, but the next step fails.
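This TypeError is what Python 3 raises when a str, rather than bytes, is passed to a hashlib hash object. The traceback shows hashhex (line 101) receiving each URL from the URL list. The body of hashhex is not shown in this thread, so the following is only a minimal sketch of a Python 3-safe replacement. It assumes the function just needs the hex SHA1 digest of each URL and that the URL lists are UTF-8 or ASCII text. Running the script under Python 2, which it may have been written for, could be another way around the error.

```python
import hashlib
from typing import Iterable, List, Union


def hashhex(s: Union[str, bytes]) -> str:
    """Return the hex SHA1 digest of a URL.

    hashlib only accepts bytes in Python 3, so str input is encoded
    as UTF-8 first (assumption: URLs are UTF-8/ASCII text).
    """
    if isinstance(s, str):
        s = s.encode("utf-8")
    h = hashlib.sha1()
    h.update(s)
    return h.hexdigest()


def get_url_hashes(url_list: Iterable[Union[str, bytes]]) -> List[str]:
    """Hash every URL in the list, as in the traceback's get_url_hashes."""
    return [hashhex(url) for url in url_list]
```

Accepting both str and bytes keeps the helper usable whether the URL files are read in text or binary mode.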

Try this: https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail
I guess it will solve your tokenization issue and your other issues as well.

What if my article content doesn't have the same structure as the CNN articles?

@quanghuynguyen1902 I guess you have already opened a new issue, #29.
Let's continue there. Could someone please close this issue?

I am facing the same issue here.

source ./.bash_profile