Issues
- 0
Is this repo still actively maintained?
#49 opened by d-kleine - 3
include size of processed corpus in README
#43 opened by erikfredner - 2
Not windows-friendly things
#37 opened by fontclos - 0
skipped due to duplication
#48 opened by Felix-liu0989 - 3
no data stored in bookshelves_ebooks_dict.pkl and bookshelves_categories_dict.pkl after successful running
#45 opened by kaapivalli - 0
Storing raw data in a compressed format
#44 opened by PadLex - 1
"Connection refused"
#40 opened by danielplatt - 2
Bookshelves
#38 opened by nofreewill42 - 6
Getting info about the data before download
#29 opened by edilsonacjr - 3
- 13
rsync command fails on Windows 10
#30 opened by andreluizgit - 1
Allow for retrieving epubs files?
#31 opened by hneutr - 3
File not found on Windows 10
#34 opened by luigiusai - 5
get_data.py fails: ReadError
#32 opened by maxbry - 1
- 2
"Copyright Renewal" text
#24 opened by martingerlach - 1
parse_bookshelves() fails due to encoding issue
#25 opened by fontclos - 0
Add a LICENSE
#22 opened by fontclos - 2
Simplify requirements files
#19 opened by fontclos - 5
- 1
remove notebooks and all jupyter stuff
#20 opened by fontclos - 3
- 1
Create lists of counts
#10 opened by fontclos - 1
- 5
ValueError: The specified mirror directory does not exist when running 'python get_data.py'
#1 opened by martingerlach - 1
Missing newline at the end of counts files
#2 opened by fontclos - 1
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 12, column 65
#3 opened by fontclos - 4
- 1
Duplicates detection
#11 opened by fontclos - 2
NLTK tokenizer missing on fresh run
#5 opened by fontclos