cbaziotis/ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
Python · MIT
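To make the idea of statistics-based word segmentation (as used for splitting hashtags) concrete, here is a minimal, self-contained sketch. It is not Ekphrasis's own code or API: the `COUNTS` table, the unknown-word penalty, and the function names `word_logprob` and `segment` are all hypothetical, chosen only for illustration. It picks the split whose words have the highest total log-probability under a unigram model, using an iterative dynamic program.

```python
import math

# Hypothetical toy unigram counts; a real tool would load these
# from corpus statistics (e.g. counts gathered from a large text corpus).
COUNTS = {"make": 50, "america": 20, "great": 40, "again": 30,
          "a": 100, "me": 60, "gain": 5}
TOTAL = sum(COUNTS.values())


def word_logprob(word):
    """Log-probability of a word under the toy unigram model."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Assumed penalty for unknown words: grows with word length so that
    # long unknown chunks are never preferred over known words.
    return -math.log(TOTAL) - len(word) * math.log(10)


def segment(text, max_word_len=20):
    """Split text into the most probable sequence of words."""
    text = text.lower()
    n = len(text)
    # best[i] holds (score, words) for the best segmentation of text[:i].
    best = [(0.0, [])] + [(-math.inf, [])] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            score = best[start][0] + word_logprob(word)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1]


if __name__ == "__main__":
    hashtag = "#MakeAmericaGreatAgain"
    print(segment(hashtag.lstrip("#")))
```

With the toy counts above, the example prints `['make', 'america', 'great', 'again']`. The loop is iterative rather than recursive, so long inputs do not run into Python's recursion limit.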
Issues
- 0
Please add a LICENSE to this repo
#21 opened by napsternxg - 0
- 1
tokenizing '20th' to '2','0','th'
#30 opened by KavishBhatia - 3
she's --> she ' s
#8 opened by CarloSegat - 17
Word Statistics File not Found. | Receiving 404 error while downloading the file.
#28 opened by imVParashar - 3
- 2
how to get the word statistics?
#31 opened by UGUESS-lzx - 9
Updation of url : https://www.dropbox.com/s/a84otqrg6u1c5je/stats.zip?dl=1 required
#11 opened by devamanyu - 0
How can the text_processor be parallelized?
#27 opened by danielafe7-usp - 1
"maximum recursion depth exceeded" Error
#26 opened by mjag7682 - 1
Can Ekphrasis be used in other languages?
#25 opened by shuningge - 0
- 0
spelling correction mostly is not working
#20 opened by stas00 - 5
- 0
Segmentation: Preserve case?
#19 opened by davidbernat - 0
- 2
- 0
The TextPreProcessor class only supports segmenting text with hashtags. Required support for normal text segmenter.
#15 opened by aman5319 - 1
Log messages print to stdout
#2 opened by ckingdev - 0
Add tests for regexes
#1 opened by cbaziotis - 2
- 1
Memory usage
#9 opened by xro7 - 0
Failed during generate_stats.py
#12 opened by JingLiJJ - 0
Spell corrector in other languages
#10 opened by al-jwarizmi - 2
- 2
Installing from pypi doesn't pull in deps
#5 opened by ckingdev - 1
Warning regarding using TextPreProcessor as a preprocessing for torchtext.data.Field()
#7 opened by davidalbertonogueira - 0
extracting url
#6 opened by kishore0905