Issues
- 3
New Maintainer Welcome :-)
#86 opened by maelle - 0
implicit conversion of character input to UTF-8
#87 opened by ablaette - 1
Add strip_url option to tokenize_words()
#85 opened by fschaffner - 2
tokenize_tweets replacement
#84 opened by alanault - 14
Twitter tokenizing logic broken by upcoming ICU 72 breaking change ('@' no longer splits)
#82 opened by MichaelChirico - 3
Low-level parallelism with RcppParallel
#51 opened by lmullen - 0
keeping punctuation
#80 opened by Legallois - 1
- 2
Possible CRAN release
#81 opened by EmilHvitfeldt - 4
- 2
Inconsistent behavior of tokenize_tweets() when filtering stopwords with punctuation
#76 opened by syumet - 11
- 2
tokenize_tweets and single word strings
#70 opened by juliasilge - 14
Submit paper to JOSS
#39 opened by lmullen - 0
Update DESCRIPTION prior to release
#62 opened by lmullen - 12
Specify encoding in C++ code for skip_ngrams
#58 opened by patperry - 7
- 14
integration into quanteda as a core tokenizer
#25 opened by kbenoit - 3
- 0
Add a pkgdown website
#61 opened by lmullen - 0
Update README and vignettes for new release
#43 opened by lmullen - 3
Add function to chunk texts into smaller segments
#30 opened by lmullen - 0
Add Jockers stopwords
#53 opened by lmullen - 0
Strip punctuation option for tokenize_ngrams
#57 opened by alanault - 2
Tokenize sentences starting with a number
#59 opened by ekstroem - 4
Lower level C++ api with external pointers
#50 opened by dselivanov - 3
Installation Error
#55 opened by rlumor - 1
Error: could not find function "%>%"
#54 opened by fahadshery - 1
- 2
Punctuation options
#48 opened by lmullen - 2
Way of committing to repo
#45 opened by dselivanov - 4
can we use alternative lexicons?
#47 opened by randomgambit - 0
Deprecate tokenize_regex()
#42 opened by lmullen - 1
Word counting
#36 opened by lmullen - 7
- 47
Incorrect skipgrams
#24 opened by koheiw - 0
Add keyboard interrupts
#37 opened by Ironholds - 3
- 4
NA support?
#33 opened by Ironholds - 30
Remove requirement for C++11
#26 opened by statspro1 - 0
Pass argument by reference using raw pointer
#20 opened by lmullen - 2
Ideas for other tokenizers
#27 opened by lmullen - 2
- 3
Installation failed on Microsoft R Server
#28 opened by kevinbsc - 2
R-devel / Travis
#23 opened by maelle - 1
- 5
Character level tokenizers
#22 opened by dselivanov - 1
Using long vectors
#21 opened by lmullen - 2
Error while installing tokenizers package
#18 opened by harshakap