patrickfrey/strusAnalyzer
Library for document analysis (segmentation, tokenization, normalization, aggregation) that produces a set of items that can be inserted into a strus storage. It also provides functions for analysing tokens or phrases of a strus query.
C++, MPL-2.0
Issues
build issues on RHEL-6
#59 opened by andreasbaumann - 7
using nested segmenters and positions
#49 opened by andreasbaumann - 6
inconsistent parameters
#51 opened by andreasbaumann - 2
Meaning of ~ in analyzer xpath selector
#50 opened by andreasbaumann - 1
How to index the filename?
#54 opened by andreasbaumann - 4
multi-valued attributes are not supported
#42 opened by andreasbaumann - 1
tokenizer 'word' is not logical
#47 opened by andreasbaumann - 1
JSON segmenter crashes as sub segmenter
#46 opened by andreasbaumann - 0
date2int metadata mapping
#41 opened by andreasbaumann - 1
Segmenters should have options
#27 opened by patrickfrey - 0
RandomFeed is very slow on FreeBSD and OSX
#24 opened by andreasbaumann - 1
cJSON library throws some warnings
#18 opened by andreasbaumann - 4
OS X: Giving up for now :)
#11 opened by dw - 1
Position information of overlapping annotations (attributes in XML) is not handled correctly
#2 opened by patrickfrey - 1
remove hardcoded document properties
#5 opened by patrickfrey - 1
The XML segmenter processes only UTF-8
#4 opened by patrickfrey - 0
Packaging fixes (unfinished)
#1 opened by andreasbaumann