patrickfrey/strusAnalyzer

Library for document analysis (segmentation, tokenization, normalization, aggregation) that produces a set of items that can be inserted into a strus storage. It also provides some functions for analysing tokens or phrases of a strus query.

C++ · MPL-2.0

Issues

document markup does not resolve overlapping markups correctly
#67 opened 7 years ago by patrickfrey
1
C++11 dynamic exceptions warning in textwolf header files
#60 opened 7 years ago by andreasbaumann
1
List of supported document types and segmenters
#56 opened 7 years ago by andreasbaumann
5
build issues on RHEL-6
#59 opened 7 years ago by andreasbaumann
1
using nested segmenters and positions
#49 opened 8 years ago by andreasbaumann
7
What is the meaning of scheme in DocumentClass
#40 opened 8 years ago by andreasbaumann
6
How to split document in tokens in a mixed tagged format
#53 opened 8 years ago by andreasbaumann
2
How to do conditional indexing per language?
#55 opened 8 years ago by andreasbaumann
1
inconsistent parameters
#51 opened 8 years ago by andreasbaumann
1
Meaning of ~ in analyzer xpath selector
#50 opened 8 years ago by andreasbaumann
2
punctuation method, meaning of second parameter
#52 opened 8 years ago by andreasbaumann
1
How to index the filename?
#54 opened 8 years ago by andreasbaumann
1
multi-valued attributes are not supported
#42 opened 8 years ago by andreasbaumann
4
Required order of definitions when using sub content with segmenter switch
#48 opened 8 years ago by patrickfrey
1
tokenizer 'word' is not logical
#47 opened 8 years ago by andreasbaumann
2
JSON segmenter crashes as sub segmenter
#46 opened 8 years ago by andreasbaumann
1
unexpected error message if XML header is broken
#45 opened 8 years ago by andreasbaumann
0
exceptions of type strus::runtime_error ignored
#44 opened 8 years ago by andreasbaumann
2
adding a segmenter requires changes across repos
#23 opened 9 years ago by andreasbaumann
1
error handling when semicolon is missing in attribute definition
#43 opened 8 years ago by andreasbaumann
2
date2int metadata mapping
#41 opened 8 years ago by andreasbaumann
0
Segmenters should have options
#27 opened 8 years ago by patrickfrey
1
RandomFeed is very slow on FreeBSD and OSX
#24 opened 9 years ago by andreasbaumann
0
Asynchronous document feeding implementation not complete
#9 opened 9 years ago by patrickfrey
1
cJSON library throws some warnings
#18 opened 9 years ago by andreasbaumann
2
OS X: Giving up for now :)
#11 opened 9 years ago by dw
4
Position information of overlapping annotations (attributes in XML) is not handled correctly
#2 opened 9 years ago by patrickfrey
1
remove hardcoded document properties
#5 opened 9 years ago by patrickfrey
1
End of sentence recognition does not work at all
#3 opened 9 years ago by patrickfrey
1
The XML segmenter processes only UTF-8
#4 opened 9 years ago by patrickfrey
1
Packaging fixes (unfinished)
#1 opened 10 years ago by andreasbaumann
0