IBMStreams/streamsx.nlp

RutaText ignores leading whitespace for indexes

Closed · 3 comments

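If I have a rule that looks for the word "move", I get the same result for
" I would like to move several of my " (leading space) as I do for "I would like to move several of my " (no leading space).
Both produce: {typeDescription="watson.uima.ruta.extractors.Main.Transaction",text="move",begin=16,end=20}
Those offsets are only correct for the second string; with the leading space, "move" starts at index 17 and ends at 21.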

When I run the same test while developing my text analytics (purely using UIMA), the whitespace is handled correctly.

Looks like this is a result of the trim here:

String document = tuple.getString(inputDoc).trim();

I believe this should be removed, or be optional via an attribute.
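
To illustrate the offset shift, here is a minimal sketch in plain Java. It does not use the operator or UIMA: `find` is a hypothetical stand-in for the rule match (a simple `indexOf`), and `prepare` with its `trim` flag is a hypothetical helper showing what an optional trim could look like, not the operator's actual parameter.

```java
public class TrimOffsetDemo {

    // Hypothetical stand-in for the Ruta rule: returns {begin, end} of the
    // first occurrence of word in document, or null if it does not occur.
    public static int[] find(String document, String word) {
        int begin = document.indexOf(word);
        return begin < 0 ? null : new int[] { begin, begin + word.length() };
    }

    // Hypothetical helper: trims the input only when the flag is set.
    public static String prepare(String raw, boolean trim) {
        return trim ? raw.trim() : raw;
    }

    public static void main(String[] args) {
        String raw = " I would like to move several of my ";

        // With trimming, the leading space is dropped before matching,
        // so the offsets refer to the trimmed text, not the original input.
        int[] trimmed = find(prepare(raw, true), "move");

        // Without trimming, the offsets refer to the original input.
        int[] untrimmed = find(prepare(raw, false), "move");

        System.out.println("trimmed:   begin=" + trimmed[0] + ", end=" + trimmed[1]);
        System.out.println("untrimmed: begin=" + untrimmed[0] + ", end=" + untrimmed[1]);
    }
}
```

With trimming the sketch reports begin=16, end=20; without it, begin=17, end=21, which matches the original input string.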

To stay compatible with existing behavior, I would like to add a new optional parameter. "trim inputData true" would be the default value.
@Alex-Cook4 Would this be fine for you?

@markheger That would be great for what I need.

Solved in release v1.3.0.