ClearTK/cleartk

flag to exclude original features in FeatureFunctionExtractor

Closed this issue · 3 comments

Original issue 389 created by ClearTK on 2013-10-27T17:07:50.000Z:

I'd like to show how one could collect features corresponding to the covered text of an annotation after it's been lower cased using the LowerCaseFeatureFunction. I would expect that I could use the FeatureFunctionExtractor with something like this:

extractor = new FeatureFunctionExtractor<Token>(extractor, new LowerCaseFeatureFunction());

But this extractor returns two features for each annotation it is given: the original feature and the lower-cased one. Any objections to adding a constructor with a boolean that determines whether to return the original features (in this case, the covered text feature)? It should be backwards compatible.

[Steve's Reply]
I object to boolean parameters, since when reading a method
invocation, you never know what "true" or "false" means. If you want
to make an enum, I might be okay with that.

That said, I wonder if you really want to use an extractor here? Why
not just invoke the feature function(s) directly?
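
For illustration, invoking a feature function directly might look roughly like the sketch below. The `Feature` class, the `LOWER_CASE` function, and the `applyDirectly` helper are stand-ins written for this example, not ClearTK's own classes or signatures; the "LowerCase_" name prefix is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class DirectFeatureFunctionSketch {

  /** Stand-in for a name/value feature; not ClearTK's Feature class. */
  static final class Feature {
    final String name;
    final String value;

    Feature(String name, String value) {
      this.name = name;
      this.value = value;
    }
  }

  /** Hypothetical lower-casing function: one feature in, a list of derived features out. */
  static final Function<Feature, List<Feature>> LOWER_CASE =
      f -> Collections.singletonList(
          new Feature("LowerCase_" + f.name, f.value.toLowerCase(Locale.ROOT)));

  /** Applies the function to each base feature and returns only the derived features. */
  static List<Feature> applyDirectly(List<Feature> base, Function<Feature, List<Feature>> fn) {
    List<Feature> derived = new ArrayList<>();
    for (Feature f : base) {
      derived.addAll(fn.apply(f));
    }
    return derived;
  }

  public static void main(String[] args) {
    List<Feature> base = Collections.singletonList(new Feature("CoveredText", "Hello"));
    for (Feature f : applyDirectly(base, LOWER_CASE)) {
      System.out.println(f.name + "=" + f.value);
    }
  }
}
```

Because the caller decides what to do with the returned list, the base features are only kept if the caller adds them explicitly.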

Comment #1 originally posted by ClearTK on 2013-10-27T17:10:40.000Z:

That's a good point. I think an enum would be fine here.

To me, the scenario above seems very reasonable if you decide that having only the lowercased covered text features is better than having both the covered text features and the lowercased covered text features.

Comment #2 originally posted by ClearTK on 2013-10-27T18:46:51.000Z:

I think I was confused by your "collect features ... after it's been lower cased". I thought you meant you wanted to apply more FeatureFunctions after the lower casing, in which case wrapping it in a FeatureFunctionExtractor seems like the wrong thing to do.

I'm okay with adding an enum to the FeatureFunctionExtractor constructor that allows for ADD (the current behavior) and REPLACE (your desired behavior).
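
A minimal sketch of how an ADD / REPLACE enum could read at the call site, compared with a bare boolean. Everything here is hypothetical: the `Mode` enum, the `extract` method, and the use of plain strings as features are stand-ins for illustration and do not reflect what was actually committed to ClearTK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class OutputModeSketch {

  /** Hypothetical enum mirroring the ADD / REPLACE options discussed in this thread. */
  enum Mode { ADD, REPLACE }

  /** Stand-in lower-casing function over plain string feature values. */
  static final Function<String, List<String>> LOWER_CASE =
      s -> Arrays.asList(s.toLowerCase(Locale.ROOT));

  /**
   * ADD returns the base features followed by the derived ones;
   * REPLACE returns only the derived ones.
   */
  static List<String> extract(List<String> base, Function<String, List<String>> fn, Mode mode) {
    List<String> result = new ArrayList<>();
    if (mode == Mode.ADD) {
      result.addAll(base);
    }
    for (String value : base) {
      result.addAll(fn.apply(value));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> base = Arrays.asList("Hello");
    // The enum names the behavior at the call site, unlike a bare true/false.
    System.out.println(extract(base, LOWER_CASE, Mode.ADD));
    System.out.println(extract(base, LOWER_CASE, Mode.REPLACE));
  }
}
```

Keeping ADD as the behavior of the existing constructor would leave current callers unaffected, which is the backwards-compatibility point raised in the original issue.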

Comment #3 originally posted by ClearTK on 2013-10-31T05:01:00.000Z:

This issue was closed by revision 8ed2eb674e12.