GateNLP/gateplugin-LearningFramework

Allow limiting the maximum sequence length for ngram features

johann-petrak opened this issue · 1 comments

This can be crucial when using the deep learning backend. Ideally it should be possible to limit the sequence length in the feature specification (this can reduce the initial dataset size), and then limit it even further in the pytorch backend through a parameter (for additional experimentation).
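A minimal sketch of what such a limit could look like on the backend side; the function name `truncate_sequences` and the parameter `max_seq_len` are illustrative assumptions, not part of the LearningFramework API:

```python
# Hypothetical helper: cut ngram feature sequences down to a maximum
# length before they are handed to the deep learning backend.
# `max_seq_len` is the assumed name of the limiting parameter.

def truncate_sequences(sequences, max_seq_len):
    """Return the sequences with each one truncated to at most max_seq_len items."""
    return [seq[:max_seq_len] for seq in sequences]

sequences = [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10]]
print(truncate_sequences(sequences, 3))  # -> [[1, 2, 3], [6, 7], [8, 9, 10]]
```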

For single feature datasets, sorting the training set by sequence length would be a good alternative to avoid excessive padding.
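To illustrate the alternative mentioned above, here is a rough sketch of length-sorted batching: instances are sorted by sequence length so each batch contains similarly sized sequences and only needs to be padded to its own longest member. The batching scheme and the padding value `0` are assumptions for this example:

```python
# Illustrative sketch: sort training instances by sequence length, then
# pad each batch only up to the longest sequence *within that batch*,
# which avoids padding everything to the global maximum length.

def length_sorted_batches(sequences, batch_size):
    ordered = sorted(sequences, key=len)
    for i in range(0, len(ordered), batch_size):
        batch = ordered[i:i + batch_size]
        width = len(batch[-1])  # longest sequence in this batch
        yield [seq + [0] * (width - len(seq)) for seq in batch]

seqs = [[1, 2, 3, 4], [5], [6, 7], [8, 9, 10]]
for batch in length_sorted_batches(seqs, 2):
    print(batch)
# -> [[5, 0], [6, 7]]
# -> [[8, 9, 10, 0], [1, 2, 3, 4]]
```

In practice one would shuffle the batches (not the individual instances) between epochs so that training order still varies while padding stays minimal.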

See GateNLP/gate-lf-python-data#23

This has now been implemented so that initial settings can be added in the feature specification file.
However, the LF completely ignores this setting for sparse representations and does not shorten anything itself for dense representations.