Allow limiting the maximum sequence length for ngram features
johann-petrak opened this issue · 1 comments
johann-petrak commented
This can be crucial if we use the deep learning backend. Ideally it should be possible to limit this
in the feature specification (which can reduce the initial dataset size), and then limit it further in the PyTorch backend through a parameter (for further experimentation).
For single feature datasets, sorting the training set by sequence length would be a good alternative to avoid excessive padding.
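The two ideas above can be sketched in plain Python (all names here are hypothetical, not part of the actual codebase): truncate each ngram index sequence to a maximum length, then sort the dataset by sequence length so that batches drawn from it need less padding.

```python
def truncate_and_sort(sequences, max_len):
    """Truncate every sequence to max_len, then sort by length.

    Hypothetical helper: sorting by length means neighbouring
    sequences in a batch have similar lengths, so less padding
    is wasted per batch.
    """
    return sorted((seq[:max_len] for seq in sequences), key=len)


def pad_batch(sequences, pad_value=0):
    """Pad all sequences in a batch to the length of the longest one."""
    width = max(len(s) for s in sequences)
    return [s + [pad_value] * (width - len(s)) for s in sequences]


seqs = [[1, 2, 3, 4, 5, 6], [7, 8], [9, 10, 11]]
batch = pad_batch(truncate_and_sort(seqs, max_len=4))
print(batch)  # [[7, 8, 0, 0], [9, 10, 11, 0], [1, 2, 3, 4]]
```

With `max_len=4` the longest sequence is cut from 6 to 4 items, so the whole batch pads to width 4 instead of 6; sorting matters more when batches are drawn from a larger sorted pool rather than padded all at once.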
johann-petrak commented
See GateNLP/gate-lf-python-data#23
This has now been implemented, so that initial settings can be added in the feature specification file.
However, the LF completely ignores this for sparse representations, and for dense representations it does not shorten anything itself.