stheid/tripper

Better prediction of file names

Opened this issue · 0 comments

Right now many files end up in the "check" folder that should be filtered out altogether.

Maybe the following simple classifier can be used:
Each possible Tatort name is a class, then we extract the following features per class:
NameSimilarity, IsTeamMemberInDescription, IsCityInDescription...

NameSimilarity would probably be passed through a transformation function so that only very high values count. Then I simply compute the convex combination of the features with weights. Maybe the current function can be used to generate a dataset so that we can tune the weights, or I will simply tune them by hand.
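
A rough sketch of how this could look, not tied to the current code. Everything named here is an assumption for illustration: the `Episode` record, the `sharpen` transformation with its 0.8 threshold, the weights, the `min_score` cut-off, and the use of `difflib.SequenceMatcher` as the name similarity. Picking the class with the highest score and rejecting low scores is also just one possible decision rule.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional, Sequence


@dataclass
class Episode:
    # Hypothetical record for one candidate Tatort episode (one class).
    name: str
    team: Sequence[str]
    city: str


def sharpen(similarity: float, threshold: float = 0.8) -> float:
    """Map similarities at or below `threshold` to 0, rescale the rest to (0, 1]."""
    if similarity <= threshold:
        return 0.0
    return (similarity - threshold) / (1.0 - threshold)


def features(file_name: str, description: str, episode: Episode) -> list:
    """Return [NameSimilarity, IsTeamMemberInDescription, IsCityInDescription]."""
    name_sim = SequenceMatcher(None, file_name.lower(), episode.name.lower()).ratio()
    text = description.lower()
    team_hit = any(member.lower() in text for member in episode.team)
    city_hit = episode.city.lower() in text
    return [sharpen(name_sim), float(team_hit), float(city_hit)]


def score(feats: Sequence[float], weights: Sequence[float]) -> float:
    """Convex combination: weights must be non-negative and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * f for w, f in zip(weights, feats))


def classify(
    file_name: str,
    description: str,
    episodes: Sequence[Episode],
    weights: Sequence[float] = (0.5, 0.25, 0.25),  # hand-picked placeholder values
    min_score: float = 0.5,  # assumed cut-off below which the file is rejected
) -> Optional[Episode]:
    """Return the best-scoring episode, or None if nothing scores high enough."""
    best, best_score = None, 0.0
    for episode in episodes:
        s = score(features(file_name, description, episode), weights)
        if s > best_score:
            best, best_score = episode, s
    return best if best_score >= min_score else None


# Made-up example data, only to show the call shape.
EPISODES = [
    Episode("Example Episode One", ["Inspector A"], "Hamburg"),
    Episode("Another Case", ["Inspector B"], "Cologne"),
]
match = classify("Example Episode One", "Inspector A investigates in Hamburg", EPISODES)
no_match = classify("random documentary", "nothing relevant", EPISODES)
```

With a rule like this, a file whose best score stays below `min_score` could be dropped instead of landing in the "check" folder. The weights and the threshold are exactly the values that would need tuning, by hand or on a generated dataset.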