2nd project of Language Understanding Systems, Fall Semester 2017, University of Trento.
The project consists in building a model for concept sequence tagging in the movie domain. The task is similar to the first project, but this time we use Conditional Random Fields instead of Finite State Transducers. All results are discussed in the report file.
The repository is organized in 2 main folders: report and code.
The report folder contains the LaTeX report source files, while the code folder contains the scripts used to train and test the models, plus some utilities used to parse the results and compute statistics.
The code/model folder contains the CRF++ template files.
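For readers unfamiliar with the CRF++ template format, the following Python sketch generates a small template. The function name `build_template` and the chosen offsets are hypothetical and are not taken from this repository; only the `U..:%x[row,col]` and `B` syntax comes from CRF++ itself.

```python
def build_template(offsets, column=0, use_bigram=True):
    """Return CRF++ template text with one unigram feature per row offset.

    `offsets` are row positions relative to the current token and `column`
    is the input column to read. Both are illustrative assumptions.
    """
    lines = ["# Unigram features"]
    for index, row in enumerate(offsets):
        # %x[row,col] expands to the value found at that relative position
        lines.append(f"U{index:02d}:%x[{row},{column}]")
    if use_bigram:
        # A bare "B" adds a feature over the previous and current output labels
        lines.extend(["", "# Bigram feature", "B"])
    return "\n".join(lines) + "\n"


template = build_template([-1, 0, 1])
print(template)
```

The printed text can be saved to a file and passed to CRF++ as a template. The templates in code/model may use different windows and feature combinations.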
The code/scripts/run.sh script can be used to train a given model, for example:
./code/scripts/run.sh 03_advanced/10
The script will automatically create the code/computations folder. Inside this folder you will find one folder for each model you trained.
The code/scripts/genetic.py file implements a genetic algorithm to automatically select the best features for the CRF model. The script is written in Python 3 and requires the DEAP library to work.
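The general idea of genetic feature selection can be sketched as follows. This is not the code in genetic.py: it uses only the Python standard library instead of DEAP, starts from a random population rather than hand-made models, and replaces the real fitness (which would require training and evaluating a CRF) with a toy score. All names here (`CANDIDATES`, `toy_fitness`, `evolve`) are hypothetical.

```python
import random

# Hypothetical candidate features: row offsets usable in a CRF++ template.
CANDIDATES = [-2, -1, 0, 1, 2]


def toy_fitness(genome):
    """Placeholder score; a real run would train a CRF and return e.g. its F1."""
    useful = {-1, 0, 1}  # assumption made only for this toy problem
    score = 0
    for keep, offset in zip(genome, CANDIDATES):
        if keep:
            score += 1 if offset in useful else -1
    return score


def crossover(a, b, rng):
    """One-point crossover between two bit-mask genomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genome, rng, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(generations=30, size=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in CANDIDATES] for _ in range(size)]
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        parents = population[: size // 2]  # truncation selection keeps the best half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    return max(population, key=toy_fitness)


best = evolve()
print([offset for keep, offset in zip(best, CANDIDATES) if keep])
```

Each genome is a bit mask over the candidate features, so a selected individual maps directly to the set of features to include in a template.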
By default, the script requires an initial population inside the code/scripts/initial_population folder. Please create this folder before running the script and populate it with some models written by hand.
Please make sure to invoke the script from the code/scripts folder.
The model source code is licensed under the MIT license. A copy of the license is available in the LICENSE file. The LaTeX sources and the report are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.