Linguistic Data: Quantitative Analysis and Visualisation.

This project analyses the data collected for the 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM), a challenge organised by Duolingo AI in conjunction with the 13th BEA Workshop and the NAACL-HLT 2018 conference. One of the key findings of the challenge was that the choice of learning algorithm for the task appears to matter more than clever feature engineering.

Carried out for the Linguistic Data: Quantitative Analysis and Visualisation course, this project explores whether certain features influence the likelihood of making a mistake during second language acquisition (SLA) and can therefore be used to predict potential errors.

This work follows the license specified for the challenge dataset, namely the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

To reproduce the project in full, first read the instructions in the data folder.