DS2000-Project-1-Bonjour-and-Buenos-Dias

Coursework for "DS2000 Programming with Data" that applies a supervised machine learning method to build a tool that predicts what language a document is written in, given a sample set of known documents.

Primary language: Python

Overview

The goal of this project is to create a tool that predicts what language a document is written in, given a sample set of known documents. The tool can use several techniques to make its guesses, all of which are based on trigram frequencies. A trigram is a three-character subsequence of the document; for example, the word "hola" contains the trigrams "hol" and "ola". The program is limited to a short list of languages, and each language has a small set of training data (documents for which the language is known).
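
To make the trigram idea concrete, here is a minimal sketch of how trigram frequencies could be counted and compared. It is not the project's actual code: the function names (`trigram_counts`, `cosine_similarity`, `guess_language`) are hypothetical, lowercasing the text is an assumption, and cosine similarity is only one possible way to compare frequency profiles.

```python
from collections import Counter
import math


def trigram_counts(text):
    """Count every three-character subsequence of text.

    Assumption: the text is lowercased first so that "Bon" and "bon"
    count as the same trigram.
    """
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two trigram count profiles (0.0 to 1.0)."""
    dot = sum(count * b[tri] for tri, count in a.items() if tri in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def guess_language(document, training):
    """Return the language whose training profile is most similar.

    training maps a language name to a trigram Counter built from
    documents known to be in that language.
    """
    doc_profile = trigram_counts(document)
    return max(training,
               key=lambda lang: cosine_similarity(doc_profile, training[lang]))


# Tiny made-up training set, for illustration only.
training = {
    "French": trigram_counts("bonjour le monde, comment allez-vous aujourd'hui"),
    "Spanish": trigram_counts("buenos dias, como estas hoy amigo"),
}

print(guess_language("bonjour mon ami", training))
```

With real training documents the profiles would be built from much more text per language, which is what makes the frequency comparison meaningful.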