programming-language-classifier

The purpose of this repository is to experiment with machine learning in scikit-learn by building a model that classifies code snippets by programming language.
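The README does not say which features or estimator the model uses. The sketch below shows one common way to set up this kind of text classifier in scikit-learn: TF-IDF features over word and punctuation tokens, fed into a multinomial naive Bayes classifier. Both choices are assumptions and may not match what this repository does. The toy snippets, the token regex, and the helper name `build_classifier` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set of (snippet, language label) pairs.
# A real run would load the benchmark source files instead.
TRAIN_SNIPPETS = [
    ('def main():\n    print("hi")', "Python"),
    ("import sys\nfor line in sys.stdin:\n    print(line)", "Python"),
    ('puts "hi"\n[1, 2].each do |x| puts x end', "Ruby"),
    ('def greet\n  puts "hello"\nend', "Ruby"),
    ("(defn square [x] (* x x))", "Clojure"),
    ("(println (map inc [1 2 3]))", "Clojure"),
]


def build_classifier():
    """TF-IDF features over identifiers and single punctuation characters,
    feeding a multinomial naive Bayes model (an assumed, not confirmed, setup)."""
    # Keep punctuation such as "(", "[", "{" and ";" as tokens, because
    # syntax characters are strong hints about the language.
    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]+|[^\sA-Za-z_0-9]")
    return make_pipeline(vectorizer, MultinomialNB())


snippets, labels = zip(*TRAIN_SNIPPETS)
model = build_classifier()
model.fit(snippets, labels)

print(model.predict(["(defn cube [x] (* x x x))"]))
```

Treating punctuation as tokens matters here because the default scikit-learn token pattern drops single characters like `(` and `{`, which are among the most distinctive signals between, say, Clojure and Python.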

Training data was procured from The Computer Language Benchmarks Game: http://benchmarksgame.alioth.debian.org/

The classifier was trained on the following programming languages (file extensions of the corresponding benchmark sources are shown in parentheses where given):

  • C (.gcc, .c)
  • C#
  • Common Lisp (.sbcl)
  • Clojure
  • Java
  • JavaScript
  • OCaml
  • Perl
  • PHP (.hack, .php)
  • Python
  • Ruby (.jruby, .yarv)
  • Scala
  • Scheme (.racket)
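The parenthetical suffixes suggest how benchmark source files could be mapped to class labels when building the training set. The following is a minimal sketch, assuming labels are derived from file suffixes. The helper name `label_for_file` is hypothetical. Only the extensions listed above are included, so languages listed without an extension are left out rather than guessed.

```python
from pathlib import Path
from typing import Optional

# Suffix-to-label table built only from the extensions listed above.
# Languages listed without an extension (C#, Clojure, Java, ...) are
# omitted because their file suffixes are not given in this README.
EXTENSION_TO_LANGUAGE = {
    ".gcc": "C",
    ".c": "C",
    ".sbcl": "Common Lisp",
    ".hack": "PHP",
    ".php": "PHP",
    ".jruby": "Ruby",
    ".yarv": "Ruby",
    ".racket": "Scheme",
}


def label_for_file(path: str) -> Optional[str]:
    """Return the language label for a source file, or None if its suffix is unknown."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())
```

For example, `label_for_file("nbody.gcc")` returns `"C"` and `label_for_file("fasta.yarv")` returns `"Ruby"`. Files with unmapped suffixes return `None` and can be skipped or labeled by hand.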