
Unsupervised language grammar learning experiment


Learn Grammar

Introduction

Learn Grammar is an experiment in unsupervised learning of language grammar. It builds on previous work within the OpenCog project by Linas Vepstas and GSoC 2015 student Rohit Shinde.

A theoretical introduction to the approach can be found in Deniz Yuret's PhD thesis:

A description of the project steps and the current status of their implementation in OpenCog can be found here:

More practical details on the previous experiment by Rohit Shinde are available in a Q&A thread on the OpenCog mailing list:

Dependencies

Link Grammar Parser

The Link Grammar library must be built and installed.

RelEx Semantic Relation Extractor

Building RelEx is not required. Only a few of its utility scripts are used, to parse Wikipedia articles and split them into sentences.

Run experiment

  1. Configuration

    Edit config.sh: set the path to the RelEx sources, the experiment language, and the maximum number of Wikipedia dump files to download.

  2. Download Wikipedia dumps

    ./fetch-wiki-pages.sh
    
  3. Calculate counts of word pairs

    ./count-all-word-pairs.sh
    ./merge-word-pairs-counts.sh
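This README does not describe how the counting scripts work internally. As a rough illustration of what "counts of word pairs" and merging those counts can mean, the C++17 sketch below counts ordered word pairs that co-occur within a fixed distance inside a sentence, then merges per-file tables into one total. The windowed counting, whitespace tokenization and all names (`count_pairs`, `merge_counts`, `window`, `PairCounts`) are hypothetical and chosen for illustration only. They are not the repository's implementation, which may derive word pairs differently (for example, from parses).

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types: an ordered (left word, right word) pair and its count table.
using WordPair = std::pair<std::string, std::string>;
using PairCounts = std::map<WordPair, long>;

// Split a sentence into whitespace-separated tokens (no punctuation handling).
std::vector<std::string> tokenize(const std::string& sentence) {
    std::istringstream in(sentence);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) words.push_back(w);
    return words;
}

// Assumed counting scheme: every ordered pair (w_i, w_j) with i < j and
// j - i <= window is counted once per occurrence.
void count_pairs(const std::string& sentence, std::size_t window, PairCounts& counts) {
    const std::vector<std::string> words = tokenize(sentence);
    for (std::size_t i = 0; i < words.size(); ++i) {
        for (std::size_t j = i + 1; j < words.size() && j - i <= window; ++j) {
            ++counts[WordPair(words[i], words[j])];
        }
    }
}

// Merge a partial table (e.g. counts from one dump file) into a running total.
void merge_counts(const PairCounts& part, PairCounts& total) {
    for (const auto& [pair, n] : part) {
        total[pair] += n;
    }
}
```

Splitting counting from merging keeps each per-file pass independent, so partial tables can be produced separately and combined afterwards, which matches the two-script split of step 3.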