Word Segmentation

This repository contains code for the experiments in a chapter of my 2013 dissertation, "Modeling Words in the Mind", as well as the papers "Infant Word Segmentation: An Incremental, Integrated Model" and "Modeling Infant Word Segmentation".

Unfortunately, I never had a chance to publish a journal article that includes the thesis chapter experiments, which are significantly more refined than what is presented in the two papers.

If you would like a copy of the thesis chapter, please contact me at lignos at brandeis dot edu.

If you use this code, please cite the following:

@phdthesis{lignos2013modeling,
  title={Modeling words in the mind},
  author={Lignos, Constantine},
  school={University of Pennsylvania},
  year={2013}
}


@inproceedings{lignos2012infant,
  title={Infant word segmentation: An incremental, integrated model},
  author={Lignos, Constantine},
  booktitle={Proceedings of the West Coast Conference on Formal Linguistics},
  volume={30},
  pages={237--247},
  year={2012}
}


@inproceedings{lignos-2011-modeling,
    title = "Modeling Infant Word Segmentation",
    author = "Lignos, Constantine",
    booktitle = "Proceedings of the Fifteenth Conference on Computational Natural Language Learning",
    month = jun,
    year = "2011",
    address = "Portland, Oregon, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W11-0304",
    pages = "29--38",
}

Structure

Repository structure:

/cats   Code for word segmentation simulations
/data   Word segmentation data
/tools  Tools for preparing input and analyzing output

All Python code should be run using Python 2.7. All Java code was written for Java 6 but will run on newer versions.
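
Because the Python code targets Python 2.7, a small guard can catch runs under the wrong interpreter. The following is a minimal sketch and is not part of this repository: the names `REQUIRED_VERSION` and `is_supported` are hypothetical, and the only fact taken from this README is the 2.7 requirement.

```python
import sys

# Version stated in this README; the constant name is a hypothetical choice.
REQUIRED_VERSION = (2, 7)


def is_supported(version_info=None):
    """Return True if the (major, minor) version is Python 2.7.

    Defaults to the running interpreter when no version is given.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == REQUIRED_VERSION
```

A script could call `is_supported()` at startup and exit with a message when it returns False, rather than failing later on Python 3 syntax or library differences.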

Running Segmentation Experiments Using CATS

See the CATS README for instructions.
