DISCOURSE ANALYSIS IN SOCIAL MEDIA #{mainpage} ================================== HOW TO BUILD THIS PROJECT ------------------------- After you have received the sources of this project, please change to the project's root directory (i.e. the directory in which this README file is located) and run the command `source scripts/set_env' or `. scripts/set_env' from your shell. Then, run the command `make' (or `gmake' depending on the system) which will compile all necessary resources required for work. Make sure, that all software listed in PREREQUISITES section is available on your machine prior to building the project. PREREQUISITES ------------- This project uses gawk, python2.7, several python modules, bash shell, and C++ as programming languages for its core utils. A full list of software packages, which should be installed on your machine for a successful build is provided below (older versions of some software may also work, but it is not guaranteed): - wget - gawk >= 4.0.1 - python >= 2.7.3 <3.1 - bash >= 4.2.45 - m4 >= 1.4.16 - GNU make >= 3.82 - doxygen >= 1.8.1 (or set `MAKE_DOC := 0' in Makefile.src) - g++-4.7 (or set appropriate compiler version in Makefile.src) - flex >= 2.5.35 (needed for alchemy-2) - bison >= 2.5 (needed for alchemy-2) - libhunspell >= 1.3.2-4 Python modules: - langid >= 1.1.4dev 3-rd Party Software: - other linguistic modules like `TreeTagger` or `MateParser` which are used in the pipeline will be automatically downloaded from the Web during the building procedure using `wget`. HOW TO USE THIS PROJECT ----------------------- Currently, the two commands that you may be most interested in are `TextTagger' and `TextParser' which are wrappers around Helmut Schmid's TreeTagger and Bernd Bohnet's MateParser, respectively. These commands are in fact shell scripts and you can find them in the subdirectory `scripts' of the current directory. The `TextTagger' command also performs Twitter-aware normalization of input text prior to doing tagging. Further commands are still under development. IF SOMETHING DOES NOT WORK -------------------------- Please note, that this project is currently rather a prototype and will sooner or later be re-written in a common NLP-framework like UIMA or GATE. Nevertheless, if you find that something crashes, does not work or does not work as expected, please report this issue to `uladzimir.sidarenka@uni-potsdam.de', we will be grateful for any found bugs. While sending bugs, please also provide information about 1) your operating system, 2) shell and python version you are using and 3) a succinct description of the issue with 4) a minimal not-working example and 5) maybe logs, if applicable. STRUCTURE --------- In each directory of this project unless these directories were created automatically or are ignored by Git, you should find a similar README file with a short description of the content of that particular directory. The listing below describes files and folders in the current directory: List of Directories: bin/ directory with compiled C++ code doc/ directory for automatically generated documents (ignored by Git) include/ directory with header files lingbin/ directory with compiled linguistic resources lingsrc/ directory with linguistic rules and resources (e.g. corpora, dictionaries, etc.) scripts/ script programs used to process data src/ source C++ code submodules/ directory for 3-rd party software tests/ collection of test cases and utilities used for testing tmp/ directory for storing temporary data (ignored by Git) List of Files: Doxyfile file with rules for building documentation LICENCES various licences of this and third party software Makefile rules for building whole project Makefile.lingsrc rules for building linguistic components of this project Makefile.src rules for building C++ sources Makefile.src rules for running tests README this file
WladimirSidorenko/TextNormalization
Code for the Text Normalization Pipeline Presented at GSCL 2013
PythonNOASSERTION