As I no longer have time to maintain this project, I am looking for collaborators to help maintain it. You can sign up by sending a pull request that fixes a bug or adds a feature.
A Python 3 wrapper tool for the ITU Turkish NLP Pipeline API.
For details of the pipeline, please check the pipeline page and the sources below.
Eryigit, Gülsen. "ITU Turkish NLP Web Service." EACL. 2014.
To use the pipeline, you need an authentication token (see the API web page for details).
If you experience any problems, please contact me via the gitter chat room.
This repository is tested with Python 3.4, 3.5 and 3.6, but using the most up-to-date version is always better.
To install from PyPI, just run pip3 install ITU-Turkish-NLP-Pipeline-Caller
Alternatively, download the latest release, extract the archive, and run python3 ./setup.py install inside the extracted directory.
By default, the tool reads the token from the pipeline.token file, located in the same directory as the tool.
Simply running
pipeline_caller <filename>
reads the input file and writes the output under ./output/output<system_time>
You can select the pipeline tool with the -t option:
pipeline_caller <filename> --tool <tool_name>
The default is "pipelineNoisy".
You can force the encoding for I/O with the -e option:
pipeline_caller <filename> -e <encoding>
The default is your system locale.
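The fallback to the system locale can be pictured with the short sketch below. It is only an illustration, not the tool's own code: the helper names resolve_encoding and read_input are hypothetical, and the only call taken from this README is locale.getpreferredencoding(False).

```python
import locale


def resolve_encoding(forced=None):
    """Return the forced encoding if one was given, else the system locale's.

    Hypothetical helper: mirrors the documented behaviour of -e, not the
    tool's actual implementation.
    """
    return forced or locale.getpreferredencoding(False)


def read_input(path, forced=None):
    """Read a text file using the forced encoding or the locale default."""
    with open(path, encoding=resolve_encoding(forced)) as handle:
        return handle.read()
```

For example, read_input("input.txt", "iso-8859-9") would decode a Latin-5 encoded Turkish file regardless of the system locale.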
You can switch the processing type with the -p option. Input text can be processed whole at once, sentence by sentence, or word by word. For some tools in the pipeline (isturkish, for example), word-by-word processing is currently necessary. The default is whole at once.
Example: pipeline_caller <filename> --tool isturkish -p word
sends the input text to the isturkish tool, word by word.
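The three processing types can be sketched roughly as follows. This is an assumption about how the input might be chunked, not the tool's actual splitting code: the function name split_input and the mode names "whole" and "sentence" are hypothetical (only "word" appears in the example above), while the delimiter class is the default_sentence_split_delimiter_class value listed among the defaults in this README.

```python
import re

# Delimiter class copied from the defaults listed in this README.
SENTENCE_DELIMITERS = r"[\.\?:;!]"


def split_input(text, mode="whole"):
    """Return the chunks that would each be sent as a separate request.

    Hypothetical helper; the mode names are assumptions for illustration.
    """
    if mode == "whole":
        return [text]
    if mode == "word":
        # Whitespace tokenisation; punctuation stays attached to words.
        return text.split()
    if mode == "sentence":
        # The capture group makes re.split keep each delimiter, so the
        # result alternates: body, delimiter, body, delimiter, ...
        parts = re.split("(" + SENTENCE_DELIMITERS + ")", text)
        sentences = []
        for i in range(0, len(parts), 2):
            delimiter = parts[i + 1] if i + 1 < len(parts) else ""
            sentence = (parts[i] + delimiter).strip()
            if sentence:
                sentences.append(sentence)
        return sentences
    raise ValueError("unknown mode: %r" % (mode,))
```

Under this sketch, sentence mode keeps each delimiter attached to the sentence it ends, and word mode makes one request per whitespace-separated token, which is why it suits single-word tools such as isturkish.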
You can also change the output directory with the -o option:
pipeline_caller <filename> -o <another_directory>
The default is "output".
Also, pipeline_caller --help shows the help menu.
import pipeline_caller
caller = pipeline_caller.PipelineCaller()
result = caller.call(<tool_name>, <text>, <api_token>)
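As a slightly fuller sketch of the snippet above, the helpers below read the token from a file and call a tool once per word. Only caller.call(tool_name, text, api_token) comes from this README; the helper names read_token and call_word_by_word are hypothetical, and the sketch assumes the token file contains nothing but the token itself.

```python
def read_token(path="pipeline.token"):
    """Read the API token from a file.

    Assumption: the file holds only the token, possibly followed by a
    newline, so surrounding whitespace is stripped.
    """
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()


def call_word_by_word(caller, tool_name, text, api_token):
    """Call the given tool once per whitespace-separated word.

    `caller` is any object exposing call(tool_name, text, api_token),
    such as pipeline_caller.PipelineCaller().
    """
    return [caller.call(tool_name, word, api_token) for word in text.split()]


# Usage (needs the installed package, a valid token and network access):
#
#   import pipeline_caller
#   caller = pipeline_caller.PipelineCaller()
#   results = call_word_by_word(caller, "isturkish", "merhaba dünya", read_token())
```

Passing the caller in as an argument keeps the helper easy to try out with a stand-in object before spending real API requests.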
Check the DEFAULTS block in the source code if you need to change one of these (generally, you don't):
api_url = "http://tools.nlp.itu.edu.tr/SimpleApi"
pipeline_encoding = 'UTF-8'
token_path = "pipeline.token" (for the command line tool)
default_output_dir = "output"
default_enconding = locale.getpreferredencoding(False) (the default encoding of your OS, used for I/O operations in the command line tool)
default_sentence_split_delimiter_class = "[\.\?:;!]" (for the command line tool, used to separate sentences when processing sentence by sentence)
Special thanks to Asst. Prof. Dr. Peter Schüller for his great suggestions!
This work was a part of a KnowLP research project.
Copyright 2015-2018 Maintainers:
- Ferit Tunçer, ferit@cryptolab.net
- Ülgen Sarıkavak, work@ulgens.me
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.