
Implements Java-based text extractors as web APIs (currently only Boilerpipe and Goose).


Java Text Extractor API

Web API for Java-based text extractors, implemented using the Play framework.

Author

Tomaž Kovačič <tomaz.kovacic@gmail.com>

Extractors supported

  • Boilerpipe
  • Goose

API Documentation

Note: all parameters should be encoded as application/x-www-form-urlencoded.
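As a rough illustration of that encoding, the following Java sketch form-encodes a parameter map. It assumes Java 10 or later (for `URLEncoder.encode(String, Charset)`) and UTF-8 as the charset; the class name `FormEncoder` is hypothetical and not part of this project.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of this project.
class FormEncoder {
    // Percent-encodes every key and value as UTF-8 and joins the pairs
    // as key=value&key=value, i.e. application/x-www-form-urlencoded.
    static String encode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("extractorType", "article");
        params.put("rawHtml", "<p>Hello & welcome</p>");
        System.out.println(encode(params));
    }
}
```

For these parameters `main` prints `extractorType=article&rawHtml=%3Cp%3EHello+%26+welcome%3C%2Fp%3E`. `URLEncoder` turns spaces into `+` and escapes `&`, `<`, `>` and `/`, so raw HTML can travel safely inside a form body.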

Boilerpipe API

method: POST

endpoint: http://yourdomain/boilerpipe/extract/

params:

  • extractorType : (article|default|sentence)
  • rawHtml : the HTML content to extract text from

JSON response format:

{
        "result": RESULT_TEXT,
        "status": (OK|ERROR),
        "errorMsg": ERROR_MESSAGE (optional)
}
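A minimal client-side sketch for this endpoint might build the request as follows. It assumes Java 11 or later (`java.net.http`) and UTF-8 request bodies; `BoilerpipeRequests` and its `baseUrl` argument are hypothetical, and `http://yourdomain` stands in for wherever the Play application is deployed.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Set;

// Hypothetical client helper, not part of this project.
class BoilerpipeRequests {
    // The three extractorType values documented for /boilerpipe/extract/.
    static final Set<String> EXTRACTOR_TYPES = Set.of("article", "default", "sentence");

    // Builds a form-encoded POST request; baseUrl is e.g. "http://yourdomain".
    static HttpRequest build(String baseUrl, String extractorType, String rawHtml) {
        // Client-side check so typos fail before any network round trip.
        if (!EXTRACTOR_TYPES.contains(extractorType)) {
            throw new IllegalArgumentException("Unsupported extractorType: " + extractorType);
        }
        String body = "extractorType=" + URLEncoder.encode(extractorType, StandardCharsets.UTF_8)
                + "&rawHtml=" + URLEncoder.encode(rawHtml, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/boilerpipe/extract/"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }
}
```

Sending the request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` should return the JSON document described above. Rejecting unknown `extractorType` values on the client is a choice made by this sketch, not something the API is documented to require.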

Goose API

method: POST

endpoint: http://yourdomain/goose/extract/

params:

  • rawHtml : the HTML content to extract text from

JSON response format:

{
        "result": RESULT_TEXT,
        "status": (OK|ERROR),
        "errorMsg": ERROR_MESSAGE (optional)
}
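The Goose endpoint takes only `rawHtml`. The Java 11+ sketch below posts a document and then reads the `status` field from the reply. It assumes `status` is serialized as a JSON string (`"status": "OK"`), which the format above does not spell out, and it uses a regular expression only to stay dependency-free; `GooseClient` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client helper, not part of this project.
class GooseClient {
    // Assumption: status is a JSON string, e.g. "status": "OK".
    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*\"(OK|ERROR)\"");

    // Posts rawHtml to /goose/extract/ and returns the raw JSON response body.
    static String extract(HttpClient client, String baseUrl, String rawHtml)
            throws IOException, InterruptedException {
        String body = "rawHtml=" + URLEncoder.encode(rawHtml, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/goose/extract/"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Crude lookup of "OK" or "ERROR"; empty if no status field is found.
    static Optional<String> status(String json) {
        Matcher m = STATUS.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

A typical call is `GooseClient.status(GooseClient.extract(HttpClient.newHttpClient(), "http://yourdomain", html))`. In a real client, a JSON library would be a better fit, since it can also read `result` and the optional `errorMsg` reliably.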

Dependencies

  • Play framework v1.1.1.

Licence

  • Everything that's not in the /lib/ directory is licensed under GPLv3

  • JAR packages in the /lib/ directory are licensed under their respective licences.

Copyright (C) Tomaž Kovačič

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.