/Html-Tags-Remover

HTML Cleaner

A Node.js service that removes a custom array of tags from a text, usually an HTML page.
It can also remove tags while keeping the text inside them, which is useful when you need to grab the title, images and description from a webpage.

The service can run in cluster mode, which uses more than one processor and is enabled in the config file, or in single mode, where only one instance runs on one processor.
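
The two working modes can be sketched as follows. This is a minimal illustration that uses Node's built-in `cluster` module rather than cluster2; the config keys `mode` and `workers` and the function names `resolveWorkerCount` and `startService` are assumptions made for the example, not the service's actual configuration.

```javascript
// Sketch only: uses Node's built-in `cluster` module, not cluster2.
// The config keys `mode` and `workers` are hypothetical names.
const cluster = require('cluster');
const os = require('os');

// Decide how many processes to run for a given config.
function resolveWorkerCount(config, cpuCount) {
  if (config.mode !== 'cluster') return 1;            // single mode: one instance
  const requested = Number(config.workers) || cpuCount;
  return Math.max(1, Math.min(requested, cpuCount));  // never exceed the available processors
}

// Fork workers in cluster mode, or run the server in-process in single mode.
function startService(config, runServer) {
  // `isPrimary` replaced `isMaster` in newer Node versions; support both.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (config.mode === 'cluster' && isPrimary) {
    const count = resolveWorkerCount(config, os.cpus().length);
    for (let i = 0; i < count; i++) cluster.fork();
    return;
  }
  runServer(); // a forked worker, or the only process in single mode
}
```

Calling `startService({ mode: 'cluster' }, runServer)` would fork one worker per processor, each of which then calls `runServer`; with `{ mode: 'single' }` the server runs directly in the current process.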

All dependencies are resolved through package.json.

A new configuration option lets you customize the working mode of the service.

HTTP Methods

  • POST - the information can be provided as form-data or JSON
  • GET - the information should be provided as simple arguments
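
As a rough illustration of the two methods, the sketch below builds a GET URL and a JSON POST request from the same set of parameters (the `url` and `format` parameters are described in the Input section). The base URL `http://localhost:3000/` and the helper names are assumptions; the real host, port and path depend on how the service is deployed.

```javascript
// Hypothetical base URL; adjust to wherever the service is actually running.
const SERVICE_URL = 'http://localhost:3000/';

// GET: every parameter becomes a simple query-string argument.
function buildGetUrl(params) {
  const url = new URL(SERVICE_URL);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// POST: the same parameters are sent as a JSON body.
function buildJsonPost(params) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(params),
  };
}
```

For example, `buildGetUrl({ url: 'http://example.com', format: 'text' })` produces a URL whose query string carries both arguments, while `buildJsonPost` returns an options object that can be passed to an HTTP client such as `fetch(SERVICE_URL, options)`.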

Input

Parameters accepted by the service
  • tags - A dictionary of tags to apply to the provided HTML page.
    Possible uses of the tags parameter:
    • no tags specified - all tags are removed from the HTML; only the text remains.
    • dictionary - each key matches an HTML tag, and its value is a dictionary of specific keys that act as rules applied to the matched tag. (some examples will be provided later)
  • src - The HTML page (raw data, not a URL) to be cleaned.
    • single - a single HTML page
    • array - multiple HTML pages
  • url - the URL used to retrieve the HTML
    • single - a single URL
  • format - the format used for the output
    • json - the result is output as JSON
    • text - the result is output as plain text
    • html - the result is output as HTML, which is useful for display in a browser
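
The behaviour of the tags parameter described above (no tags means strip everything; a dictionary means per-tag rules) can be sketched as follows. This is a simplified, dependency-free illustration: a regular expression stands in for htmlparser2, and the rule key `keepText` and the function name `cleanHtml` are hypothetical, since the actual rule keys of the service are not documented here.

```javascript
// Simplified sketch: a regular expression stands in for htmlparser2, so it is
// not robust against malformed HTML, comments, or "<" inside scripts.
// The rule key `keepText` is hypothetical, not the service's documented schema.
const TAG = /<\/?([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g;

// Elements that never have a closing tag, so there is no content to skip.
const VOID_TAGS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr']);

function cleanHtml(src, tags) {
  const stripAll = !tags || Object.keys(tags).length === 0;
  let out = '';
  let last = 0;        // index just past the previous tag
  let skipTag = null;  // tag whose whole content is currently being dropped
  let depth = 0;       // nesting depth of skipTag

  for (const match of src.matchAll(TAG)) {
    const raw = match[0];
    const name = match[1].toLowerCase();
    const isClosing = raw.startsWith('</');
    const isSelfClosing = raw.endsWith('/>') || VOID_TAGS.has(name);

    // Text before this tag is kept unless we are inside a dropped element.
    if (skipTag === null) out += src.slice(last, match.index);
    last = match.index + raw.length;

    if (skipTag !== null) {
      // Track nesting so the matching closing tag ends the skipped region.
      if (name === skipTag && !isSelfClosing) {
        depth += isClosing ? -1 : 1;
        if (depth === 0) skipTag = null;
      }
      continue;
    }

    const listed = !stripAll && Object.prototype.hasOwnProperty.call(tags, name);
    const rule = stripAll ? { keepText: true } : (listed ? tags[name] : undefined);
    if (!rule) { out += raw; continue; }  // tag not listed: keep it untouched
    if (rule.keepText) continue;          // drop the tag, keep the text inside
    if (!isClosing && !isSelfClosing) {   // drop the tag together with its content
      skipTag = name;
      depth = 1;
    }
  }
  if (skipTag === null) out += src.slice(last);
  return out;
}
```

With no tags, `cleanHtml('<p>Hello <b>world</b></p>')` returns `'Hello world'`. With `{ script: { keepText: false }, b: { keepText: true } }`, script elements are removed together with their content, `<b>` tags are removed but their text is kept, and all other tags are left in place.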

Libraries

  • cluster2 - Used to run the service on all the processors available on the machine
  • express - Used to handle the HTTP server and all the requests
  • htmlparser2 - Used to parse the DOM and handle the HTML tags and attributes

TODO

  • Improve the configs to allow better customization of the service
  • Finish the documentation (the README)
  • Improve performance