html2text
There are 33 repositories under the html2text topic.
adbar/trafilatura
Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
jaytaylor/html2text
Golang HTML to plaintext conversion library
weblyzard/inscriptis
A Python-based HTML-to-text conversion library, command-line client, and web service.
inaridiy/webforai
The best HTML-to-Markdown library: ESM-native, with useful utilities, simple, lightweight, and of epic quality.
voku/html2text
📝 Html2Text: convert HTML to formatted plain text, e.g. for plain-text emails.
ThatXliner/unmarkd
An extremely configurable Markdown reverser for Python 3.
RxNLP/nlp-cloud-apis
RxNLP APIs for clustering sentences, extracting topics, counting words & n-grams, extracting text from HTML or a URL, computing similarity between texts, and more.
pH-7/Html2Text
A very simple (but efficient) "HTML to plain text" converter ✍️
deedy5/html2text_rs
Python library for converting HTML to markup or plain text
zacanger/html2txt
html2text, but in Node.js
x28/inscriptis-java
inscriptis - HTML to text conversion library for Java
AndyTheFactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
gereoffy/deepspam2
DeepSpam milter v2
susilthapa/knowledge-retrieval-with-imgs
AI chat app that responds with data in Markdown format, including text and images. Tutorial: https://youtu.be/qKtM2AlDTs8
importcjj/go-readability
Go package that cleans an HTML page for better readability.
kr1shnasomani/WebScrub
Python code that extracts HTML content, converts it to clean text, and preprocesses the text
BrenoFariasdaSilva/Python
My Python Codes.
erayon/PubMed
This project builds a robust classifier that determines, from a document's abstract, whether the document belongs to the cancer class.
gsdefender/packtpub_telegram_bot
Receive Packt Publishing Ltd. Free Learning updates in Telegram every day
hcq0618/html-files-to-markdown-files
Batch-convert HTML files to Markdown files
LukaszNiewinski/Microservice-for-retrieving-img-and-text
Microservice for text and images collection for data science purposes.
masroore/php-html2text
A PHP package to convert HTML into a plain text format
MattJeanLouis/scrap_web
A web scraping project that uses Streamlit, BeautifulSoup, and html2text to extract, convert to Markdown, and display the content of all pages linked from a given URL. It provides an interactive table of contents of the visited URLs and lets you view the extracted content in an easy-to-read format.
puhoy/readability_cli
A CLI tool to fetch a web page's main content and print it as Markdown
rubix1138/html2text
html2text Search Command for Splunk
AbdellatifCHE/Collect_Store_Search
The goal is to create a solution that crawls a news website (The Guardian) for articles, cleans the responses, stores them in a hosted MongoDB database (MongoDB Atlas), and then makes them searchable via an API.
breadrock1/news-rss
A simple Rust-based project to scrape and collect news using RSS and an LLM API.
gemichelst/notesConverter
Converts every .html file in a specified folder into a .txt file and combines the individual .txt files into one large text file
afeiship/next-html2text
Strip HTML to text for next.
cycloidio/docker-image-html2text
Dockerized html2text command-line tool
cycloidio/docker-image-python-html2text
Dockerized Python html2text command-line tool
luminati-io/rag-chatbot
A Python-based RAG chatbot leveraging GPT-4o and Bright Data's SERP API to deliver contextually rich and up-to-date AI responses using real-time search engine data.
sophiaken/Web-Scraping-Project-Python
Scraped the web with an automated Python script that extracts content from Wikipedia pages, and built a clean dataset from the results.