html-to-markdown

There are 124 repositories under the html-to-markdown topic.

  • firecrawl/firecrawl

    The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data 🔥

    Language:TypeScript57.6k4.8k
  • ScrapeGraphAI/Scrapegraph-ai

    Python scraper based on AI

    Language:Python21.3k1344061.8k
  • mixmark-io/turndown

    🛏 An HTML to Markdown converter written in JavaScript

    Language:HTML10.3k124321937
  • adbar/trafilatura

    Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

    Language:Python4.7k31404288
  • JohannesKaufmann/html-to-markdown

    ⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.

    Language:Go3.1k1579158
  • vsch/flexmark-java

    CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.

    Language:Java2.5k58536292
  • any4ai/AnyCrawl

    AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

    Language:TypeScript2.1k196
  • helloworld-Co/html2md

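    A lightweight yet powerful one-click HTML-to-Markdown tool, open-sourced by the helloworld developer community. It converts articles from multiple platforms in one click and saves the downloads locally.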

    Language:JavaScript7681037191
  • philschmid/clipper.js

    HTML to Markdown converter and crawler.

    Language:TypeScript5914738
  • firecrawl/firecrawl-app-examples

    🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

    Language:Jupyter Notebook544169
  • breakdance/breakdance

    It's time for your markup to get down! HTML to markdown converter. Breakdance is highly pluggable, flexible and easy to use.

    Language:JavaScript533231831
  • devflowinc/firecrawl-simple

    ➖ Stripped down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are completely removed. Crawl and convert any website into LLM-ready markdown.

    Language:TypeScript51902145
  • paulpierre/markdown-crawler

    A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG

    Language:Python40251143
  • mrusme/reader

    reader is for your command line what the “readability” view is for modern browsers: A lightweight tool offering better readability of web pages (and EML files!) on the CLI.

    Language:Go37452012
  • notlmn/copy-as-markdown

    📋 Browser extension to copy text as Markdown (with GFM and MathML support)

    Language:JavaScript36674453
  • inhumantsar/slurp

    Slurps webpages and saves them as clean, uncluttered Markdown. Think Pocket, but better.

    Language:TypeScript2434647
  • 0x6b/copy-selection-as-markdown

    Firefox add-on to copy selection as Markdown

    Language:JavaScript20836415
  • gtmdh/medium-2-md

    A CLI tool that converts exported Medium posts (HTML) to Jekyll/Hugo-compatible Markdown with front matter.

    Language:JavaScript148
  • Spenhouet/confluence-markdown-exporter

    Export Atlassian Confluence pages as markdown files.

    Language:Python126100
  • bevacqua/domador

    😼 Dependency-free and lean DOM parser that outputs Markdown

    Language:JavaScript86676
  • oidlabs-com/Lexoid

    Multimodal document parser for high quality data understanding and extraction

    Language:Python795678
  • inaridiy/webforai

    The best HTML to Markdown library: ESM-native, with useful utilities that are simple, lightweight and of epic quality.

    Language:TypeScript71105
  • EvitanRelta/htmlarkdown

    HTML-to-Markdown converter that adaptively preserves HTML when needed (e.g. when center-aligning or resizing images)

    Language:TypeScript682443
  • agarwalvishal/claude-chat-exporter

    Claude Chat Exporter is a JavaScript tool that allows you to export your conversations with Claude AI into a well-formatted Markdown file.

    Language:JavaScript66200
  • tim-gromeyer/html2md

    Transform your HTML into clean, easy-to-read markdown with html2md.

    Language:C++6621110
  • syfxlin/xkeditor

    📝 XK-Editor | An editor that supports rich text and Markdown

    Language:CSS581115
  • ActuallyTaylor/SwiftHTMLToMarkdown

    A simple Swift package that converts HTML into Markdown

    Language:Swift472220
  • lightfeed/extractor

    Using LLMs and AI Browser Automation to Robustly Extract Web Data

    Language:TypeScript46
  • Stardown-app/Stardown

    Copy the web as markdown

    Language:JavaScript403781
  • iw4p/url-to-markdown

    URL to Markdown API is a service that converts web content into clean, structured Markdown through a simple HTTP GET request. It's built using FastAPI and the MarkItDown library, offering a straightforward way to convert various content types (web pages, YouTube videos, PDFs, documents) into Markdown that's optimized for Large Language Mod

    Language:Python322
  • kasvith/htmd

    A fast HTML to Markdown converter for Elixir, powered by Rust

    Language:Elixir322
  • dedalozzo/converter

    A set of classes to translate text from HTML to BBCode and from BBCode to Markdown.

    Language:PHP27469
  • ParryQiu/Generate-Cnblogs-Articles-To-Markdown

    Exports Cnblogs (博客园) articles to Markdown files for storage

    Language:C#250619
  • izyuumi/html2md-rs

    HTML to Markdown converter written in Rust

    Language:Rust231112
  • opendocs-md/do-tutorials

    DigitalOcean tutorials in Markdown format

    Language:Ruby233012
  • spider-rs/web-crawling-guides

    How-to guides on web crawling or scraping