html-to-markdown

There are 133 repositories under the html-to-markdown topic.

  • firecrawl/firecrawl

    🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data

    Language:TypeScript67.4k2567565.2k
  • mixmark-io/turndown

    🛏 An HTML to Markdown converter written in JavaScript

    Language:HTML10.4k122323946
  • adbar/trafilatura

    Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

    Language:Python4.9k32411328
  • JohannesKaufmann/html-to-markdown

    ⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.

    Language:Go3.2k1484167
  • vsch/flexmark-java

    CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.

    Language:Java2.5k58542296
  • any4ai/AnyCrawl

    AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

    Language:TypeScript2.4k1525237
  • helloworld-Co/html2md

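    A lightweight yet powerful one-click HTML-to-Markdown tool, open-sourced by the helloworld developer community. It supports one-click conversion of articles from multiple platforms and saves the downloads locally.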

    Language:JavaScript7761038193
  • firecrawl/firecrawl-app-examples

    🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

    Language:Jupyter Notebook59640182
  • philschmid/clipper.js

    HTML to Markdown converter and crawler.

    Language:TypeScript5953738
  • devflowinc/firecrawl-simple

    ➖ Stripped down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are completely removed. Crawl and convert any website into LLM-ready markdown.

    Language:TypeScript53712247
  • breakdance/breakdance

    It's time for your markup to get down! HTML to markdown converter. Breakdance is highly pluggable, flexible, and easy to use.

    Language:JavaScript533231832
  • paulpierre/markdown-crawler

    A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG

    Language:Python41561150
  • mrusme/reader

    reader is for your command line what the “readability” view is for modern browsers: A lightweight tool offering better readability of web pages (and EML files!) on the CLI.

    Language:Go38242115
  • notlmn/copy-as-markdown

    📋 Browser extension to copy text as Markdown (with GFM and MathML support)

    Language:JavaScript37274553
  • inhumantsar/slurp

    Slurps webpages and saves them as clean, uncluttered Markdown. Think Pocket, but better.

    Language:TypeScript25346411
  • 0x6b/copy-selection-as-markdown

    Firefox add-on to copy selection as Markdown

    Language:JavaScript21326415
  • Spenhouet/confluence-markdown-exporter

    Export Atlassian Confluence pages as markdown files.

    Language:Python1751043
  • web3gautam/medium-2-md

    A CLI tool that converts exported Medium posts (HTML) to Jekyll/Hugo-compatible Markdown with front matter.

    Language:JavaScript14821620
  • agarwalvishal/claude-chat-exporter

    Claude Chat Exporter is a JavaScript tool that allows you to export your conversations with Claude AI into a well-formatted Markdown file.

    Language:JavaScript1092017
  • bevacqua/domador

    :smirk_cat: Dependency-free and lean DOM parser that outputs Markdown

    Language:JavaScript86676
  • oidlabs-com/Lexoid

    Multimodal document parser for high quality data understanding and extraction

    Language:Python8557310
  • inaridiy/webforai

    The best HTML to Markdown library: ESM-native, with useful utilities, and simple, lightweight, epic quality.

    Language:TypeScript74105
  • tim-gromeyer/html2md

    Transform your HTML into clean, easy-to-read markdown with html2md.

    Language:C++7221611
  • EvitanRelta/htmlarkdown

    HTML-to-Markdown converter that adaptively preserves HTML when needed (e.g. when center-aligning or resizing images)

    Language:TypeScript682443
  • syfxlin/xkeditor

    :pencil: XK-Editor | An editor that supports rich text and Markdown

    Language:CSS581115
  • lightfeed/extractor

    Using LLMs and AI browser automation to robustly extract web data

    Language:TypeScript52005
  • ActuallyTaylor/SwiftHTMLToMarkdown

    A simple Swift package that converts HTML into Markdown

    Language:Swift492222
  • Stardown-app/Stardown

    Copy the web as markdown

    Language:JavaScript403921
  • iw4p/url-to-markdown

    URL to Markdown API is a service that converts web content into clean, structured Markdown through a simple HTTP GET request. It's built using FastAPI and the MarkItDown library, offering a straightforward way to convert various content types (web pages, YouTube videos, PDFs, documents) into Markdown that's optimized for Large Language Mod

    Language:Python354
  • kasvith/htmd

    A fast HTML to Markdown converter for Elixir, powered by Rust

    Language:Elixir35002
  • dedalozzo/converter

    A set of classes to translate text from HTML to BBCode and from BBCode to Markdown.

    Language:PHP27469
  • izyuumi/html2md-rs

    HTML to Markdown converter written in Rust

    Language:Rust251111
  • ParryQiu/Generate-Cnblogs-Articles-To-Markdown

    Export Cnblogs articles and store them as Markdown files

    Language:C#250619
  • spider-rs/web-crawling-guides

    How-to guides on web crawling or scraping

  • opendocs-md/do-tutorials

    DigitalOcean tutorials in Markdown format

    Language:Ruby233013
  • lumpinif/deepcrawl

    100% free and fully open-source edge Firecrawl alternative with better link extraction for agents, which you can deploy yourself.

    Language:TypeScript221