extract-data
There are 275 repositories under the extract-data topic.
opendatalab/MinerU
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
bda-research/node-crawler
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
DocumindHQ/documind
Open-source platform for extracting structured data from documents using AI.
elixir-crawly/crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
slotix/dataflowkit
Extract structured data from websites. Website scraping.
OmkarPathak/ResumeParser
A simple resume parser used for extracting information from resumes
danschultzer/receipt-scanner
Receipt scanner extracts information from your PDF or image receipts. Built in NodeJS.
m92vyas/llm-reader
Turn webpages into LLM-friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, and image and webpage link extraction easy.
Qusic/TraceUtility
Extract data from .trace documents generated by Instruments
itehax/rust-scraping
Web scraping using Rust!
ropensci/smapr
An R package for acquisition and processing of NASA SMAP data
yuanxu-li/html-table-extractor
Extract data from HTML tables
msoap/html2data
Library and cli for extracting data from HTML via CSS selectors
CairX/extract-colors-py
Extract colors from an image. Colors are grouped based on visual similarities using the CIE76 formula.
isaacmg/fb_scraper
FBLYZE is a Facebook scraping and analysis system.
Techcatchers/PyLyrics-Extractor
Get lyrics for any song in under 2 seconds by passing in its name (spelled correctly or misspelled) to this Python library.
asad70/Insider-Trading
This program extracts insider trading data from the SEC website and stores it in an Excel file for the specified time frame.
fivesmallq/web-data-extractor
Extracts and parses structured data from common web formats such as HTML, XML, and JSON using jQuery selectors, XPath, or JsonPath.
Skyluker4/UnityAssetReplacer
A tool to replace data in a Unity Asset Bundle from modified files.
giveabit/Trio-Plus-Data
Extract audio and other data from the Digitech Trio Plus guitar pedal's SD card
labteral/bluebird
Unofficial Python client for Twitter
osh/gr-eventstream
gr-eventstream is a set of GNU Radio blocks for creating precisely timed events and either inserting them into, or extracting them from, normal data streams. It allows the definition of high-speed, time-synchronous C++ burst event handlers, and makes it easy to bridge to standard GNU Radio Async PDU messages with precise timing.
Mamdouh66/Extracty
Extract structured data from any unstructured web page
hseera/python-utilities
Various Python utility scripts to help automate mundane, repetitive tasks. Useful for performance testers, data scientists, or anyone who wants to automate mundane tasks in Python.
peterbencze/serritor
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.
serhaturtis/TOOL-FastBatchImageCrop
A simple UI tool to batch crop images to prepare datasets from images and videos.
AdemBoukhris457/Doctra
📄🔍 Parse, extract, and analyze documents with ease 📄🔍
ionictemplate-app/Social-Network-Data-Scraper-Pro
Easily scrape 10,000+ emails in one hour, helping you quickly grow your customer base. Extracts data from LinkedIn, Facebook, Instagram, YouTube, Pinterest, and Twitter, with precise search by specific keywords. Ready-to-use social network data scraper software to get started instantly; includes full source code and an install file.
peterstangl/svg2data
A Python module for reading data from a plot provided as an SVG file.
righthandabacus/mdict_reader
Extract data from Octopus mdict (*.mdd, *.mdx) files
alienzhou/giframe
Extract the first frame of a GIF without reading the whole file; supports both the browser and Node.js 📸
Agenty/scrapingai
Build AI-powered web scraping agents with Agenty to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web.
ark-mod/ArkSavegameToolkitNet
Library for reading ARK: Survival Evolved savegame files using C#.
mhismail/PinPoint-Digitizer
Open source digitizer application to extract data from plots