yoursock's Stars
mixmark-io/turndown
🛏 An HTML to Markdown converter written in JavaScript
jina-ai/reader
Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
berstend/puppeteer-extra
💯 Teach puppeteer new tricks through plugins.
mixmark-io/turndown-plugin-gfm
:octocat: Turndown plugin to add GitHub Flavored Markdown extensions
UpstageAI/dataverse
The Universe of Data. All about data, data science, and data engineering
ahmad-PH/auto_lcc
Automatic Library of Congress classification, using word embeddings from book titles and synopses.
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
xai-org/grok-1
Grok open release
bhaskatripathi/pdfGPT
PDF GPT allows you to chat with the contents of your PDF file by using GPT capabilities. The most effective open source solution to turn your PDF files into a chatbot!
netease-youdao/QAnything
Question and Answer based on Anything.
jscck/crack.js
Tool for cracking JavaScript encryption and obfuscation
PaddlePaddle/PaddleNLP
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
dongrixinyu/JioNLP
A Chinese NLP Preprocessing & Parsing Package: accurate, efficient, and easy to use. www.jionlp.com
great1001/MyHeyGen
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
jayhenry/pdf2txt_mnbvc
keybase/triplesec
Triple Security for the browser and Node.js
allenai/pdf-component-library
ollama/ollama
Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.
allenai/scholarphi
An interactive PDF reader.
allenai/scholar-reader-pdfjs
Fork of pdf.js for the Scholar Reader
allenai/s2_fos
microsoft/playwright-python
Python version of the Playwright testing and automation library.
zTrix/webpage2html
save/convert web pages to a standalone editable html file for offline archive/view/edit/play/whatever
markusmobius/nodeSavePageWE
Fork of SavePageWE Chrome Extension adapted for Node.js plus Puppeteer (updated 2023): converts a website into a self-contained single html file
gildas-lormeau/single-file-cli
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
PharMolix/OpenBioMed
BuilderIO/gpt-crawler
Crawl a site to generate knowledge files to create your own custom GPT from a URL
currentslab/extractnet
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
lixi5338619/lxSpider
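A collection of web-scraping examples, including but not limited to: Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQIYI, Ctrip, 12306, 58, Sohu, various index sites, VIP (Weipu) and Wanfang, Z-Library, Oalib, novel sites, tender/bidding sites, procurement sites, Xiaohongshu, Dianping, Twitter, Maimai, and Zhihu.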