yoursock

Anhui University of Technology

yoursock's Stars

BIT-ENGD/baidu_baike
Language: Python · 425
fake-useragent/fake-useragent
Up-to-date simple useragent faker with real world database
Language: Python · 3.5k stars · 513 forks
mlfoundations/dclm
DataComp for Language Models
Language: HTML · 1397
TeamHG-Memex/autopager
Detect and classify pagination links
Language: HTML · 9725
pyppeteer/pyppeteer
Headless chrome/chromium automation library (unofficial port of puppeteer)
Language: Python · 3.5k stars · 324 forks
taishi-i/awesome-japanese-nlp-resources
A curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese
61722
RubyMetric/chsrc
chsrc: a universal, cross-platform source-switching tool. Change Source for every software on every platform from the command line.
Language: C · 1.2k stars · 53 forks
BaiduSpider/BaiduSpider
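BaiduSpider, a crawler for Baidu search results. It currently supports Baidu Web Search, Baidu Image Search, Baidu Zhidao Search, Baidu Video Search, Baidu News Search, Baidu Wenku Search, Baidu Jingyan Search, and Baidu Baike Search.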
Language: Python · 960 stars · 206 forks
helloworld-Co/html2md
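A lightweight yet powerful one-click HTML-to-Markdown tool, open-sourced by the helloworld developer community. It converts articles from multiple platforms in one click and saves the results to your local machine.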
Language: JavaScript · 630 stars · 169 forks
jina-ai/jina
☁️ Build multimodal AI applications with a cloud-native stack
Language: Python · 20.5k stars · 2.2k forks
onuratakan/gpt-computer-assistant
GPT-4o for Windows, macOS, and Linux
Language: Python · 4.7k stars · 441 forks
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Language: Python · 35.5k stars · 4.4k forks
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Language: Python · 1.7k stars · 113 forks
VikParuchuri/marker
Convert PDF to markdown quickly with high accuracy
Language: Python · 13.7k stars · 682 forks
deepset-ai/haystack
🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Language: Python · 14.5k stars · 1.7k forks
datawhalechina/joyful-pandas
A pandas tutorial in Chinese
Language: Jupyter Notebook · 4.4k stars · 1.8k forks
datawhalechina/self-llm
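"A Usage Guide to Open-Source LLMs": a tutorial for quickly deploying open-source large models in a Linux environment, tailored to ** "babies" (i.e., beginners).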
Language: Jupyter Notebook · 6k stars · 736 forks
g1879/DrissionPage
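A Python-based web automation tool. It can both control a browser and send and receive data packets, combining the convenience of browser automation with the efficiency of requests. It is powerful, with many user-friendly designs and convenient features built in, and its syntax is concise and elegant, so little code is needed.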
Language: Python · 6k stars · 606 forks
martinsbalodis/web-scraper-chrome-extension
Web data extraction tool implemented as chrome extension
Language: JavaScript · 1.3k stars · 436 forks
alirezamika/autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Language: Python · 6k stars · 633 forks
Stirling-Tools/Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
Language: Java · 29.6k stars · 2.2k forks
hankcs/HanLP
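Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing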
Language: Python · 32.9k stars · 9.8k forks
tencentmusic/supersonic
SuperSonic is the next-generation BI platform that integrates Chat BI (powered by LLM) and Headless BI (powered by semantic layer) paradigms.
Language: Java · 1.5k stars · 225 forks
ltd0102/ghs
27853
hiyouga/LLaMA-Factory
Unify Efficient Fine-Tuning of 100+ LLMs
Language: Python · 25.4k stars · 3.1k forks
kermitt2/grobid
Machine learning software for extracting information from scholarly documents
Language: Java · 3.2k stars · 435 forks
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Language: Python · 9k stars · 688 forks
WZBSocialScienceCenter/pdftabextract
A set of tools for extracting tables from PDF files, helping with data mining on (OCR-processed) scanned documents.
Language: Python · 2.2k stars · 367 forks
pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Language: Python · 4.5k stars · 444 forks
pdf2htmlEX/pdf2htmlEX
Convert PDF to HTML without losing text or format.
Language: HTML · 3.5k stars · 352 forks