yoursock's Stars
coolwanglu/pdf2htmlEX
Convert PDF to HTML without losing text or format.
mozilla/pdf.js
PDF Reader in JavaScript
jsvine/pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
psf/requests
A simple, yet elegant, HTTP library.
shuliu586/AI_Chinese_DataSet_KnowledgeDAO
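Chinese datasets for AI training (continuously updated) and a map of AI companies. Current datasets: 8,000 restaurant-industry questions, Baidu Zhidao, the Chinese Alpaca dataset, a computer-science dataset, the Vicuna dataset, the RedPajama dataset, a Chinese Wikipedia entries dataset, and a website/forum Q&A dataset.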
zhaoboy9692/me-tools
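A collection of small tools: a Taobao packet-capture Xposed module, WeChat step count (WeChat Sports), flash-sale and ticket grabbing, table2json, 抹机王 (a device-wiping tool), and a Flyme assistant.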
Eeyhan/WTools
SkyworkAI/Skywork
Skywork (天工) series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model parameters, training data, evaluation data, and evaluation methods.
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
kangvcar/InfoSpider
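INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, designed to help users retrieve their own data safely and quickly. The code is open source and the process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, ** Mobile, ** Unicom, ** Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, Open Source ** blogs, and Jianshu.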
srx-2000/spider_collection
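Python crawlers. Currently included: NetEase Cloud Music song crawler, Bilibili video crawler, Zhihu Q&A crawler, wallpaper crawler, xvideos video crawler, audiobook crawler, Weibo crawler, Anjuke listing crawler with data visualization, Bilibili video cover extractor, IP proxy pool wrapper, Zhihu million-user crawler with data analysis, and GitHub user crawler.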
wnma3mz/wechat_articles_spider
A crawler for WeChat Official Account articles
doocs/leetcode
🔥LeetCode solutions in any programming language | Solutions in multiple programming languages for LeetCode, 剑指 Offer (Coding Interviews, 2nd edition), and Cracking the Coding Interview (6th edition)
spdustin/ChatGPT-AutoExpert
🚀🧠💬 Supercharged Custom Instructions for ChatGPT (non-coding) and ChatGPT Advanced Data Analysis (coding).
NLP-LOVE/Introduction-NLP
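Detailed notes on 《自然语言处理入门》 (Introduction to Natural Language Processing), the new book by the author of HanLP. A conscientious work for the field: rather than a dry list of formulas, the book explains algorithms and models in plain, accessible language. Starting from basic concepts, it walks through the algorithmic principles and engineering implementation of several popular problems: Chinese word segmentation, part-of-speech tagging, named entity recognition, information extraction, text clustering, text classification, and syntactic parsing.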
TheAlgorithms/Python
All Algorithms implemented in Python
NLP-LOVE/ML-NLP
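This project covers the knowledge points and code implementations most often tested in Machine Learning, Deep Learning, and NLP interviews, which are also the theoretical fundamentals every algorithm engineer needs to know.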
649453932/Chinese-Text-Classification-Pytorch
Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer. Based on PyTorch, ready to use out of the box.
xiangsx/gpt4free-ts
Providing a free OpenAI GPT-4 API ! This is a replication project for the typescript version of xtekky/gpt4free
miyakogi/pyppeteer
Headless chrome/chromium automation library (unofficial port of puppeteer)
chromium/chromium
The official GitHub mirror of the Chromium source
adbar/trafilatura
Python & command-line tool to gather text on the Web: Crawling & scraping, content extraction, metadata. TXT, Markdown, CSV & XML output.
mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
codelucas/newspaper
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
xianyucoder/Crack-JS
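🕷🎯Advanced hands-on Python 3 crawler projects, JS encryption and decryption, reverse-engineering tutorials, CSS obfuscation, font obfuscation - Xiniu Data | Meituan Food | Qimingpian | Qimai Data | Taodaxiang | Fantasy Westward Journey Treasure Pavilion (梦幻西游藏宝阁) | National Enterprise Credit Information Publicity System | Manhuagui | Cailian Press | ** Air Quality Online Monitoring and Analysis Platform | 66ip proxy | Lingdu IP (零度ip) | ** Product Catalog | JSFuck | Migu Video | Fang.com (房天下) | Sina Weibo | Sina second-hand housing | Jidai Assistant (极贷助手) | China Judgments Online (裁判文书网) | Kongzhong | Fenbi | Dingdang Kuaiyao | 58.com | wallhere | Douban Books | Google mirror site | openlaw | X里文学 | Ciweimao Novels (刺猬猫小说)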
gildas-lormeau/SingleFile
Web Extension for saving a faithful copy of a complete web page in a single HTML file
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
wbingli/awscli-plugin-endpoint
An awscli plugin to configure service endpoint from aws configure file
PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis