yoursock

Anhui University of Technology

yoursock's Stars

coolwanglu/pdf2htmlEX
Convert PDF to HTML without losing text or format.
Language: HTML, 10.3k stars, 1.8k forks
mozilla/pdf.js
PDF Reader in JavaScript
Language: JavaScript, 47.1k stars, 9.8k forks
jsvine/pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Language: Python, 6k stars, 618 forks
psf/requests
A simple, yet elegant, HTTP library.
Language: Python, 51.7k stars, 9.3k forks
shuliu586/AI_Chinese_DataSet_KnowledgeDAO
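Chinese datasets for AI training (continuously updated) and a map of AI companies. Current datasets: 8,000 catering-industry questions, Baidu Zhidao, the Chinese Alpaca dataset, computer-science datasets, the Vicuna dataset, the RedPajama dataset, Chinese Wikipedia entries, and website forum Q&A datasets.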
Language: Jupyter Notebook, 486
zhaoboy9692/me-tools
A collection of small tools: a Taobao packet-capture Xposed module, WeChat Sports, flash-sale ticket grabbing, table2json, 抹机王, and a Flyme assistant.
Language: Python, 25287
Eeyhan/WTools
13
SkyworkAI/Skywork
Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model, training data, evaluation data, evaluation methods, etc.
Language: Python, 1.2k stars, 105 forks
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Language: Python, 4.2k stars, 326 forks
kangvcar/InfoSpider
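INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, aiming to help users retrieve their own data safely and quickly. The tool's code is open source and its process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, ** Mobile, ** Unicom, ** Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, the Open Source ** blog, and Jianshu.
Language: Python, 7.7k stars, 1.5k forks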
srx-2000/spider_collection
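Python crawlers. Currently included: NetEase Cloud Music song scraping, Bilibili video scraping, Zhihu Q&A scraping, wallpaper scraping, xvideos video scraping, audiobook scraping, a Weibo crawler, Anjuke listing scraping with data visualization, a Bilibili video cover extractor, an IP proxy pool wrapper, a Zhihu million-user crawler with data analysis, and a GitHub user crawler.
Language: Python, 1.1k stars, 217 forks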
wnma3mz/wechat_articles_spider
A crawler for WeChat official account articles.
Language: Python, 2.7k stars, 698 forks
doocs/leetcode
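🔥LeetCode solutions in any programming language | Solutions in multiple programming languages to LeetCode, 剑指 Offer (Coding Interviews, 2nd edition), and Cracking the Coding Interview (6th edition)
Language: Java, 29.8k stars, 5.6k forks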
spdustin/ChatGPT-AutoExpert
🚀🧠💬 Supercharged Custom Instructions for ChatGPT (non-coding) and ChatGPT Advanced Data Analysis (coding).
Language: JavaScript, 6.5k stars, 453 forks
NLP-LOVE/Introduction-NLP
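Detailed notes on 《自然语言处理入门》 (Introduction to Natural Language Processing), the new book by the author of HanLP. Rather than a dry list of formulas, the book explains algorithms and models in plain, accessible language. Starting from basic concepts, it progressively covers the algorithmic principles and engineering implementations of several popular problems: Chinese word segmentation, part-of-speech tagging, named entity recognition, information extraction, text clustering, text classification, and syntactic parsing.
Language: Python, 2.1k stars, 542 forks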
TheAlgorithms/Python
All Algorithms implemented in Python
Language: Python, 182k stars, 44k forks
NLP-LOVE/ML-NLP
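This project covers the knowledge points and code implementations commonly tested in machine learning, deep learning, and NLP interviews, which are also the theoretical fundamentals every algorithm engineer should know.
Language: Jupyter Notebook, 15.4k stars, 4.5k forks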
649453932/Chinese-Text-Classification-Pytorch
Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer; based on PyTorch and ready to use out of the box.
Language: Python, 5.2k stars, 1.2k forks
xiangsx/gpt4free-ts
Providing a free OpenAI GPT-4 API! This is a replication project for the TypeScript version of xtekky/gpt4free
Language: TypeScript, 7.5k stars, 1.3k forks
miyakogi/pyppeteer
Headless chrome/chromium automation library (unofficial port of puppeteer)
Language: Python, 3.6k stars, 372 forks
chromium/chromium
The official GitHub mirror of the Chromium source
18.1k stars, 6.7k forks
adbar/trafilatura
Python & command-line tool to gather text on the Web: Crawling & scraping, content extraction, metadata. TXT, Markdown, CSV & XML output.
Language: Python, 3.2k stars, 239 forks
mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Language: Python, 6.4k stars, 356 forks
codelucas/newspaper
newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:
Language: Python, 13.9k stars, 2.1k forks
xianyucoder/Crack-JS
🕷🎯Advanced hands-on Python 3 crawler projects, JS encryption and decryption, reverse-engineering tutorials, CSS encryption, font encryption - 犀牛数据 | 美团美食 | 企名片 | 七麦数据 | 淘大象 | 梦幻西游藏宝阁 | 国家企业信用信息公示系统 | 漫画柜 | 财联社 | **空气质量在线监测分析平台 | 66ip代理 | 零度ip | **产品大目录 | JSFuck | 咪咕视频 | 房天下 | 新浪微博 | 新浪二手房 | 极贷助手 | 裁判文书网 | 空中网 | 粉笔网 | 叮当快药 | 58同城 | wallhere | 豆瓣读书 | Google mirror site | openlaw | X里文学 | 刺猬猫小说 |
Language: JavaScript, 1.3k stars, 381 forks
gildas-lormeau/SingleFile
Web Extension for saving a faithful copy of a complete web page in a single HTML file
Language: JavaScript, 14.3k stars, 945 forks
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
Language: Python, 38.3k stars, 4.3k forks
wbingli/awscli-plugin-endpoint
An awscli plugin to configure service endpoint from aws configure file
Language: Python, 20523
PaddlePaddle/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Language: Python, 40.2k stars, 7.4k forks
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
Language: Python, 4.6k stars, 450 forks
