wduo
A wandering machine learning researcher, bouncing between groups. I want to understand things clearly, and explain them well. - Colah
Pretending in Hangzhou Creative Culture Company (PH3C), Beijing (wangduo.cnblogs.com)
wduo's Stars
apache/doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
tmux-plugins/tpm
Tmux Plugin Manager
apache/thrift
Apache Thrift
ymcui/Chinese-BERT-wwm
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
Megvii-BaseDetection/YOLOX
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
hluk/CopyQ
Clipboard manager with advanced features
NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
jsvine/pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
tesseract-ocr/tessdata
Trained models with fast variant of the "best" LSTM models + legacy models
pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
huggingface/safetensors
Simple, safe way to store and distribute tensors
ArtifexSoftware/pdf2docx
Open source Python library for converting PDF to DOCX.
brightmart/roberta_zh
RoBERTa Chinese pre-trained models: RoBERTa for Chinese
neo4j-labs/llm-graph-builder
Neo4j graph construction from unstructured data using LLMs
chen3feng/blade-build
Blade is a powerful build system from Tencent, supports many mainstream programming languages, such as C/C++, java, scala, python, protobuf...
huggingface/evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
stanford-oval/WikiChat
WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus.
nyu-mll/GLUE-baselines
[DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations
quqxui/Awesome-LLM4IE-Papers
Awesome papers about generative Information Extraction (IE) using Large Language Models (LLMs)
thu-coai/CrossWOZ
A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
xuyige/BERT4doc-Classification
Code and source for paper ``How to Fine-Tune BERT for Text Classification?``
HUSTAI/uie_pytorch
PyTorch implementation of the PaddleNLP UIE model
Thriftpy/thriftpy2
Pure python approach of Apache Thrift.
Unstructured-IO/unstructured-inference
CLUEbenchmark/SuperCLUE-Llama2-Chinese
Comprehensive evaluation of Chinese versions of the open-source Llama2 models, based on the SuperCLUE OPEN benchmark | Llama2 Chinese evaluation with SuperCLUE
qingyujean/document-level-classification
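Very long text classification (over 1,000 characters); document-level/discourse-level text classification; mainly addresses the long-range dependency problem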
volcengine/volc-sdk-python