Hironsan

OSS developer | Interested in Natural Language Processing

Tokyo, Japan

Hironsan's Stars

deepseek-ai/DeepSeek-V3
Language: Python 95.2k 742 49815.4k
VikParuchuri/marker
Convert PDF to markdown + JSON quickly with high accuracy
Language: Python 23.9k 97 4311.5k
omnivore-app/omnivore
Omnivore is a complete, open source read-it-later solution for people who like reading.
Language: TypeScript 14.5k 60 1.3k1.1k
block/goose
an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
Language: Rust 11.3k 72 426831
allenai/olmocr
Toolkit for linearizing PDFs for LLM datasets/training
Language: Python 10.9k 60 120739
py-pdf/pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Language: Python 8.9k 146 1.2k1.4k
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Language: Python 7.3k 52 163505
huggingface/smol-course
A course on aligning smol models.
Language: Jupyter Notebook 5.7k 42 412k
arcee-ai/mergekit
Tools for merging pretrained large language models.
Language: Python 5.5k 57 364524
neo4j-labs/llm-graph-builder
Neo4j graph construction from unstructured data using LLMs
Language: Jupyter Notebook 3.3k 26 566556
microsoft/PromptWizard
Task-Aware Agent-driven Prompt Optimization Framework
Language: Python 3.1k 27 30259
chonkie-ai/chonkie
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Language: Python 2.9k 21 58127
Azure-Samples/graphrag-accelerator
One-click deploy of a Knowledge Graph powered RAG (GraphRAG) in Azure
Language: Python 2.3k 40 110397
google-gemini/generative-ai-python
The official Python library for the Google Gemini API
Language: Python 2.2k 36 324448
urchade/GLiNER
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
Language: Python 1.9k 18 153177
stanfordnlp/pyreft
Stanford NLP Python library for Representation Finetuning (ReFT)
Language: Python 1.5k 17 108125
xhluca/bm25s
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
Language: Python 1.1k 5 4764
huggingface/optimum-quanto
A pytorch quantization backend for optimum
Language: Python 912 9 15572
nomic-ai/contrastors
Train Models Contrastively in Pytorch
Language: Python 688 14 4155
MinishLab/semhash
Fast Semantic Text Deduplication
Language: Python 603 5 1826
Arize-ai/openinference
OpenTelemetry Instrumentation for AI Observability
Language: Python 367 11 49376
amazon-science/esci-data
Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search
Language: Python 278 8 1658
py-pdf/benchmarks
Benchmarking PDF libraries
Language: Python 269 5 915
wjbmattingly/spacyex
SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax.
Language: Python 59 3 01
sbintuitions/JMTEB
The evaluation scripts of JMTEB (Japanese Massive Text Embedding Benchmark)
Language: Python 53 4 2613
Azure/synthetic-qa-generation
This hands-on lab aims to alleviate some of that headache by demonstrating how to create/augment a QnA dataset from complex unstructured data, assuming a real-world scenario. The sample aims to be step-by-step for developers and data scientists, as well as those in the field, to try it out with a little help.
Language: Jupyter Notebook 45 7 013
Azure/slm-innovator-lab
This lab is a 1-day/2-day end-to-end SLM workshop led and developed by AI GBB. Attendees will learn how to quickly and easily perform the data preparation-fine tuning-serving-LLMOps series of processes using Azure ML Studio and AI Studio, and will be able to expand the workload based on this.
Language: Jupyter Notebook 39 5 014
japanese-law-analysis/data_set
Datasets related to laws and court precedents
34 1 20
mizuumi/JDocQA
28 3 42
opensource-jp/Open-Source-AI
Japanese translation of Open Source AI Definition
22 2 00
