Hironsan's Stars
deepseek-ai/DeepSeek-V3
VikParuchuri/marker
Convert PDF to markdown + JSON quickly with high accuracy
omnivore-app/omnivore
Omnivore is a complete, open source read-it-later solution for people who like reading.
block/goose
an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
allenai/olmocr
Toolkit for linearizing PDFs for LLM datasets/training
py-pdf/pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
huggingface/smol-course
A course on aligning smol models.
arcee-ai/mergekit
Tools for merging pretrained large language models.
neo4j-labs/llm-graph-builder
Neo4j graph construction from unstructured data using LLMs
microsoft/PromptWizard
Task-Aware Agent-driven Prompt Optimization Framework
chonkie-ai/chonkie
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Azure-Samples/graphrag-accelerator
One-click deploy of a Knowledge Graph powered RAG (GraphRAG) in Azure
google-gemini/generative-ai-python
The official Python library for the Google Gemini API
urchade/GLiNER
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
stanfordnlp/pyreft
Stanford NLP Python library for Representation Finetuning (ReFT)
xhluca/bm25s
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
huggingface/optimum-quanto
A PyTorch quantization backend for Optimum
nomic-ai/contrastors
Train Models Contrastively in PyTorch
MinishLab/semhash
Fast Semantic Text Deduplication
Arize-ai/openinference
OpenTelemetry Instrumentation for AI Observability
amazon-science/esci-data
Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search
py-pdf/benchmarks
Benchmarking PDF libraries
wjbmattingly/spacyex
SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax.
sbintuitions/JMTEB
The evaluation scripts of JMTEB (Japanese Massive Text Embedding Benchmark)
Azure/synthetic-qa-generation
This hands-on lab demonstrates how to create or augment a QnA dataset from complex unstructured data in a realistic scenario. The sample is designed to be followed step by step by developers, data scientists, and field practitioners, with only a little help.
Azure/slm-innovator-lab
This lab is a 1-day or 2-day end-to-end SLM workshop developed and led by AI GBB. Attendees learn how to quickly run the full sequence of data preparation, fine-tuning, serving, and LLMOps using Azure ML Studio and AI Studio, and can then extend the workload from there.
japanese-law-analysis/data_set
Datasets related to laws and court precedents
mizuumi/JDocQA
opensource-jp/Open-Source-AI
Japanese translation of Open Source AI Definition