huggingface-datasets
There are 135 repositories under the huggingface-datasets topic.
grok-ai/nn-template
Generic template to bootstrap your PyTorch project.
xlang-ai/UnifiedSKG
[EMNLP 2022] Unifying and multi-tasking structured knowledge grounding with language models
stacklok/promptwright
Generate large synthetic data using an LLM
AI-Northstar-Tech/vector-io
Comprehensive Vector Data Tooling. The universal interface for all vector databases, datasets and RAG platforms. Easily export, import, back up, re-embed (using any model) or access your vector data from any vector database or repository.
neural-maze/rick-llm
Make Llama 3.1 8B talk in Rick Sanchez’s style
autogluon/fev
Forecast evaluation library
BUAADreamer/Chinese-LLaVA-Med
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
BirkhoffG/jax-dataloader
PyTorch-like dataloaders for JAX.
vTuanpham/Large_dataset_translator
Translate large datasets into any language with the Google translation API and multithreaded processing, no key required!
SmithaUpadhyaya/fashion_image_caption
Automate fashion image captioning using BLIP-2. Automatically generates descriptions of clothes on shopping websites, which can help customers without fashion knowledge better understand the features (attributes, style, functionality, etc.) of the items and increase online sales by enticing more customers.
onesuper/HuggingFace-Datasets-Text-Quality-Analysis
Retrieves Parquet files from Hugging Face and identifies and quantifies junk data, duplication, contamination, and biased content in a dataset using pandas
BUAADreamer/MLLM-Finetuning-Demo
Example code for fine-tuning multimodal large language models with LLaMA-Factory
xieincz/huggingface-go
huggingface-go: high-speed downloads of Hugging Face models and datasets
TirendazAcademy/Hugging-Face-Tutorials
Getting started with Hugging Face
raidionics/AeroPath
🤗 AeroPath: An airway segmentation benchmark dataset with challenging pathology
daspartho/predict-subreddit
NLP model that predicts subreddit based on the title of a post
SapienzaNLP/ita-bench
A collection of Italian benchmarks for LLM evaluation
DSYZayn/gopeed-extension-huggingface
A Gopeed extension for downloading models and datasets from Hugging Face, hf-mirror and ModelScope.
batmanscode/Talk2Book
Use AI to personify books, so that you can talk to them 🙊
mrcabbage972/simple-toolformer
A Python implementation of Toolformer using Hugging Face Transformers
acrion/ditana-assistant
Ditana Assistant: AI-powered CLI/GUI tool for intelligent assistance, leveraging LLMs with OS interaction capabilities and context augmentation, optionally via Wolfram|Alpha
auniquesun/Point-Cache
[CVPR 2025] Official implementation of the paper "Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis"
shunk031/huggingface-datasets_JGLUE
JGLUE: Japanese General Language Understanding Evaluation for huggingface datasets
npuichigo/tarzan
High-level API for tar-based dataset
abhi9ab/DeepSeek-R1-Distill-Llama-8B-finance-v1
Fine-tuned DeepSeek 8B model for finance reasoning
PRITHIVSAKTHIUR/EHRM-Models
EHRM [ Electronic Health Record Management ] introduces a centralized platform for analyzing patient records, offering insights into billing amounts, demographics, prevalent diagnoses, medical conditions, consulted doctors, admission types, and medication usage.
antoinejeannot/jurisprudence
French jurisprudence at your fingertips, every 72h
balnarendrasapa/road-detection
A course project for DSCI-6011 (Deep Learning). It deals with drivable-area and lane segmentation for self-driving cars.
daspartho/bored-ape-diffusion
Diffusion model for unconditional image generation of Bored Apes
davidschulte/hf-dataset-selector
Find the best datasets for intermediate fine-tuning
aaaastark/Pretrain_Finetune_Transformers_Pytorch
Pre-Training and Fine-Tuning transformer models using PyTorch and the Hugging Face Transformers library. Whether you're delving into pre-training with custom datasets or fine-tuning for specific classification tasks, these notebooks offer explanations and code for implementation.
anujsahani01/English-Marathi-Translation
Fine-tuned and compared 3 🤗 pre-trained Multilingual LLMs
hearmeneigh/e621-rising-configs
Configuration files for building E621-Rising v3 SDXL model and dataset
michelecafagna26/HL-dataset
[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along 3 axes: scene, action, rationale.
shunk031/cookiecutter-huggingface-datasets
cookiecutter for huggingface datasets
wsobanski/scraper-tvp
Scraping a large number of articles for transformer training.