Mihaiii's Stars
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
flipperdevices/flipperzero-firmware
Flipper Zero firmware source code
feder-cr/linkedIn_auto_jobs_applier_with_AI
LinkedIn_AIHawk is a tool that automates the job application process on LinkedIn. Using artificial intelligence, it lets users apply to multiple job offers in an automated and personalized way.
NexaAI/nexa-sdk
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.
modelscope/ms-swift
Use PEFT or full-parameter training to fine-tune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
argmaxinc/WhisperKit
On-device Speech Recognition for Apple Silicon
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
PatrickJS/awesome-cursorrules
📄 A curated list of awesome .cursorrules files
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
comet-ml/opik
Open-source end-to-end LLM Development Platform
bytedance/GiantMIDI-Piano
zml/zml
High performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild
feizc/FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
merveenoyan/smol-vision
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
PABannier/bark.cpp
Suno AI's Bark model in C/C++ for fast text-to-speech generation
sandrohanea/whisper.net
Whisper.net. Speech to text made simple using Whisper Models
JUSTSUJAY/nlp-zero-to-hero
NLP Zero to Hero in just 10 Kernels
RayFernando1337/MLX-Auto-Subtitled-Video-Generator
Generate accurate transcripts using Apple's MLX framework
PragmaticMachineLearning/docai
Structured information extraction from documents
njucckevin/SeeClick
The model, data and code for the visual GUI Agent SeeClick
zhangfaen/finetune-Qwen2-VL
1mrat/cursor
Repo of cursor prompts
luckyrobots/luckyrobots
We are on a mission to make robotics available to regular software engineers by decoupling it from ROS and physical hardware.
2U1/Phi3-Vision-Finetune
An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft.
GaiZhenbiao/Phi3V-Finetuning
Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.
AMAAI-Lab/MidiCaps
A large-scale dataset of caption-annotated MIDI files.
microsoft/UICaption
We release the UICaption dataset. The dataset consists of UI images (icons and screenshots) and associated text descriptions. This dataset was used to pre-train the Lexi model which provides a generic representation of UI screens and their components.
showlab/GUI-Narrator
Repository of GUI Action Narrator
mihaidobrescu1111/guess_the_word