Mihaiii's Stars
serp-ai/bark-with-voice-clone
🔊 Text-prompted Generative Audio Model - With the ability to clone voices
Constellate-AI/voice-chat
Chat with AI using whisper, LLMs, and TTS
sandrohanea/whisper.net
Whisper.net. Speech to text made simple using Whisper Models
comet-ml/opik
Open-source end-to-end LLM Development Platform
PABannier/bark.cpp
Suno AI's Bark model in C/C++ for fast text-to-speech generation
PatrickJS/awesome-cursorrules
📄 A curated list of awesome .cursorrules files
zml/zml
High performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild
luckyrobots/luckyrobots
We are on a mission to make robotics available to regular software engineers by decoupling it from ROS and physical hardware.
NexaAI/nexa-sdk
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
zhangfaen/finetune-Qwen2-VL
RayFernando1337/MLX-Auto-Subtitled-Video-Generator
Generate accurate transcripts using Apple's MLX framework
flipperdevices/flipperzero-firmware
Flipper Zero firmware source code
argmaxinc/WhisperKit
On-device Speech Recognition for Apple Silicon
PragmaticMachineLearning/docai
Structured information extraction from documents
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
feizc/FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
merveenoyan/smol-vision
Recipes for shrinking, optimizing, customizing cutting-edge vision models. 💜
JUSTSUJAY/nlp-zero-to-hero
NLP Zero to Hero in just 10 Kernels
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
1mrat/cursor
Repo of cursor prompts
AIHawk-FOSS/Auto_Jobs_Applier_AI_Agent
Auto_Jobs_Applier_AI_Agent aims to ease the job hunt by automating the job application process. Using artificial intelligence, it enables users to apply for multiple jobs in an automated and personalized way.
njucckevin/SeeClick
The model, data and code for the visual GUI Agent SeeClick
showlab/GUI-Narrator
Repository of GUI Action Narrator
microsoft/UICaption
We release the UICaption dataset. The dataset consists of UI images (icons and screenshots) and associated text descriptions. This dataset was used to pre-train the Lexi model which provides a generic representation of UI screens and their components.
mihaidobrescu1111/guess_the_word
AMAAI-Lab/MidiCaps
A large-scale dataset of caption-annotated MIDI files.
bytedance/GiantMIDI-Piano
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system