siddish-reddy's Stars
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Ironclad/rivet
The open-source visual AI programming environment and TypeScript library
csteinmetz1/ai-audio-startups
Community list of startups working with AI in audio and music technology
declare-lab/tango
A family of diffusion models for text-to-audio generation.
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
EmergenceAI/Agent-E
Agent-driven automation, starting with the web. Try it: https://www.emergence.ai/web-automation-api
PrefectHQ/ControlFlow
🦾 Take control of your AI agents
wenet-e2e/wespeaker
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
open-thought/system-2-research
System 2 Reasoning Link Collection
Neph0s/awesome-llm-role-playing-with-persona
Awesome-llm-role-playing-with-persona: a curated list of resources for large language models for role-playing with assigned personas
crusher-dev/crusher
🧙‍♀️ Fast low-code testing: create and run tests and get alerts ⏱️ Create a test in <60 secs 👉 A better open-source alternative to Selenium, Cypress, and Puppeteer
valeman/awesome-conformal-prediction
A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD and MSc theses, articles and open-source libraries.
niuzaisheng/ScreenAgent
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
lzw-lzw/GroundingGPT
[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model
NousResearch/Open-Reasoning-Tasks
A comprehensive repository of reasoning tasks for LLMs (and beyond)
ganarajpr/awesome-dspy
An Awesome list of curated DSPy resources.
OpenT2S/LlamaVoice
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
NVIDIA/audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
magicproduct/hash-hop
Long context evaluation for large language models
marianne-m/brouhaha-vad
Predicts the level of noise and reverberation in your audio files
Sreyan88/GAMA
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
ComposioHQ/Composio-Function-Calling-Benchmark
Function Calling Benchmark & Testing
allenai/unified-io-2.pytorch
declare-lab/LLM-PuzzleTest
This repository is maintained to release datasets and models for multimodal puzzle reasoning.
transitive-bullshit/internet-diet
Chrome extension to remove unhealthy foods from the web.
agentsea/toolfuse
A common protocol for AI agent tools
elicit/debate