🚀 Evaluate and Evolve. 🚀 YiVal is an open-source GenAI-Ops tool for tuning and evaluating prompts, retrieval configurations, and model parameters using customizable datasets, evaluation methods, and evolution strategies.

Primary language: Python · License: Apache-2.0

🧚🏻 YiVal

Website · Product Hunt · Documentation

⚡ Build any Generative AI application with evaluation and improvement ⚡

👉 Follow us: Twitter | Discord

🤔 What is YiVal?

YiVal is a GenAI-Ops framework that lets you iteratively tune your generative AI model metadata, parameters, prompts, and retrieval configurations all at once, using your preferred test dataset generation, evaluation algorithms, and improvement strategies.

Check out our quickstart guide! →

📣 What's Next?

Expected Features in September

  • Add ROUGE and BERTScore evaluators
  • Add support for MidJourney
  • Add support for LLaMA2-70B, LLaMA2-7B, and Falcon-40B
  • Support LoRA fine-tuning of open-source models
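
For context on the planned ROUGE evaluator: ROUGE-1 scores a candidate text by its unigram overlap with a reference text. The following is a minimal sketch of that metric, not YiVal's evaluator. The function name `rouge1_f1` and the lowercase whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Return the ROUGE-1 F1 score using lowercase whitespace tokens (assumed tokenizer)."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a token counts only as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

A production evaluator would typically add stemming and report ROUGE-2 and ROUGE-L as well; this sketch covers only the unigram case.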

🚀 Features

🔧 Experiment Mode

Workflow: Define your AI/ML application ➡️ Define test dataset ➡️ Evaluate 🔄 Improve ➡️ Prompt-related artifacts built ✅

Features:
🌟 Streamlined prompt development process
🌟 Support for multimedia and multimodal applications
🌟 Support for CSV upload and GPT-4-generated test data
🌟 Dashboard tracking latency, price, and evaluator results
🌟 Human (RLHF) and algorithm-based improvers
🌟 Service with a detailed web view
🌟 Customizable evaluators and improvers

🤖 Agent Mode (Auto-prompting)

Workflow: Define your AI/ML application ➡️ Auto-prompting ➡️ Prompt-related artifacts built ✅

Features:
🌟 No-code experience for building Gen-AI applications
🌟 Watch your Gen-AI application come to life and improve with just one click
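
The Experiment Mode workflow (define the application, define a test dataset, evaluate, improve) can be pictured as a small loop. The sketch below is a toy illustration under stated assumptions and does not use YiVal's API: `score_prompt`, `evolve_prompt`, and the `run_app`, `evaluate`, and `improve` callables are hypothetical names that stand in for the application, the evaluator, and the improver.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, str]  # one test case; the keys are up to the caller


def score_prompt(
    prompt: str,
    dataset: Sequence[Row],
    run_app: Callable[[str, Row], str],
    evaluate: Callable[[str, Row], float],
) -> float:
    """Mean evaluator score of one prompt over a non-empty test dataset."""
    scores = [evaluate(run_app(prompt, row), row) for row in dataset]
    return sum(scores) / len(scores)


def evolve_prompt(
    seed_prompts: Sequence[str],
    dataset: Sequence[Row],
    run_app: Callable[[str, Row], str],
    evaluate: Callable[[str, Row], float],
    improve: Callable[[str], List[str]],
    rounds: int = 2,
) -> Tuple[str, float]:
    """Evaluate the seed prompts, then repeatedly try variations of the current best."""
    scored = [(score_prompt(p, dataset, run_app, evaluate), p) for p in seed_prompts]
    best_score, best_prompt = max(scored, key=lambda pair: pair[0])
    for _ in range(rounds):
        # `improve` proposes new variations derived from the best prompt so far.
        for candidate in improve(best_prompt):
            score = score_prompt(candidate, dataset, run_app, evaluate)
            if score > best_score:
                best_score, best_prompt = score, candidate
    return best_prompt, best_score
```

In this sketch the improver only ever sees the single best prompt; a real improvement strategy could keep a population of candidates or use human feedback instead.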

Model Support matrix

We support 100+ LLMs (e.g., gpt-4, gpt-3.5-turbo, llama).

Support by model source is as follows:

| Model | llm-Evaluate | Human-Evaluate | Variation Generate | Custom func |
| --- | --- | --- | --- | --- |
| OpenAI | ✅ | ✅ | ✅ | ✅ |
| Azure | ✅ | ✅ | ✅ | ✅ |
| TogetherAI | ✅ | ✅ | ✅ | ✅ |
| Cohere | ✅ | ✅ | ✅ | ✅ |
| Huggingface | ✅ | ✅ | ✅ | ✅ |
| Anthropic | ✅ | ✅ | ✅ | ✅ |
| MidJourney | | ✅ | | ✅ |

To use different models in a custom function (e.g., model comparison), follow our example.

To use different models in evaluators and generators, check our config.
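
As a rough picture of what a model-comparison custom function does, the sketch below sends one prompt to several models and records each reply with its latency. It is not YiVal's example: `compare_models` and the `complete` callable are hypothetical names, and `complete` is whatever function you supply to call a model.

```python
import time
from typing import Callable, Dict, Iterable


def compare_models(
    prompt: str,
    models: Iterable[str],
    complete: Callable[[str, str], str],
) -> Dict[str, Dict[str, object]]:
    """Run the same prompt against each model and collect output plus latency."""
    results: Dict[str, Dict[str, object]] = {}
    for model in models:
        start = time.perf_counter()
        output = complete(model, prompt)  # caller-supplied function that queries the model
        results[model] = {
            "output": output,
            "latency_s": time.perf_counter() - start,
        }
    return results
```

In a real setup, `complete` might wrap a client library such as LiteLLM; passing it in as a parameter keeps the comparison logic testable with a stub and independent of any one provider.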

Installation

pip install yival

Demo

Colab

| Demo | Supported Features | Colab Link |
| --- | --- | --- |
| 🐯 Craft your AI story with ChatGPT and MidJourney | Multimodal support for text and images. | Open In Colab |
| 🌟 Evaluate Different LLM Model Performance With Your Own Q&A Test Dataset | Easy model evaluation and comparison against 100+ models, thanks to LiteLLM. Benchmarks model performance against your own use case or test data. | Open In Colab |
| 🔥 Startup Company Headline Generation Bot | Automated prompt evolution. | Open In Colab |
| 🧳 Build Your Customized Travel Guide Bot | Automated prompt generation by retrieving the most relevant popular prompts from the community, e.g. awesome-chatgpt-prompts. | Open In Colab |
| 📖 Build a Cheaper Translator: Let GPT-4 Teach Llama2 to Create a Cheaper Translator | Uses GPT-4-generated test data to fine-tune a Llama2 translation bot with Replicate: a 6% drop in performance for an 18x reduction in cost. | Open In Colab |
| 🤖 Chat with Your Favorite Characters: 澹台烬 (Tantai Jin) from 《长月烬明》 (Till the End of the Moon) | Give your character a soul with automated prompt generation and character script retrieval. | Open In Colab |

Multimodal Mode

YiVal has multimodal capabilities and handles AI-generated images (AIGC) well.

Find more information in the animal story demo we provide.

yival run demo/configs/animal_story.yml

Basic Interactive Mode

To get started with a demo for basic interactive mode of YiVal, run the following command:

yival demo --basic_interactive

Once started, navigate to the following address in your web browser:

http://127.0.0.1:8073/interactive

For more details on this demo, check out the Basic Interactive Mode Demo.

Question Answering with expected result evaluator

yival demo --qa_expected_results

Once started, navigate to the following address in your web browser: http://127.0.0.1:8073/

For more details, check out the Question Answering with expected result evaluator.
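
To illustrate the idea behind evaluating against expected results, the sketch below compares each model output with its expected answer after light normalization and reports accuracy. It is an assumption-laden illustration, not YiVal's evaluator, which may match answers differently. The names `expected_result_score` and `accuracy` are hypothetical.

```python
from typing import Iterable, Tuple


def _normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def expected_result_score(output: str, expected: str) -> float:
    """Return 1.0 when the normalized output equals the expected answer, else 0.0."""
    return 1.0 if _normalize(output) == _normalize(expected) else 0.0


def accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average score over (output, expected) pairs; 0.0 for an empty input."""
    scores = [expected_result_score(output, expected) for output, expected in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

Exact matching is strict: a correct answer phrased differently scores 0.0, which is why free-form question answering is often scored with a similarity metric or a model-based judge instead.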

Automatically generate prompts with evaluator

yival demo --auto_prompts

Once started, navigate to the following address in your web browser: http://127.0.0.1:8073/

Contributors

🌟 YiVal welcomes your contributions! 🌟

πŸ₯³ Thanks so much to all of our amazing contributors πŸ₯³

Paper / Algorithm Implementation

| Paper | Author | Topics | YiVal Contributor | Data Generator | Variation Generator | Evaluator | Selector | Evolver | Config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Large Language Models Are Human-Level Prompt Engineers | Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han | YiVal Evolver, Auto-Prompting | @Tao Feng | OpenAIPromptDataGenerator | OpenAIPromptVariationGenerator | OpenAIPromptEvaluator, OpenAIEloEvaluator | AHPSelector | OpenAIPromptBasedCombinationImprover | config |
| BERTScore: Evaluating Text Generation with BERT | Tianyi Zhang, Varsha Kishore, Felix Wu | YiVal Evaluator, bertscore, rouge | @crazycth | - | - | BertScoreEvaluator | - | - | - |
| AlpacaEval | Xuechen Li, Tianyi Zhang, Yann Dubois, et al. | YiVal Evaluator | @Tao Feng | - | - | AlpacaEvalEvaluator | - | - | config |
| Chain of Density | Griffin Adams, Alexander R. Fabbri, et al. | Prompt Engineering | @Tao Feng | | ChainOfDensityGenerator | | | | config |
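
The name OpenAIEloEvaluator suggests that prompt variations are ranked from pairwise comparisons using Elo ratings. The sketch below shows only the standard Elo update. How YiVal obtains each pairwise outcome and which K-factor it uses are not stated here, so `k=32` and the function names are assumptions.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float,
    rating_b: float,
    score_a: float,
    k: float = 32.0,  # assumed K-factor; controls how fast ratings move
) -> Tuple[float, float]:
    """Return updated ratings; score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b
```

For example, when two variations both start at 1000 and the first wins one comparison, `update_elo(1000, 1000, 1.0)` moves them to 1016 and 984. Repeating the update over many judged pairs yields a ranking.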