LLM Fairness Evaluation Exercise - Mango Oasis Chatbot

This exercise guides you through assessing the fairness of the Mango Oasis AI chatbot, with a focus on detecting and mitigating geographical bias in its responses. The output includes metric calculations, visualizations, and actionable recommendations.

Goal

Analyze chatbot responses for potential geographical biases.
Calculate faithfulness, answer relevance, and bias metrics.
Visualize results to understand the extent of bias.
Provide recommendations to improve fairness.

Requirements

Python
Pandas
Matplotlib
RAGAS
DeepEval
Seaborn

Instructions

Data Preparation:
- Ensure your data file of chatbot conversations is accessible to the script and matches the format the data-loading step expects.
Running the Code:
- Option 1: Google Colab (Recommended)
  - Click the "Run in Google Colab" badge above.
  - Upload your data file to Colab.
- Option 2: Local Execution
  - Ensure you have Python installed locally.
  - In a local Jupyter notebook, change !pip install to %pip install so packages are installed into the active kernel's environment.
  - Store secrets as environment variables and read them in code with os.environ.get("SECRET_NAME").
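For local runs, a small helper along these lines can read a secret from the environment and fail early when it is missing. This is a minimal sketch: `get_secret` is a hypothetical helper, not part of the notebook, and `OPENAI_API_KEY` is only an illustrative name, so substitute whichever secret your notebook expects.

```python
import os


def get_secret(name: str) -> str:
    """Return the environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "export it in your shell before starting the notebook."
        )
    return value


# Hypothetical usage; replace the name with the secret your notebook needs.
# api_key = get_secret("OPENAI_API_KEY")
```

Failing at startup is usually easier to debug than passing None to an API client later on.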
Follow the Notebook:
- The provided notebook (or script) will guide you through each step:
  - Loading the data
  - Calculating metrics (faithfulness, answer relevance, bias)
  - Detecting bias using the create_test_cases function
  - Visualizing results (answer relevance distribution)
  - Generating a report with your findings and recommendations
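The data-loading step could look roughly like the sketch below. It assumes a CSV file with the columns `question`, `answer`, `context`, and `region`; these names, the sample rows, and the `load_conversations` helper are all assumptions for illustration, so adapt them to your own file.

```python
import io

import pandas as pd

# Assumed column names; rename these to match your own export.
REQUIRED_COLUMNS = ["question", "answer", "context", "region"]


def load_conversations(source) -> pd.DataFrame:
    """Load chatbot conversations from a CSV path or file-like object."""
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Data file is missing columns: {missing}")
    # Drop rows with no question or answer, since they cannot be scored.
    return df.dropna(subset=["question", "answer"]).reset_index(drop=True)


# Tiny in-memory example (made-up rows) so the sketch runs without a real file.
sample_csv = io.StringIO(
    "question,answer,context,region\n"
    "Where can I stay?,Try the beach villas.,Villas are by the beach.,EU\n"
    "Is breakfast included?,,Breakfast is served daily.,APAC\n"
)
conversations = load_conversations(sample_csv)
print(conversations[["question", "region"]])
```

Validating the columns up front means a mismatched file is reported at load time rather than partway through the metric calculations.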

Key Functions

create_test_cases(data): Converts your dataset into DeepEval's format for bias analysis.
visualize_answer_relevancy(relevancy_df): Generates a plot to visualize answer relevance score distribution.
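As a rough illustration of what create_test_cases might do, the sketch below maps each record to a DeepEval `LLMTestCase` with `input` and `actual_output` fields. This is not the notebook's actual implementation: the `question` and `answer` keys and the sample records are assumptions, and the stand-in class exists only so the sketch runs when DeepEval is not installed.

```python
try:
    from deepeval.test_case import LLMTestCase
except ImportError:
    # Stand-in so the sketch runs without DeepEval installed.
    from dataclasses import dataclass

    @dataclass
    class LLMTestCase:
        input: str
        actual_output: str


def create_test_cases(data):
    """Turn records with 'question' and 'answer' keys into LLMTestCase objects."""
    return [
        LLMTestCase(input=row["question"], actual_output=row["answer"])
        for row in data
    ]


# Hypothetical records; with a DataFrame, pass df.to_dict("records").
records = [
    {"question": "What are your opening hours?", "answer": "We open at 9 am."},
    {"question": "Do you deliver abroad?", "answer": "Yes, to most countries."},
]
test_cases = create_test_cases(records)
print(len(test_cases), test_cases[0].input)
```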

Customization

Adapt the data loading process to your file structure.
Modify the threshold in BiasMetric for bias sensitivity.
Explore other evaluation metrics in RAGAS and DeepEval that are relevant to fairness.
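To get a feel for what changing the threshold does, the sketch below applies two thresholds to a handful of invented bias scores. It assumes that BiasMetric treats lower scores as better and passes a test case when its score is at or below the threshold; verify this against the DeepEval documentation for your installed version. `pass_rate` is a hypothetical helper.

```python
def pass_rate(scores, threshold):
    """Fraction of scores at or below `threshold` (assumed pass rule)."""
    if not scores:
        return 0.0
    return sum(score <= threshold for score in scores) / len(scores)


# Invented bias scores in [0, 1], for illustration only.
example_scores = [0.0, 0.1, 0.25, 0.4, 0.6]

for threshold in (0.2, 0.5):
    rate = pass_rate(example_scores, threshold)
    print(f"threshold={threshold}: {rate:.0%} of cases pass")

# In the notebook the equivalent knob would be, for example:
# metric = BiasMetric(threshold=0.2)
```

Under that assumption, a lower threshold makes the check stricter: fewer responses pass, so more are flagged for review.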

Notes

The example plot visualizes answer relevance; create similar plots for other metrics.
Dive into the resulting dataframes (faithfulness_score_df, answer_relevance_df, geographical_bias_df) for in-depth analysis.
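One way to start that analysis is a per-region summary of the bias scores. The sketch below is illustrative only: the `region` and `bias_score` column names and the sample rows are assumptions, and the real geographical_bias_df may be structured differently. It also assumes that a higher score means more bias.

```python
import pandas as pd

# Made-up rows; the real geographical_bias_df may use different column names.
geographical_bias_df = pd.DataFrame(
    {
        "region": ["EU", "EU", "APAC", "APAC", "LATAM"],
        "bias_score": [0.0, 0.2, 0.5, 0.7, 0.1],
    }
)


def summarize_by_region(df, group_col="region", score_col="bias_score"):
    """Return count, mean, and max of the score per region, highest mean first."""
    summary = df.groupby(group_col)[score_col].agg(["count", "mean", "max"])
    return summary.sort_values("mean", ascending=False)


summary = summarize_by_region(geographical_bias_df)
print(summary)
```

Regions at the top of the summary are the first candidates for a closer look at individual conversations.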

ByteanAtomResearch/ai-product-course-fer