TonicAI/tonic_validate

Other Metrics

adorosario opened this issue · 1 comments

Adam suggested I open an issue to brainstorm other possible metrics .. given the upcoming focus around AI safety, here are some more I was thinking about:

  1. Bias : Use a single evaluator call to check Bias against age, race, gender, sexual orientation, culture and other DEI factors.
    a. I would suggest scoring each factor on a score of 1-5 (rather than 0 to 5). You can also ask the evaluator to explain the scoring in the prompt (debugging purposes)

  2. Legal and Ethical Compliance Checks: Ensure that all responses adhere to legal and ethical standards, particularly regarding privacy, confidentiality, and regulatory compliance.
    a. Having a couple of metrics like PII and PHI scoring for these factors.

  3. Other possible metrics:

    1. Jailbreaking: jailbreak attempts, prompt injections, and LLM refusals of service
    2. Toxicity
    3. Hate Speech, Harassment, Sexually Explicit, Dangerous Content (see Google's list below)
    4. PII leakage

REFERENCES

  1. https://glassboxmedicine.com/2023/11/28/bias-toxicity-and-jailbreaking-large-language-models-llms/
  2. https://cloud.google.com/vertex-ai/docs/generative-ai/multimodal/configure-safety-attributes
  3. https://docs.whylabs.ai/docs/langkit-features/
  4. https://superwise.ai/llm-monitoring/

Hi Alden, these are great thanks! After first reading this, I have some thoughts and clarifying questions.

At first glance, all these metrics sound like they'd work solely with the LLM response and not use the retrieved context of a RAG system. In this way, the metrics are more general than just RAG metrics, which is great! They could work for any LLM system. Is this how you're thinking about it, or do you have some RAG specific things in mind for some of these metrics? If so, how do you see incorporated RAG specific things into the metrics?

For instance, if we wanted to make the bias metric RAG specific, we could have the bias metric measure if the LLM response is biased with regards to a DEI factor when compared to how the DEI factor appears in the retrieved context.

We could also define the metrics so that they check for things in the retrieved context and the response. This would help isolate whether the bias, PII, hate speech, etc is coming from the data (via the retrieved context) in the RAG system or from the LLM in the RAG system. There's a question of, if the problematic info is in the retrieved context should the LLM have that problematic info in it's response or should it not?

I'll check out the references you shared and post any other thoughts that come to mind.