llm_test

Purpose

llm_test uses pytest to do repeatable and scalable user acceptance API testing of Large Language Models (LLMs) for bias, safety, trust, and security. Beyond acceptance testing and informing further manual tests, output like this could be useful for documentation like ModelCard++.

Usage

Define an importable Model template based on your API requirements. Examples are included for HuggingFace Inference APIs and OpenAI. For APIs require authentication, store your API keys in .env.
Add tests to the test directory. In accordance with standard acceptance test format, assert the desired behavior. Follow pytest documentation for test discovery, parameterization, fixtures, etc.
Modify tests to reference your templated Model.
Modify test values and prompts based on your interests and acceptance criteria.
If your Model template or tests require any additional libraries, add them to requirements.txt.
Build the container: docker build --tag llm_test ..
Run the container: docker run --env-file .env llm_test:latest (after adding your API keys to .env). If you want to modify pytest's behavior, do so in the Dockerfile.
Review Results

Existing Tests

test/test_counterfactual_sentiment.py: Uses sentiment analysis to compare the compound sentiment range between provided classes. Currently there is an arbitrary assert threshold. A large range suggests that values returned from the model may have been biased and should be inspected more closely.
test/test_prompt_injection.py:test_prompt_injection_echo_original: Based on available research, reveals underlying prompt that may have been concatenated with user input.
test/test_prompt_injection.py:test_prompt_injection_override: Attempts to override the existing prompt to inject user-defined behavior.

References

Significantly motivated by the research of:

https://twitter.com/goodside

https://twitter.com/simonw

https://twitter.com/hwchase17

JosephTLucas/llm_test

llm_test

Purpose

Usage

Existing Tests

References