Medplexity explorer • Frontend GitHub repository • Substack
Medplexity is a Python library for evaluating large language models (LLMs) in medical applications.
It is designed to help with the following tasks:
- Evaluating the performance of LLMs on existing medical datasets and benchmarks, such as MedQA and PubMedQA.
- Comparing the performance of different prompts, models, and architectures.
- Exporting evaluation results for visualisation and further analysis.
The goal is to help answer questions like "How much better would GPT-4 perform given a vector database to load certain resources?".
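To make the three tasks above concrete, here is a minimal sketch in plain Python. It does not use Medplexity's API: `Question`, `build_prompt`, `evaluate`, and `stub_model` are hypothetical names introduced only for illustration, the two questions are toy data rather than items from MedQA or PubMedQA, and the stub stands in for a real LLM call. See the documentation and the "Getting Started" notebook for the library's actual interface.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Question:
    """A toy multiple-choice item (hypothetical, not a Medplexity type)."""
    text: str
    options: list[str]
    answer_index: int


def build_prompt(template: str, question: Question) -> str:
    """Fill a prompt template with the question text and numbered options."""
    options = "\n".join(f"{i}. {opt}" for i, opt in enumerate(question.options))
    return template.format(question=question.text, options=options)


def evaluate(model: Callable[[str], int], template: str,
             questions: list[Question]) -> dict:
    """Run the model on every question and collect per-item results."""
    records = []
    for q in questions:
        predicted = model(build_prompt(template, q))
        records.append({
            "question": q.text,
            "predicted": predicted,
            "expected": q.answer_index,
            "correct": predicted == q.answer_index,
        })
    correct = sum(r["correct"] for r in records)
    accuracy = correct / len(records) if records else 0.0
    return {"template": template, "accuracy": accuracy, "records": records}


def stub_model(prompt: str) -> int:
    """Placeholder for a real LLM call: always answers option 0."""
    return 0


# Toy data for illustration only.
QUESTIONS = [
    Question("Which vitamin deficiency causes scurvy?",
             ["Vitamin C", "Vitamin D", "Vitamin K"], 0),
    Question("Which organ produces insulin?",
             ["Liver", "Pancreas", "Kidney"], 1),
]

# Two prompt variants to compare against each other.
TEMPLATES = [
    "{question}\n{options}\nAnswer with the option number.",
    "You are a medical expert.\n{question}\n{options}\nAnswer with the option number.",
]

# Evaluate each prompt, then serialise the results for later analysis.
results = [evaluate(stub_model, t, QUESTIONS) for t in TEMPLATES]
report = json.dumps(results, indent=2)
```

Swapping `stub_model` for a function that calls a real model, or one that first retrieves context from a vector database, would let the same loop compare those setups on identical questions.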
Install the library from PyPI with pip:
pip install medplexity
Documentation can be found here.
See our "Getting Started" notebook for a full example using the MedMCQA dataset.
Contributions are welcome! Check out the todos below, and feel free to open a pull request.
Remember to install the pre-commit hooks so that your changes comply with our standards:
pre-commit install
Feel free to raise any questions on Discord.
In addition to the library, we are building a web app for exploring evaluation results. The explorer is available at medplexityai.com. It is also open source; see the frontend repository.
Medplexity is licensed under the MIT License. See the LICENSE file for more details.