
Medplexity


Medplexity explorer • Frontend GitHub repository • Substack

Medplexity is a Python library for evaluating LLMs in medical applications.


It is designed to help with the following tasks:

  • Evaluating the performance of LLMs on existing medical datasets and benchmarks, e.g. MedQA and PubMedQA.
  • Comparing the performance of different prompts, models, and architectures.
  • Exporting evaluation results for visualisation and further analysis.

The goal is to help answer questions like "How much better would GPT-4 perform if given a vector database to retrieve relevant resources?"

🔧 Quick install

pip install medplexity

📖 Documentation

Documentation can be found here.

Example

See our "Getting Started" notebook for a full example with MedMCQA dataset.

Contributions

Contributions are welcome! Check out the todos below and feel free to open a pull request. Remember to install pre-commit so your changes comply with our standards:

pre-commit install

Feel free to ask any questions on Discord.

Explorer

In addition to the library, we are building a web app to explore the results of evaluations. The explorer is available at medplexityai.com and is also open source; see the frontend repository.

📜 License

Medplexity is licensed under the MIT License. See the LICENSE file for more details.