
A framework for human-readable prompt-based method with large language models. Specially designed for researchers. (Deprecated, check out LangChain for better usage!)

HumanPrompt


HumanPrompt is a framework for easier human-in-the-loop design, management, sharing, and usage of prompts and prompt methods. It is specially designed for researchers. It is still in progress 👶, and we highly welcome new contributions on methods and modules. Check out our proposal here.

To start

First, clone this repo, then run:

pip install -e .

This will install the humanprompt package and soft-link hub to ./humanprompt/artifacts/hub.

Then set the required environment variables, such as your OpenAI API key:

export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

What you do next depends on how you plan to use this repo. For now, its mission is to help researchers verify their ideas, so we have made it flexible to extend and use.

Usage is simple and will feel familiar if you have used Hugging Face transformers before. A minimal example of running a method follows.

For example, to use Chain-of-Thought on CommonsenseQA:

from humanprompt.methods.auto.method_auto import AutoMethod
from humanprompt.tasks.dataset_loader import DatasetLoader

# Get one built-in method
method = AutoMethod.from_config(method_name="cot")

# Get one dataset, select one example for demo
data = DatasetLoader.load_dataset(dataset_name="commonsense_qa", dataset_split="test")
data_item = data[0]

# Adapt the raw data to the method's input format (we will improve this part later)
data_item["context"] = "Answer choices: {}".format(
    " ".join(
        [
            "({}) {}".format(label.lower(), text.lower())
            for label, text in zip(
                data_item["choices"]["label"], data_item["choices"]["text"]
            )
        ]
    )
)

# Run the method
result = method.run(data_item)
print(result)
print(data_item)

Zero-shot text2SQL:

import os
from humanprompt.methods.auto.method_auto import AutoMethod
from humanprompt.tasks.dataset_loader import DatasetLoader

method = AutoMethod.from_config("db_text2sql")
data = DatasetLoader.load_dataset(dataset_name="spider", dataset_split="validation")
data_item = data[0]

data_item["db"] = os.path.join(
    data_item["db_path"], data_item["db_id"], data_item["db_id"] + ".sqlite"
)

result = method.run(data_item)
print(result)
print(data_item)
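
The "db" field above is just a path to a SQLite file. Zero-shot text-to-SQL prompts commonly include the database schema, so the sketch below shows one way to read the CREATE TABLE statements from such a file with the standard sqlite3 module. Whether db_text2sql builds its prompt this way is not stated here; the helper names spider_db_path and dump_schema, and the tiny demo database, are assumptions for illustration only.

```python
import os
import sqlite3
import tempfile


def spider_db_path(db_path, db_id):
    """Same path layout as in the example above: <db_path>/<db_id>/<db_id>.sqlite."""
    return os.path.join(db_path, db_id, db_id + ".sqlite")


def dump_schema(sqlite_file):
    """Return the CREATE TABLE statements stored in a SQLite file (hypothetical helper)."""
    conn = sqlite3.connect(sqlite_file)
    try:
        rows = conn.execute(
            "SELECT sql FROM sqlite_master "
            "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return "\n".join(row[0] for row in rows)


# Demo on a throwaway database, so no Spider download is needed.
with tempfile.TemporaryDirectory() as root:
    db_file = spider_db_path(root, "concert_singer")
    os.makedirs(os.path.dirname(db_file))
    conn = sqlite3.connect(db_file)
    conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.commit()
    conn.close()
    print(dump_schema(db_file))
```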

To accelerate your research

Config

We adopt a "one config, one experiment" paradigm to facilitate research, especially when benchmarking different prompting methods. In each experiment's config file (.yaml) under examples/configs/, you can configure the dataset, prompting method, and metrics.

The following is an example config file for the Chain-of-Thought method on GSM8K:

---
  dataset:
    dataset_name: "gsm8k"                # dataset name, aligned with huggingface dataset if loaded from it
    dataset_split: "test"                # dataset split
    dataset_subset_name: "main"          # dataset subset name, null if not used
    dataset_key_map:                     # mapping original dataset keys to humanprompt task keys to unify the interface
      question: "question"
      answer: "answer"
  method:
    method_name: "cot"                   # method name to initialize the prompting method class
    method_config_file_path: null        # method config file path, null if not used (will be overridden by method_args)
    method_args:
      client_name: "openai"              # LLM API client name, adopted from github.com/HazyResearch/manifest
      transform: "cot.gsm8k.transform_cot_gsm8k.CoTGSM8KTransform"  # user-defined transform class to build the prompts
      extract: "cot.gsm8k.extract_cot_gsm8k.CoTGSM8KExtract"        # user-defined extract class to extract the answers from output
      extraction_regex: ".*The answer is (.*).\n?"                  # user-defined regex to extract the answer from output
      prompt_file_path: "cot/gsm8k/prompt.txt"                      # prompt file path
      max_tokens: 512                    # max generated tokens
      temperature: 0                     # temperature for generated tokens
      engine: code-davinci-002           # LLM engine
      stop_sequence: "\n\n"              # stop sequence for generation
  metrics:
    - "exact_match"                      # metrics to evaluate the results

Users can create their own transform and extract classes to customize prompt generation and answer extraction. The prompt file can be replaced or specified according to the user's needs.
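
To get a feel for what the extraction_regex in the config above captures, it can be tried directly with Python's re module. This is a minimal sketch, not the framework's extract class: the function name extract_answer and the use of re.search are assumptions, and only the pattern itself comes from the config.

```python
import re

# Pattern copied from the extraction_regex field of the GSM8K config.
EXTRACTION_REGEX = r".*The answer is (.*).\n?"


def extract_answer(output, pattern=EXTRACTION_REGEX):
    """Return the captured answer, or None if the pattern does not match (hypothetical helper)."""
    match = re.search(pattern, output)
    return match.group(1).strip() if match else None


print(extract_answer("48 / 2 = 24, and 48 + 24 = 72. The answer is 72.\n"))
# 72
```

Note that the unescaped `.` after the capture group matches any single character, so an output that ends in "The answer is 72" with no trailing period yields "7". Escaping it as `\.` and making it optional is one way to avoid that if your prompts do not guarantee the final period.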

Run experiment

To run an experiment, specify the experiment name and other meta configs on the command line from the examples/ directory.

For example, run the following command to run Chain-of-Thought on GSM8K:

python run_experiment.py \
  --exp_name cot-gsm8k \
  --num_test_samples 300

For a new combination of method and task, simply add a new config file under examples/configs/ and run the command.
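
As a rough illustration of the "one config, one experiment" flow, the sketch below parses the two flags shown above and resolves a config path from the experiment name. It is not the actual run_experiment.py: the mapping from exp_name to a file named <exp_name>.yaml, the default of None for num_test_samples, and the function names are all assumptions.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two flags used in the command above."""
    parser = argparse.ArgumentParser(description="Run one experiment from one config.")
    parser.add_argument("--exp_name", required=True, help="experiment name, e.g. cot-gsm8k")
    parser.add_argument("--num_test_samples", type=int, default=None,
                        help="number of test samples to run (assumed: all if omitted)")
    return parser.parse_args(argv)


def config_path_for(exp_name, config_dir="configs"):
    """Assumed convention: each experiment has a config file named <exp_name>.yaml."""
    return Path(config_dir) / (exp_name + ".yaml")


args = parse_args(["--exp_name", "cot-gsm8k", "--num_test_samples", "300"])
print(config_path_for(args.exp_name), args.num_test_samples)
```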

Architecture

.
├── examples
│   ├── configs                    # config files for experiments
│   ├── main.py                    # one sample demo script
│   └── run_experiment.py          # experiment script
├── hub                            # hub contains static files for methods and tasks
│   ├── cot                        # method Chain-of-Thought
│   │   ├── gsm8k                  # task GSM8K, containing prompt file and transform/extract classes, etc.
│   │   └── ...
│   ├── ama_prompting              # method Ask Me Anything
│   ├── binder                     # method Binder
│   ├── db_text2sql                # method text2sql
│   ├── react                      # method ReAct
│   ├── standard                   # method standard prompting
│   └── zero_shot_cot              # method zero-shot Chain-of-Thought
├── humanprompt                    # humanprompt package, containing building blocks for the complete prompting pipeline
│   ├── artifacts
│   │   ├── artifact.py
│   │   └── hub
│   ├── components                 # key components for the prompting pipeline
│   │   ├── aggregate              # aggregate classes to aggregate the answers
│   │   ├── extract                # extract classes to extract the answers from output
│   │   ├── post_hoc.py            # post-hoc processing
│   │   ├── prompt.py              # prompt classes to build the prompts
│   │   ├── retrieve               # retrieve classes to retrieve in-context examples
│   │   └── transform              # transform classes to transform the raw data to the method's input format
│   ├── evaluators                 # evaluators
│   │   └── evaluator.py           # evaluator class to evaluate the dataset results
│   ├── methods                    # prompting methods, usually one method is related to one paper
│   │   ├── ama_prompting          # Ask Me Anything (https://arxiv.org/pdf/2210.02441.pdf)
│   │   ├── binder                 # Binder (https://arxiv.org/pdf/2210.02875.pdf)
│   │   └── ...
│   ├── tasks                      # dataset loading and preprocessing
│   │   ├── add_sub.py             # AddSub dataset
│   │   ├── wikitq.py              # WikiTableQuestions dataset
│   │   └── ...
│   ├── third_party                # third party packages
│   └── utils                      # utils
│       ├── config_utils.py
│       └── integrations.py
└── tests                          # test scripts
    ├── conftest.py
    ├── test_datasetloader.py
    └── test_method.py

Contributing

This repository is designed to give researchers quick access to different prompt methods and an easy way to manipulate them. We spent a lot of time making it easy to extend and use, so we hope you will contribute to this repo.

If you are interested in contributing your method to this framework, you can:

  1. Open an issue describing the method you need, and we will add it to our TODO list and implement it as soon as possible.
  2. Add your method to the humanprompt/methods folder yourself. To do that, follow these steps:
    1. Clone the repo.
    2. Create a branch from the main branch, named after your method.
    3. Commit your code to your branch. You need to:
      1. add your method's code in the ./humanprompt/methods/your_method_name folder,
      2. create a hub for your method in ./hub/your_method_name,
      3. make sure there is an ./examples folder in ./hub/your_method_name to configure the basic usage of this method,
      4. add a minimal demo in ./examples for running and testing your method.
    4. Create a usage demo in the ./examples folder.
    5. Open a PR to merge your branch into the main branch.
    6. We will handle the last few steps for you to make sure your method is well integrated into this framework.

Pre-commit

We use pre-commit to control code quality. Before you commit, run the commands below to check your code and fix any issues.

pip install pre-commit
pre-commit install # install all hooks
pre-commit run --all-files # trigger all hooks

You can use git commit --no-verify to skip the hooks and let us handle that later.

Citation

If you find this repo useful, please cite our project and manifest:

@software{humanprompt,
  author = {Tianbao Xie and
            Zhoujun Cheng and
            Yiheng Xu and
            Peng Shi and
            Tao Yu},
  title = {A framework for human-readable prompt-based method with large language models},
  howpublished = {\url{https://github.com/hkunlp/humanprompt}},
  year = 2022,
  month = October
}
@misc{orr2022manifest,
  author = {Orr, Laurel},
  title = {Manifest},
  year = {2022},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/HazyResearch/manifest}},
}