There and Back Again

Overview

This repository contains results and code for the paper "There and Back Again: Extracting Formal Domains for Controllable Neurosymbolic Story Authoring" by Jack Kelly, Alex Calderwood, Noah Wardrip-Fruin, and Michael Mateas.

Our system uses GPT-4 to extract logical story domains and problems from input stories for use in the Glaive narrative planner ("Decompose"). It then uses these logical story specifications and the subsequent Glaive plans to generate potential stories ("Compose"). See the paper for more details.
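
For a high-level picture of how those pieces fit together, here is a minimal, hypothetical sketch in Python. The function names, stubbed bodies, and Glaive command-line arguments are illustrative assumptions, not the actual API in there_and_back; see the source files described below for the real implementation.

import subprocess

def decompose(story_text: str) -> tuple[str, str]:
    """Prompt GPT-4 to extract a PDDL-style domain and problem from the story.
    Stubbed here; the real system builds and sends chat prompts."""
    domain = "(define (domain story-domain) ...)"
    problem = "(define (problem story-problem) ...)"
    return domain, problem

def plan_with_glaive(domain_path: str, problem_path: str) -> str:
    """Run the bundled Glaive planner on the extracted specification.
    The command-line arguments shown are an assumption, not Glaive's documented CLI."""
    result = subprocess.run(
        ["java", "-jar", "there_and_back/glaive/glaive.jar", domain_path, problem_path],
        capture_output=True, text=True,
    )
    return result.stdout

def compose(domain: str, problem: str, plan: str | None = None) -> str:
    """Prompt GPT-4 to write a new story from the formal specification,
    optionally conditioned on the Glaive plan. Stubbed here."""
    return "Once upon a time..."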

We hope this repository can add additional context and support to the discussion in the paper.

Results

To evaluate this system, we ran the Decompose and Compose pipelines on 100 short stories sampled from the TinyStories and r/WritingPrompts datasets. First, we ran each story through the pipeline with both GPT-3.5 and GPT-4, and both with and without an auto-debugging step. If the output successfully generated a Glaive plan, we used Compose to create two stories from it: one using only the problem and domain (pd, without-plan) and another additionally using the plan (pdp, with-plan).

This data is reproduced here:

Pipeline runs

Pipeline runs for each of the 200 total stories are available in the runs folder. The top-level directories specify the dataset and model. Each story folder contains the generated domains, problems, and plan for both the original and the auto-debug ("revised") runs. Additionally, the input story is provided, as is metadata that includes the full set of prompt messages (system, assistant, and user) used at each step of the pipeline [1].

Statistics

Additionally, we computed two sets of statistics over each set of runs: first, the compile rate and plan rate; second, various descriptive statistics about the generated domains and problems, e.g., the average number of predicates and the average number of actions. These are available in the stats folder.
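
Assuming compile rate means the fraction of runs whose generated domain and problem compile, and plan rate the fraction for which Glaive finds a plan, a sketch like the following computes both from per-run outcomes (the records below are made-up examples, not the format of the files in stats):

# Hypothetical per-run outcomes; real values come from the pipeline runs above.
runs = [
    {"story": "story_001", "compiled": True, "planned": True},
    {"story": "story_002", "compiled": True, "planned": False},
    {"story": "story_003", "compiled": False, "planned": False},
]

compile_rate = sum(r["compiled"] for r in runs) / len(runs)
plan_rate = sum(r["planned"] for r in runs) / len(runs)
print(f"compile rate: {compile_rate:.0%}, plan rate: {plan_rate:.0%}")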

Thematic Analysis

To get a holistic sense of how valuable this system would be in a creative tool, we additionally performed thematic analysis over a small sample of the successful results. The outputs we reviewed are reproduced in the thematic_analysis folder [1].

Code

In there_and_back, you can find the code used to run the Decompose, Compose, and Recompose steps. There's a basic CLI that can be used to run Recompose, as well as a couple of scripts that can generate data in the form we used for our analysis.

Overview

The there_and_back folder contains:

  • baseline_domains: Copies of the domains taken from the Glaive release that we use as a baseline.
  • glaive: A copy of the Glaive source, including the glaive.jar file.
  • input_data: The stories sampled from TinyStories and r/WritingPrompts.
  • one_shot_examples: The one-shot examples used in each of the system's prompts.
  • results: A folder to hold output generated by the scripts.
  • scripts: Scripts for running the system over a large set of inputs, as in our provided results above.
  • system_prompts: The base prompts used in each task, before the one-shot example and other contextual information are added.
  • temp_files: A scratch directory used to store temporary Glaive outputs.
  • The source files for Compose, Decompose, and Recompose, along with miscellaneous utility files.

How to Run

Requirements

  • While we are excited to share the source code for this system, we presently rely on a proprietary model, OpenAI's GPT-4. To use the model, you'll need an OpenAI API Key with access to GPT-4.

  • The system is written primarily in Python. We've only tested using Python 3.10, but other versions should work as well.

  • We use Glaive to generate our plans, which requires the Java 7 runtime. We've included the glaive.jar file and call it from our code, but you'll need to have Java 7 installed; we've had success using jEnv as our Java version manager, and a standalone Java 7 installation also works.

Preparation

  • Run pip install openai to install the OpenAI Python library

  • Add your OpenAI API key to the environment, e.g., export OPENAI_API_KEY=[KEY]
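
Before kicking off a long run, a quick sanity check like the following (illustrative, not part of the repository) will catch a missing or mistyped key:

import os
import sys

# The openai library reads OPENAI_API_KEY from the environment; fail fast if it's absent.
key = os.environ.get("OPENAI_API_KEY")
if not key:
    sys.exit("OPENAI_API_KEY is not set; run `export OPENAI_API_KEY=...` first.")
print(f"Found an OpenAI key ending in ...{key[-4:]}")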

Using the Recompose CLI

The Recompose CLI takes an input file and writes the results to a provided output directory.

From the top-level directory, run:

python -m there_and_back.recompose INPUT_FILE OUTPUT_DIR

The output will be in the same format as the "runs" in the results section above.

You can use either of the dataset samples like this:

python -m there_and_back.recompose ./there_and_back/input_data/tiny_stories/story_001.txt output
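
If you want to run Recompose over several inputs without using the batch scripts below, you can also loop over the same CLI invocation from Python. This is just a convenience wrapper around the documented command; the glob pattern is an assumption about how the sample files are named:

import glob
import subprocess

# Run the documented Recompose CLI once per sampled TinyStories file.
for input_file in sorted(glob.glob("there_and_back/input_data/tiny_stories/*.txt")):
    subprocess.run(
        ["python", "-m", "there_and_back.recompose", input_file, "output"],
        check=True,
    )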

Using the generation scripts

We have two scripts that generate data from the same inputs we analyzed in our paper.

To run Recompose over the TinyStories and r/WritingPrompts datasets, run:

python -m there_and_back.scripts.recompose_all

NOTE: While a single Recompose run costs on the order of cents, a batch over a large dataset like this can cost dozens of dollars.

To run Compose over our baseline domains, run:

python -m there_and_back.scripts.compose_baseline

Downloading this repository

To download, go to this repository's "Code" tab; there, you can select the option to download a ZIP file of the repo.

Contact Us

If you have any questions, feel free to email us at jochkell [at] ucsc [dot] edu or alexcwd [at] ucsc [dot] edu.

We're happy to talk more about the paper, our results, and our implementation.

Footnotes

  1. NOTE ON CONTENT: Several of the stories we sampled from the r/WritingPrompts dataset for analysis contained graphic, disturbing, and offensive content. We regret that we did not screen them more thoroughly before we ran these experiments. While the offending stories were used in the evaluation of our system, we have decided to remove the more egregious ones from this repository to mitigate harm.