This project showcases the ability to write a long story/novel well grounded in details, feels human like and has coherence of events using LLM. Although the story is heavily grounded in details, this project doesn't aim to create a SOTA story but is just a demonstration of how to generate long stories well grounded in details using LLMs

Currently a story of 35K words / 3 chapters is already written using this Approach. The overall story is of 7 chapters and the rest 4 chapters are currently being written (High Level plot, chapters are already present).

The current plotline is What happens if the American founding fathers come back to life in 21st century. Story till now. I've reviewed this with many people who are avid story readers and most have agreed that it didn't feel AI generated, it's interesting (subjective but most have said this), grounds heavily on details. If there are any political biases, they might be of my own and request readers to avoid over emphasizing on these.

Disclaimer: The Approach used is not a fully automated solution and requires human intervention for now. But a human can create a long story with this approach within 3 weeks. And this approach grounds in details unlike the other AI generated long stories

Approach to create long story:

LLMs such as Claude 3 / Gpt 4 currently allows input context length of 150K words and can output 3K words at once. A typical novel in general has a total of 60K-100K words. Considering the 3K output limit, it isn't possible to generate a novel in one single take. So the intuition here is that let the LLM generate 1 event at a time and once the event is generated, add it to the existing story and continously repeat this process. Although theoretically this approach might seem to work, just doing this leads to LLM moving quickly from one event to another, not being very grounded in details, llm not generating event which is a continuation of the current story, LLM generating mistakes based on the current story etc.

To address this, the following steps are taken:

1. Initially fix on the high level story:

Ask LLM to generate high level plot of the story like at a 30K depth. Generate multiple plots as such. In our case, the high level line in mind was Founding Fathers returning back. Using this line, LLM was asked to generated many plots enhancing this line. It suggested many plots such as Founding fathers called back for being judged based on their actions, founding fathers called back to solve AI crisis, founding fathers come back for fighting against China, Come back and fight 2nd revolutionary war etc. Out of all these, the 2nd revolutionary war seemed the best. Post the plot, LLM was prompted to generate many stories from this plot. Out of these, multiple ideas in the stories were combined (manually) to get to fix on high level story. Once this is done, get the chapters for the high level story (again generated multiple outputs instead of 1). And generating chapters should be easy if the high level story is already present

2. Do the event based generation for events in chapter:

Once chapters are fixed, now start with the generation of events in a chapter but 1 event at a time like described above. To make sure that the event is grounded in details, a little prompting is reqd telling the LLM to avoid moving too fast into the event and ground to details, avoid generating same events as past etc. Prompt used till now (There are some repetitions in the prompt but this works well). Even after this, the output generated by LLM might not be very compelling so to get a good output, generate the output multiple times. And in general generating 5-10 outputs, results in a good possible result. And it's better to do this by varying temperatures. In case of current story, the temperature b/w 0.4-0.8 worked well. Additionally, the rationale behind generating multiple outputs is, given LLMs generate different output everytime, the chances of getting good output when prompted multiple times increases. Even after generating multiple outputs with different temperatures, if it doesn't yield good results, understand what it's doing wrong for example like avoid repeating events and tell it to avoid doing that. For example in the 3rd chapter when the LLM was asked to explain the founders about the history since their time, it was rushing off, so an instruction to explain the historic events year-by-year was added in the prompt. Sometimes the LLM also generates part of the event which is too good but the overall event is not good, in this scenario adding the part of the event to the story and continuing to generate the story worked well.

Overall Gist: Generate the event multiple times with different temperatures and take the best amongst them. If it still doesn't work, prompt it to avoid doing the wrong things it's doing

Overall Event Generation

Instead of generating the next event in a chat conversation mode, giving the whole story till now as a combination of events in a single prompt and asking it to generate next event worked better.

Conversation Type 1:



human: generate 1st event
Claude: Event1
human: generate next, 
Claude: Event2, 
human: generate next ...

Conversation Type 2: (Better)



Human:
Story till now: Event1 + Event2 + ... + EventN
Generate next event

Claude:
Event(N+1)

Also as the events are generated, one keeps getting new ideas to proceed on the story chapters. And if any event generated is so good, but aligns little different from current story, one can also change the future story/chapters.

The current approach, doesn't require any code and long stories can be generated directly using the Claude Playground or Amazon Bedrock Playground (Claude is hosted). Claude Playground has the best Claude Model Opus which Bedrock currently lacks but given this Model is 10X costly, avoided it and went with the 2nd Best Sonnet Model. As per my experience, the results on Bedrock are better than the ones in Claude Playground

Questions:

  1. Why wasn't Gpt4 used to create this story?
    • When asked Gpt4 to generate the next event in the story, there was no coherence in the next event generated with the existing story. Maybe with more prompt engineering, this might be solved but Claude 3 was giving better output without much effort so went with it. Infact, Claude 3 Sonnet (the 2nd best model from Claude) is doing much better when compared to Gpt4.
  2. How much cost did it take to do this?
    • $50-100

Further Improvements:

  1. Explore ways to avoid long input contexts. This can further reduce the cost considering most of the cost is going into this step. Possible Solutions:
    • Give gists of the events happened in the story till now instead of whole story as an input to the LLM. References: 1, 2
  2. Avoid the human loop as part of the choosing the best event generated. Currently it takes a lot of human time when choosing the best event generated. Due to this, the time to generate a story can take from few weeks to few months (1-1.5 months). If this step is automated atleast to some degree, the time to write the long story will further decrease. Possible Solutions:
    • Use an LLM to determine what are the best events or top 2-3 events generated. This can be done based on multiple factors such as whether the event is a continuation, the event is not repeating itself. And based on these factors, LLM can rate the top responses. References: Last page in this paper
    • Train a reward model (With or without LLM) for determining which generated event is better. LLM as Reward model
  3. The current approach generates only 1 story. Instead generate a Tree of possible stories for a given plot. For example, multiple generations for an event can be good, in this case, select all of them and create different stories.
  4. Use the same approach for other things such as movie story generation, Text Books, Product document generation etc
  5. Benchmark LLMs Long Context not only on RAG but also on Generation

LICENSE:

MIT