Part 2 (Multiple LLMs)
Introduction¶
If you haven't already read part one of this, please check that out now for additional context.
As mentioned previously, I had a somewhat working approach, but there were still issues. I decided to focus on improving the writing style, since the output still read like a history textbook.
Multi-LLM Approach¶
It turns out that using one LLM for everything wasn't really working - if you used an LLM that was good at writing, it would mess up the JSON formatting, and if you used one that was good at JSON, the writing style was awful.
So, I thought it might be good to use multiple LLMs at different stages in the process.
For the outline, I would use LLM 1; for the JSON chapter count, LLM 2; and for the chapters, LLM 3.
So to summarize with terrible pseudocode:
LLM 1 -> generates outline
LLM 2 -> returns ChapterCount from provided outline
for each chapter from 1 to ChapterCount:
    LLM 3 -> generates chapter {chapter} based on {outline}
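The loop above can be sketched in Python. This is a minimal sketch, not my exact code: it assumes a `chat(model, prompt)` helper that returns the reply text (with Ollama, a thin wrapper around `ollama.chat()`), and the model tags and prompt wording are placeholders.

```python
import json

def generate_story(premise, chat,
                   llm1="outline-model",   # creative model for the outline
                   llm2="json-model",      # logic/formatting model
                   llm3="chapter-model"):  # creative model for the chapters
    # LLM 1 -> generates outline
    outline = chat(llm1, f"Write a chapter-by-chapter outline for this story: {premise}")

    # LLM 2 -> returns ChapterCount from the provided outline
    reply = chat(llm2, 'Reply with ONLY JSON of the form {"chapter_count": N} '
                       "giving the number of chapters in this outline:\n" + outline)
    chapter_count = int(json.loads(reply)["chapter_count"])

    # LLM 3 -> generates each chapter based on the outline
    chapters = [
        chat(llm3, f"Here is the outline:\n{outline}\n\nWrite chapter {n} in full.")
        for n in range(1, chapter_count + 1)
    ]
    return "\n\n".join(chapters)
```

With the ollama Python client, `chat` could be something like `lambda model, prompt: ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])["message"]["content"]`, assuming the models are already pulled.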
Model Choices¶
After some experimentation, I discovered that Midnight Miqu 70b worked pretty well for creative writing, so I used a 4-bit quant of it for the outline generation. (I should clarify that at this point I'm using Ollama and Python to make this happen, due to the simplicity of the API.)
For the JSON tasks, llama3:70b worked pretty well - it was able to do the logic and formatting tasks rather well.
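One caveat worth noting: even a JSON-friendly model can occasionally wrap the object in prose or code fences, so it pays to extract and sanity-check it rather than calling `json.loads` on the raw reply. A minimal sketch (the helper name and the chapter cap are illustrative assumptions, not from my pipeline):

```python
import json
import re

def parse_chapter_count(reply, max_chapters=50):
    """Pull a chapter count out of an LLM reply, tolerating code fences
    and surrounding chatter. Raises ValueError if nothing usable is found."""
    # Grab the first {...} object in the reply, ignoring fences and prose
    match = re.search(r"\{.*?\}", reply, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in reply: {reply!r}")
    count = int(json.loads(match.group(0))["chapter_count"])
    if not 1 <= count <= max_chapters:
        raise ValueError(f"implausible chapter count: {count}")
    return count
```

Ollama also accepts a `format="json"` option on its generate/chat calls, which constrains the output and makes this kind of cleanup less necessary.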
And for the chapter writing, I found that Midnight Rose 70b worked rather well. Miqu could also work, but I found that using the same LLM for different stages generally wasn't ideal - doing so seemed to cause the system to fall into a 'rut' of sorts, where it would focus too much on certain aspects and not at all on others.
What worked¶
- Length - this stayed above 10k, so that was working well.
- Plot - the plot was improved over the previous version - perhaps due to the different LLMs mixing things up.
- Characters - the LLM retained its ability to understand the characters it was writing about.
- Grammar - the grammatical structure was still correct.
- Sanity - the LLMs mostly stopped falling into infinite generation loops - changing up the models worked wonders.
What didn't work¶
- Writing Style - total rubbish. The LLMs would still write like a history textbook rather than a story.
- Pacing - sometimes it spent hundreds of words describing the color of the sky, other times it skipped over entire battle scenes with just a few words.
- Word Choice - so many stock phrases: 'the tension was palpable', 'as days turned to weeks'. I never want to read those ever again.
- Chapter Consistency - this is a new problem. Before, the model was basically writing one chapter, but now that we're doing multiple generations, it tended to forget what happened in previous chapters and write totally different, disjointed things.
Also, not really a problem per se, but generation isn't fast - closer to ~4 hours per story on my 3x NVIDIA Tesla P40 24GB cards.
Obviously if you have better hardware, then generation will be faster.
Conclusion¶
So it seems that generating with multiple LLMs gave a better result than a single one - but there is of course still much to improve on. Chapter consistency remains a problem, and the pacing and writing style both need work. To address these, I thought it would be interesting to explore adding some revision - that is, using a fourth LLM to write feedback on the outline and chapters.
I'll cover that in the next part of this series.