A corpus of public-domain poetry (mostly Victorian and Romantic poets) for training: about 6k poems, 300k lines of poetry, and ~2M words.
The JSONL file is straightforward: each record has three fields (poem, author, title).
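As a minimal sketch of reading records in that shape, assuming the keys are spelled exactly `poem`, `author`, and `title` and that each line holds one JSON object: the helper names `parse_jsonl` and `poems_per_author` and the two sample records are made up for illustration and are not part of the corpus.

```python
import json
from collections import Counter


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def poems_per_author(poems):
    """Count how many records each author has."""
    return Counter(record["author"] for record in poems)


# Two made-up records in the assumed poem/author/title shape.
sample = [
    '{"poem": "First line\\nSecond line", "author": "Author A", "title": "Title One"}',
    '{"poem": "Only line", "author": "Author A", "title": "Title Two"}',
]

poems = parse_jsonl(sample)
print(poems_per_author(poems))
```

To read the real file, pass an open file handle instead of the sample list, for example `parse_jsonl(open(path, encoding="utf-8"))`, where `path` is wherever the JSONL file was saved.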
A (hopefully small) fraction of longer poems may be truncated or incorrectly split into smaller poems. Likewise, some Walt Whitman poems may be incorrectly truncated or split (due to his habit of using his own name in his poems).
The shuffled text is suitable for training a LoRA in oobabooga (as a plain text file with hard cuts). Note the format of the expected prompt.
Poets included: Percy Bysshe Shelley, William Blake, William Butler Yeats, Edna St. Vincent Millay, Robert Frost, William Wordsworth, Emily Dickinson, Robert Browning, Alfred Lord Tennyson, Walt Whitman, George Gordon Byron, Elizabeth Barrett Browning, Dylan Thomas, John Keats, Samuel Taylor Coleridge, John Greenleaf Whittier, Christina Georgina Rossetti, William Carlos Williams, John Donne, John Clare, John Milton.