A study in the US has found that readers can’t tell the difference between poems written by famous poets and those written by AI aping their style. To make matters worse – for anyone fostering a love of literature at least – research subjects tend to like AI poetry more than they do verse from human poets.
The researchers suggest readers mistake the complexity of human-written verse for AI-generated incoherence and underestimate how human-like generative AI output can appear, according to a paper published this week in Nature's Scientific Reports.
The researchers used five poems each from ten English-language poets, spanning nearly 700 years of literature in English. The writers included Geoffrey Chaucer, William Shakespeare, Samuel Butler, Lord Byron, Walt Whitman, Emily Dickinson, T S Eliot, Allen Ginsberg, Sylvia Plath, and Dorothea Lasky, the only living poet on the list.
The study – led by University of Pittsburgh postdoctoral researcher Brian Porter – then instructed OpenAI’s GPT-3.5 large language model to generate five poems “in the style of” each poet. No human judgment shaped the selection: the researchers simply kept the first five poems the model produced.
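For the curious, that generation step amounts to little more than a loop over poet names. The sketch below is not the researchers' code – the model name (gpt-3.5-turbo is assumed here), the prompt wording, and the use of OpenAI's current Python client are all guesses – but it shows the shape of the approach: prompt the model once per poem and keep whatever comes back, unfiltered.

```python
# Illustrative sketch only – not the study's actual code. Assumes the
# openai v1 Python client, an OPENAI_API_KEY in the environment, and the
# gpt-3.5-turbo model; the paper's exact prompt and settings may differ.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

POETS = [
    "Geoffrey Chaucer", "William Shakespeare", "Samuel Butler", "Lord Byron",
    "Walt Whitman", "Emily Dickinson", "T. S. Eliot", "Allen Ginsberg",
    "Sylvia Plath", "Dorothea Lasky",
]

def first_five_poems(poet: str) -> list[str]:
    """Generate five poems 'in the style of' the named poet, keeping them
    in generation order with no human filtering or cherry-picking."""
    poems = []
    for _ in range(5):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": f"Write a poem in the style of {poet}.",
            }],
        )
        poems.append(response.choices[0].message.content)
    return poems

# Build the AI half of the corpus: five machine-made poems per poet.
ai_corpus = {poet: first_five_poems(poet) for poet in POETS}
```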
Porter and his colleagues ran two experiments using the resulting corpus. In the first, 1,634 participants were each randomly assigned one of the ten poets, then read ten poems in random order – five by the AI and five by the assigned poet – and judged whether each was written by an AI or a human.
Perhaps perversely, the subjects were more likely to say an AI-generated poem had been written by a human, while the poems they rated least likely to have come from a human hand were all written by people.
In the second experiment, nearly 700 subjects rated the poems on 14 characteristics including quality, beauty, emotion, rhythm, and originality. The researchers randomly divided the subjects into three groups, telling one group the poems were written by a human, telling the second they were produced by AI, and giving the last no information about the author.
Tellingly, subjects not told whether the poems came from a person or an AI rated the AI-produced poems more highly than human-written ones. Meanwhile, telling the subjects that the poem was AI-generated made them more likely to give it a lower rating.
“Our findings suggest that participants employed shared yet flawed heuristics to differentiate AI from human poetry: the simplicity of AI-generated poems may be easier for non-experts to understand, leading them to prefer AI-generated poetry and misinterpret the complexity of human poems as incoherence generated by AI,” the researchers said.
“Contrary to what earlier studies reported, people now appear unable to reliably distinguish human-out-of-the-loop AI-generated poetry from human-authored poetry written by well-known poets. In fact, the ‘more human than human’ phenomenon discovered in other domains of generative AI is also present in the domain of poetry: non-expert participants are more likely to judge an AI-generated poem to be human-authored than a poem that actually is human-authored. These findings signal a leap forward in the power of generative AI: poetry had previously been one of the few domains in which generative AI models had not reached the level of indistinguishability in human-out-of-the-loop paradigms.”
Meanwhile, it appears that people prefer AI poems because they are easier to understand. “In our discrimination study, participants used variations of the phrase ‘doesn’t make sense’ for human-authored poems more often than they did for AI,” the researchers said. ®