Cento is a project that generates classical Chinese collage poems, known as “集句诗” (jí jù shī). These poems are composed entirely of lines taken from other poets’ works. Using a statistical model built from the “Complete Tang Poems” (全唐诗), the project assembles stanzas by selecting the best-fitting lines from different poems.

The system first scans the corpus to build a lexicon, the set of terms that occur frequently in the poems. It then counts how often each pair of terms co-occurs and computes their mutual information (MI). Given an input sentence, every sentence in the corpus is scored by summing the MI values of all term pairs formed by one term from the input sentence and one term from the candidate sentence.
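The lexicon, co-occurrence, and scoring steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project’s actual code. The source gives neither the tokenizer nor the exact MI formula, so this sketch makes several assumptions. First, terms are single CJK characters plus adjacent-character bigrams, kept if they appear in at least `min_count` lines. Second, co-occurrence is counted per poem. Third, MI is pointwise mutual information (PMI), log(p(x,y) / (p(x)·p(y))). Finally, pairs that never co-occur contribute 0. The names `candidate_terms`, `build_lexicon`, and `PMIModel` are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def candidate_terms(sentence):
    """Hypothetical tokenizer: CJK characters of a line plus adjacent-character bigrams."""
    chars = [c for c in sentence if "\u4e00" <= c <= "\u9fff"]
    return set(chars) | {a + b for a, b in zip(chars, chars[1:])}


def build_lexicon(poems, min_count=2):
    """Keep terms that occur in at least `min_count` lines (assumed threshold)."""
    counts = Counter()
    for poem in poems:
        for line in poem:
            counts.update(candidate_terms(line))
    return {term for term, n in counts.items() if n >= min_count}


class PMIModel:
    """Per-poem co-occurrence counts and pointwise MI between lexicon terms (assumed)."""

    def __init__(self, poems, lexicon):
        self.lexicon = lexicon
        self.n_docs = len(poems)
        self.doc_freq = Counter()   # number of poems containing each term
        self.pair_freq = Counter()  # number of poems containing both terms of a pair
        for poem in poems:
            terms = set()
            for line in poem:
                terms |= self.terms_of(line)
            self.doc_freq.update(terms)
            # Sorting gives every unordered pair a single canonical key.
            self.pair_freq.update(combinations(sorted(terms), 2))

    def terms_of(self, sentence):
        """Lexicon terms found in one sentence."""
        return candidate_terms(sentence) & self.lexicon

    def pmi(self, x, y):
        key = (x, y) if x < y else (y, x)
        joint = self.pair_freq[key]
        if joint == 0:
            # Pairs that never co-occur (including x == y) contribute nothing.
            return 0.0
        p_xy = joint / self.n_docs
        p_x = self.doc_freq[x] / self.n_docs
        p_y = self.doc_freq[y] / self.n_docs
        return math.log(p_xy / (p_x * p_y))

    def score(self, candidate, query):
        """Sum of PMI over all (query term, candidate term) pairs."""
        q_terms = self.terms_of(query)
        c_terms = self.terms_of(candidate)
        return sum(self.pmi(x, y) for x in q_terms for y in c_terms)
```

Counting co-occurrence per poem is only one possible choice. Counting within a single couplet would give a narrower, more local notion of which terms belong together.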

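To construct the stanza, the model takes the five sentences with the highest scores and picks one of them at random. Choosing among the top candidates, rather than always taking the single best match, adds variety, so repeated runs can produce different poems. One possible output of the system is: “明月几时有，繁云何处无。潮平两岸阔，雨馀双屐冷。” (roughly: “When did the bright moon first appear? Where are there no thick clouds? With the tide at full, the two banks lie wide; after the rain, my pair of clogs is cold.”)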
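The top-five random selection described above can be sketched like this. It is a minimal, hypothetical helper, not the project’s code. `score` is any function that returns a number for a (candidate, query) pair, such as the MI sum described earlier. The helper name `pick_line` and its `k` and `rng` parameters are illustrative. Excluding the input line itself from the candidates is an assumption the source does not state.

```python
import heapq
import random
from typing import Callable, Optional, Sequence


def pick_line(
    query: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
    k: int = 5,
    rng: Optional[random.Random] = None,
) -> str:
    """Return one of the k highest-scoring candidates, chosen uniformly at random."""
    if rng is None:
        rng = random.Random()
    # Assumption: never return the input line itself.
    pool = [c for c in candidates if c != query]
    if not pool:
        raise ValueError("no candidate lines to choose from")
    # nlargest avoids sorting the whole corpus when only k lines are needed.
    top_k = heapq.nlargest(k, pool, key=lambda c: score(c, query))
    return rng.choice(top_k)
```

Passing a seeded `random.Random(0)` as `rng` makes runs reproducible. With `k=1`, the function always returns the single best match, which removes the variety that the random choice is meant to provide.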

The Cento project shows how simple statistical natural language processing techniques (a term lexicon, co-occurrence counts, and mutual information) can be used to assemble classical Chinese collage poems.