

MIT License

Chinese-Lyric-Corpus

This corpus is designed for Chinese lyric generation, a popular natural language generation task.

It contains nearly 50,000 lyrics from 500 artists, crawled from NetEase Cloud Music.

The timelines (timestamps) and most of the staff information (such as songwriting credits) have been removed. Below is a sample of the lyric data:

你住的  巷子里  我租了一间公寓
为了想与你不期而遇
高中三年  我为什么  为什么不好好读书
没考上跟你一样的大学
我找了份工作  离你宿舍很近
当我开始学会做蛋饼  才发现你  不吃早餐
喔  你又擦肩而过
你耳机听什么  能不能告诉我

躺在你学校的操场看星空
教室里的灯还亮着你没走
记得  我写给你的情书
都什么年代了
到现在我还在写着
...
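The cleaning described above can be sketched as follows. This is a minimal sketch, not the script used to build the corpus: it assumes the raw lyrics are LRC-style text with timestamps such as `[01:23.45]` and credit lines such as `作词 : 某人`. The functions `clean_lyric` and `split_phrases` and the credit keywords in `STAFF_LINE` are hypothetical. The sketch also splits each line into phrases on whitespace, following the double-space separators visible in the sample.

```python
import re

# Assumed raw format (hypothetical): LRC-style timestamps like "[01:23.45]"
# and credit lines such as "作词 : 某人" (lyricist) or "作曲 : 某人" (composer).
TIMESTAMP = re.compile(r"\[\d{1,2}:\d{2}(?:\.\d{1,3})?\]")
STAFF_LINE = re.compile(r"^\s*(作词|作曲|编曲|制作人)\s*[:：]")


def clean_lyric(raw: str) -> list[str]:
    """Strip timestamps, then drop blank lines and assumed credit lines."""
    lines = []
    for line in raw.splitlines():
        text = TIMESTAMP.sub("", line).strip()
        if not text or STAFF_LINE.match(text):
            continue  # skip blank lines and staff/credit lines
        lines.append(text)
    return lines


def split_phrases(line: str) -> list[str]:
    """Split a lyric line into phrases on any run of whitespace."""
    return line.split()
```

Because `str.split()` with no arguments splits on any run of Unicode whitespace, it also handles full-width spaces (`\u3000`) if some lyrics use them instead of ASCII spaces. Blank lines between stanzas are dropped in this sketch; keep them if stanza boundaries matter for your model.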