guqinMM: A repository from WDS Research Group @ Southeast University, Nanjing, China

Jianzipu MM

Ancient Chinese Qin Score Jianzipu (JZP)

About The Project

Guqin 古琴

The guqin is a traditional Chinese musical instrument. Guqin music is recorded in a distinctive system known as Jianzipu (JZP), a notation built from reduced Chinese characters. Our project aims to use the latest artificial intelligence technology to read JZP scores and play the corresponding music.


Getting Started

We have two systems for Jianzipu. The first converts Jianzipu into a Jianzi document; the second generates music from that Jianzi document.

Prerequisites

We mainly use Python with a GPU environment. The baseline requirements are listed here:

python 3.10
pytorch

pip install -r requirements.txt
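
After installing the requirements, it can help to confirm that the environment matches the prerequisites above. The following is a minimal sketch, not part of the repository: the function name `check_environment` is hypothetical, and treating "python 3.10" as a minimum version (rather than an exact one) is an assumption. It only uses `importlib`, `sys`, and, if PyTorch is installed, `torch.cuda.is_available()`.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10)):
    """Return a dict describing whether the prerequisites look satisfied.

    Assumption: Python 3.10 is treated as a minimum version, not an exact pin.
    """
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        # Imported lazily so the check still runs when PyTorch is missing.
        import torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False`, PyTorch will still run on the CPU, but training is likely to be much slower.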

Baseline

Jianzipu OCR

JZP notation: We use Wusheng Qinpu (五声琴谱) for our OCR dataset. The dataset is made with SuziAI, a tool for annotating notation. Please follow the GUI tool tutorial to make sure you can do the annotation work.
JZP recognition: The JZP recognition model is trained with the following methods. To find out which one works better, we need to evaluate each of them on our data.

Basic NLP Method: Transformer series model.
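
To compare recognition methods on the same data, one common choice for sequence outputs is an edit-distance-based error rate. The sketch below is an assumption about how such an evaluation could look, not the project's actual metric: the function names `edit_distance` and `error_rate` are hypothetical, and each JZP character is assumed to be represented as a sequence of tokens.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses):
    """Total edits divided by total reference length over a dataset."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_len = sum(len(r) for r in references)
    return total_edits / total_len if total_len else 0.0


if __name__ == "__main__":
    # Placeholder tokens; real token names depend on the annotation format.
    refs = [["a", "b", "c", "d"]]
    hyps = [["a", "x", "c"]]
    print(error_rate(refs, hyps))
```

A lower error rate means the predicted sequence is closer to the annotated one, so the same function can be run on the output of each candidate model.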

Gui-tools for JZP tutorial

To start the annotation tool, first switch the composer button to the jianzipu button, then use Open to open a Jianzipu image folder.
Press the Auto-Segmentation button to get the annotation boxes from the picture, then press Music(ind.) to annotate the Jianzi characters.
Follow the video to annotate the JZP characters (images/tutorial.mp4).

After annotation is finished, a JSON-string tool, which we will release soon, will be used to get the description of each Jianzi character. The tool is still in development.

Guqin music generation

The guqin music generation system aims to generate music from a JZP document. Since JZP notation does not include some musical features, we need to train our model on both documents and music. The dataset includes two parts: the music part is collected from guqin exam videos and online video resources, and the sequence part comes from our JZP OCR system.

Basic sound generation models: VAE, VQ-VAE, Diffusion, SoundStream...
Symbolic Music Generation: Muzic
Music generation from text: MusicLM

Todo List

Construct dataset: Our dataset needs both JZP images and a JZP representation list. The images come from the Wusheng scores, and the JZP representations come from the GUI tool.
JZP recognition model training: This model includes two parts, JZP character sequence generation and JZP document generation.
Guqin music model training.
A system for our work and a novel evaluation system for it. The evaluation system needs to consider both the human side and the computer science side.
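
For the dataset construction item, each JZP image has to be matched with its JZP representation. The sketch below shows one possible way to do that, assuming (this is not stated in the project) that an image and its annotation file share the same file stem, for example `page_001.png` and `page_001.json`. The function name `pair_by_stem` and the file names are hypothetical.

```python
from pathlib import PurePath


def pair_by_stem(image_names, annotation_names):
    """Match image files to annotation files that share a file stem.

    Returns (pairs, missing): pairs is a list of (image, annotation) tuples,
    missing lists the images that have no annotation yet.
    """
    annotations = {PurePath(name).stem: name for name in annotation_names}
    pairs, missing = [], []
    for image in sorted(image_names):
        stem = PurePath(image).stem
        if stem in annotations:
            pairs.append((image, annotations[stem]))
        else:
            missing.append(image)
    return pairs, missing


if __name__ == "__main__":
    images = ["page_002.png", "page_001.png", "page_003.png"]
    labels = ["page_001.json", "page_002.json"]
    pairs, missing = pair_by_stem(images, labels)
    print(pairs)
    print(missing)
```

Listing the unmatched images separately makes it easy to see which pages still need annotation before training.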

Contributing

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!