saaltone/SANNet

Voice Recognition

Closed this issue · 2 comments

Hello, I discovered your project a week ago. It is a very interesting and well-made project.

I am trying to build a demo for voice recognition. I have converted several voice files to an MNIST-like format, and loading the data into the program works. After that I successfully reworked your Music demo sample. Now I am stuck: I cannot get buildOutputAttentionModule to work for text. Could you help me a little? I am also sending the demo code and data files.

File: https://github.com/SunnyPage/SANNet/blob/SunnyPage-Speech-1/src/demo/SpeechToSpeech.java

DataTrain: https://github.com/SunnyPage/SANNet/blob/SunnyPage-Speech-1/src/data/data_train.csv
DataTest: https://github.com/SunnyPage/SANNet/blob/SunnyPage-Speech-1/src/data/data_test.csv

Hi,

I committed a new version of the Music demo that has separate encoder and decoder parts and an attention module that combines both. I would recommend using scaled dot attention. I hope this works as an example of what you are looking for.
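For reference, here is a minimal sketch of the general scaled dot-product attention computation, softmax(Q·Kᵀ / sqrt(d))·V, written on plain `double` arrays. It does not use or reproduce SANNet's attention layer; the class name `ScaledDotAttentionSketch`, the method names, and the idea that queries come from the decoder while keys and values come from the encoder are assumptions made only for illustration.

```java
import java.util.Arrays;

class ScaledDotAttentionSketch {

    // Numerically stable softmax over one row of attention scores.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double sum = 0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    // queries: [nQ][d], keys: [nK][d], values: [nK][dV]; returns [nQ][dV].
    static double[][] attend(double[][] queries, double[][] keys, double[][] values) {
        int d = keys[0].length;
        int dV = values[0].length;
        double scale = 1.0 / Math.sqrt(d); // the "scaled" part of scaled dot attention
        double[][] output = new double[queries.length][dV];
        for (int q = 0; q < queries.length; q++) {
            // Score every key against this query with a scaled dot product.
            double[] scores = new double[keys.length];
            for (int k = 0; k < keys.length; k++) {
                double dot = 0;
                for (int i = 0; i < d; i++) dot += queries[q][i] * keys[k][i];
                scores[k] = dot * scale;
            }
            // Turn scores into weights and take the weighted sum of the values.
            double[] weights = softmax(scores);
            for (int k = 0; k < keys.length; k++) {
                for (int j = 0; j < dV; j++) output[q][j] += weights[k] * values[k][j];
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Toy data: two decoder-side queries attending over three encoder-side positions.
        double[][] decoderQueries = {{1, 0}, {0, 1}};
        double[][] encoderKeys = {{1, 0}, {0, 1}, {1, 1}};
        double[][] encoderValues = {{10, 0}, {0, 10}, {5, 5}};
        System.out.println(Arrays.deepToString(attend(decoderQueries, encoderKeys, encoderValues)));
    }
}
```

Each output row is a weighted average of the value rows, so a query that matches all keys equally simply receives the mean of the values.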

Please note that if you rebase your code onto the latest SANNet source code, MMatrix has been replaced with a single 3D Matrix that has rows, columns and depth.

It should be quite easy to adapt your SpeechToSpeech to the refactored Matrix usage:

  1. Rename every use of the MMatrix class type to the Matrix class type.
  2. Remove the extra '.get(0)' calls wherever the IDE complains about them. Matrix does not have this extra dimension any more.
  3. For DMatrix and SMatrix object creation (new DMatrix or new SMatrix), add the depth dimension with ', 1'. For example: new DMatrix(rows, 1, 1)
  4. For each setValue call, add the depth index ', 0'. For example: inItem.setValue(j, 0, 0, 0.1);
  5. Unwrap new Matrix(). For example, change 'dataSet.add(row, new Matrix(inItem));' into 'dataSet.add(row, inItem);'
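Steps 3 to 5 can be sketched roughly as follows. To keep the example runnable on its own, the `DMatrix` class here is a hypothetical stand-in that only mirrors the (rows, columns, depth) shape described above; it is not SANNet's actual class, and the `Map` used as `dataSet`, the `buildDataSet` helper, and the `getValue` accessor are likewise assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MatrixMigrationSketch {

    // Hypothetical stand-in for a 3D matrix with rows, columns and depth.
    // It is NOT the SANNet DMatrix; it only imitates the call shapes from the steps above.
    static class DMatrix {
        final int rows;
        final int columns;
        final int depth;
        private final double[] data;

        DMatrix(int rows, int columns, int depth) {
            this.rows = rows;
            this.columns = columns;
            this.depth = depth;
            this.data = new double[rows * columns * depth];
        }

        void setValue(int row, int column, int depthIndex, double value) {
            data[index(row, column, depthIndex)] = value;
        }

        double getValue(int row, int column, int depthIndex) {
            return data[index(row, column, depthIndex)];
        }

        private int index(int row, int column, int depthIndex) {
            return (depthIndex * rows + row) * columns + column;
        }
    }

    // Builds one column-vector input item per sample, following steps 3 to 5.
    static Map<Integer, DMatrix> buildDataSet(double[][] samples) {
        Map<Integer, DMatrix> dataSet = new HashMap<>();
        for (int row = 0; row < samples.length; row++) {
            // Step 3: the constructor takes a trailing depth of 1.
            DMatrix inItem = new DMatrix(samples[row].length, 1, 1);
            for (int j = 0; j < samples[row].length; j++) {
                // Step 4: setValue takes a depth index of 0 before the value.
                inItem.setValue(j, 0, 0, samples[row][j]);
            }
            // Step 5: store the item directly, with no wrapper object around it.
            dataSet.put(row, inItem);
        }
        return dataSet;
    }

    public static void main(String[] args) {
        Map<Integer, DMatrix> dataSet = buildDataSet(new double[][]{{0.1, 0.2}, {0.3, 0.4}});
        System.out.println(dataSet.size() + " items, depth " + dataSet.get(0).depth);
    }
}
```

In real code only the three commented lines inside `buildDataSet` change; the stand-in class exists purely so the sketch compiles without the library.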

I hope this helps.

Many thanks for the help.

At the moment I am developing a voice assistant, which is a system of three parts: speech recognition, text translation from one language to another, and speech synthesis. I am using your SANNet for this purpose.