This method can convert audio into MIDI (music conversion is not its target).
There is also a KONTAKT virtual instrument, ./src/generalMIDI.nki, created for this project using the samples in the source folder.
This is an optimization problem.
- Assume that the sound quality will not change when volume changes.
- There is no physical resonance of piano strings.
- Sound volume has a linear relationship with the weight of the piano keys.
We can do this in several steps:
- Let the volume of piano key i be v_i (v_i is a positive real number).
- Get the Fourier feature f(i, omega) from the sample files (for this project they are in the ./src/samples folder; MS General MIDI piano sound).
- Use the least squares method to make the Fourier transform of the target audio piece as close as possible to the combination of the Fourier transforms of the samples. The Fourier transform is linear: F(v_1 * wave_1(t) + v_2 * wave_2(t)) = v_1 * F(wave_1(t)) + v_2 * F(wave_2(t)), where F is the Fourier transformation. Namely, the synthesized sound is the SUM of v_i * f(i, omega) over index i (where f(i, omega) = F(wave_i(t_sample)) and omega is the frequency), and the Fourier transform of the original audio at position t is a(t, omega). Then minimize (a(t, omega) - SUM(v_i * f(i, omega), i)) ^ 2 (a quadratic polynomial, a much easier form to optimize) over all omega with the linear regression method:
polynomial([v_i]) = SUM(omega) {
synth_sound(omega) = SUM(i) {v_i * f(i, omega)};
return (a(t, omega) - synth_sound(omega)) ^ 2;
};
MINIMIZE(polynomial([v_i])) // get all v_i numbers as volume for all piano keys
- For each v_i, if the values are too close to each other, combine them into one note in the v_i ~ t domain for each key i.
- Generate the MIDI file.
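The least squares step above can be sketched for a single time position t as follows. This is a minimal illustration in Python, not the project's code: the names dft, inner, fit_volumes and toy_key, the use of projected gradient descent to keep every v_i non-negative, and the toy sine-wave "samples" are all assumptions made for the example.

```python
import cmath
import math


def dft(wave):
    """Naive O(n^2) discrete Fourier transform; enough for a toy example."""
    n = len(wave)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(wave))
        for k in range(n)
    ]


def inner(f, g):
    """Real part of the inner product of two spectra over all omega."""
    return sum((x.conjugate() * y).real for x, y in zip(f, g))


def fit_volumes(target_spec, sample_specs, steps=500):
    """Minimize SUM(omega) |a(omega) - SUM(i) v_i * f(i, omega)|^2 with v_i >= 0.

    Projected gradient descent is an assumed solver, chosen for brevity.
    """
    m = len(sample_specs)
    gram = [[inner(fi, fj) for fj in sample_specs] for fi in sample_specs]
    rhs = [inner(fi, target_spec) for fi in sample_specs]
    # The trace bounds the largest eigenvalue, so this step size is stable.
    step = 1.0 / sum(gram[i][i] for i in range(m))
    v = [0.0] * m
    for _ in range(steps):
        grad = [sum(gram[i][j] * v[j] for j in range(m)) - rhs[i] for i in range(m)]
        # Gradient step, then clamp to keep the volumes non-negative.
        v = [max(0.0, v[i] - step * grad[i]) for i in range(m)]
    return v


def toy_key(bin_index, n=64):
    """Hypothetical 'piano sample': a fundamental plus one weaker overtone."""
    return [
        math.sin(2 * math.pi * bin_index * t / n)
        + 0.5 * math.sin(2 * math.pi * 2 * bin_index * t / n)
        for t in range(n)
    ]


# Three toy keys; the overtone of the first overlaps the fundamental of the second.
waves = [toy_key(3), toy_key(6), toy_key(9)]
true_v = [0.5, 0.0, 0.25]
target = [sum(v * w[t] for v, w in zip(true_v, waves)) for t in range(64)]

volumes = fit_volumes(dft(target), [dft(w) for w in waves])
print([round(v, 6) for v in volumes])
```

The overlap between the first toy key's overtone and the second key's fundamental is the reason a joint fit is needed instead of reading volumes off individual peaks. A real run would repeat this fit for every position t and for all piano keys, which is where the long rendering time comes from.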
Notice: in these steps, we should set a threshold for the Fourier feature f(i, omega) and the volume v_i to avoid unnecessary calculations, since below it the volume is too low to hear. In my tests, I could not notice the sound at about 10^-3 ~ 10^-4, so the threshold is set there.
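A minimal sketch of such thresholding is shown here. The function names apply_threshold and active_bins are hypothetical, and the sketch assumes that amplitudes are normalized so that full scale is 1.0, which the text does not state.

```python
def apply_threshold(values, threshold=1e-3):
    """Zero out entries whose magnitude is below the audibility threshold."""
    return [x if abs(x) >= threshold else 0.0 for x in values]


def active_bins(spectrum, threshold=1e-3):
    """Indices of the frequency bins worth keeping in the regression."""
    return [k for k, x in enumerate(spectrum) if abs(x) >= threshold]


# Hypothetical fitted volumes for four keys (full scale assumed to be 1.0).
volumes = [0.42, 5e-5, 0.0031, 9e-4]
kept = apply_threshold(volumes)
print(kept)

# Hypothetical spectrum with one negligible bin; abs() also works on complex values.
bins = active_bins([0.5, 1e-5, 0.002 + 0j])
print(bins)
```

Dropping negligible bins before the fit shrinks the sums over omega, and zeroing negligible volumes afterwards avoids writing inaudible notes to the MIDI file.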
This project seems to be time-consuming in calculation rather than in coding (rendering takes more time than coding), and it is not so useful, so I am planning to stop developing it.
- Only a sample rate of 44100 Hz is supported (there is no resampling algorithm). If you use another sample rate, please change the parameters in the system parameter section of the code and make sure that the samples and the audio have the same sample rate.
- The optimization target is not exactly what listeners value. Since the human ear is sensitive to sound volume on a logarithmic scale, it is not theoretically correct to optimize the term mentioned above (although it is the simplest form; it is almost impossible to optimize such large functions with complicated forms once the log() function is put into the program). In addition, auditory masking is also a factor: we value the neighborhood of loud frequencies more. Overall, it is somewhat reasonable to use this function considering all the reasons above.
- The MIDI weight parameter value and the volume value are not linearly related; we need to set a function for this to increase the accuracy of the playback from the system.
- Although this algorithm is optimized for the MS General MIDI sound, the Windows General MIDI synthesizer never plays fast notes like these properly.
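One possible shape for a function relating volume to the MIDI weight (velocity) parameter is sketched below. The power-law curve and its exponent of 2 are assumptions chosen for illustration, not values measured from the MS General MIDI piano; the name volume_to_velocity and the full_scale parameter are hypothetical.

```python
def volume_to_velocity(volume, full_scale=1.0, exponent=2.0):
    """Map a fitted volume to a MIDI velocity in 0..127.

    Assumes amplitude ~ (velocity / 127) ** exponent, so the inverse
    mapping is velocity = 127 * (volume / full_scale) ** (1 / exponent).
    """
    if volume <= 0.0:
        return 0  # silent: no note should be emitted
    ratio = min(volume / full_scale, 1.0)
    velocity = round(127 * ratio ** (1.0 / exponent))
    # Keep audible notes inside the valid note-on range 1..127.
    return max(1, min(127, velocity))


for vol in (0.0, 0.001, 0.1, 0.5, 1.0):
    print(vol, volume_to_velocity(vol))
```

Setting exponent=1.0 reproduces the linear assumption made earlier, so the same function can be used to compare both mappings against the rendered playback.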
Song: You Raise Me Up (computed for almost 2 hours on an 8-core CPU)
Original file / MIDI file / Rendered MIDI file
Voice: "one, two, three, four, five, six, seven, eight, nine, ten".
Original file / MIDI file / Rendered MIDI file
Voice: "I love daddy".