DTW

First, we record 10 voice samples of our own and 10 voice samples from other speakers. We then compare the recordings with DTW and use a threshold level on the resulting distances to tell the two groups apart.

Steps:
Remove silence:
Feature extraction:
Zero crossing rate:
Spectrogram:
Mel spectrogram (MFCC):
DTW:
Threshold level:
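
The DTW and threshold steps can be sketched roughly as follows. This is only a minimal sketch, assuming each recording has already been reduced to a sequence of feature frames (for example MFCC vectors, one row per frame). The names `dtw_distance` and `is_same_speaker`, the Euclidean frame distance, and the threshold value are illustrative assumptions, not something fixed by the notes above.

```python
import numpy as np


def dtw_distance(a, b):
    """Accumulated DTW cost between two feature sequences.

    a, b: array-likes of shape (frames, dims) or (frames,).
    Assumption: Euclidean distance between frames, unit step pattern.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Treat a 1-D input as a sequence of scalar frames.
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]

    n, m = len(a), len(b)
    # D[i, j] = cheapest cost of aligning a[:i] with b[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Allowed moves: step in a only, step in b only, or both.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def is_same_speaker(distance, threshold):
    """Accept the recording if its DTW distance is at or below the threshold."""
    return distance <= threshold


# Toy data standing in for real feature sequences (hypothetical values).
reference = [1.0, 2.0, 3.0]
stretched = [1.0, 2.0, 2.0, 3.0]   # same shape, spoken more slowly
different = [5.0, 5.0, 5.0]

threshold = 1.0  # hypothetical; in practice chosen from the 10 + 10 recordings
print(is_same_speaker(dtw_distance(reference, stretched), threshold))
print(is_same_speaker(dtw_distance(reference, different), threshold))
```

One plausible way to pick the threshold, again as an assumption, is to compute the distances from a reference recording to the 10 own samples and to the 10 samples from others, and then choose a value that lies between the two groups of distances.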