Simple ASR model

- Extract features (like MFCC) from the acoustic signal
- Feed an ML model (classification or clustering) with the features
- Recognize the speech input

Feature extraction: which model is used?
ML model: which model is used?
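The three steps above (extract features, feed a model, recognize the input) can be sketched end to end in a few lines of Python. This is only a toy illustration under loud assumptions: log energy and zero-crossing rate stand in for MFCCs, a nearest-centroid classifier stands in for the ML model, and synthetic sine tones stand in for recorded speech. All function names (`extract_features`, `train_centroids`, `recognize`, `tone`) and the labels `"low"` and `"high"` are hypothetical and do not come from the source.

```python
import math

def frame_signal(signal, frame_len=200, hop=100):
    """Split a list of samples into overlapping fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def frame_features(frame):
    """Two toy features per frame (stand-ins for MFCCs):
    log energy and zero-crossing rate."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return (math.log(energy + 1e-10), crossings / (len(frame) - 1))

def extract_features(signal):
    """Step 1: average the per-frame features into one vector per utterance."""
    feats = [frame_features(f) for f in frame_signal(signal)]
    n = len(feats)
    return tuple(sum(f[i] for f in feats) / n for i in range(2))

def train_centroids(labelled_signals):
    """Step 2: fit a nearest-centroid classifier, i.e. store the mean
    feature vector of each label."""
    sums, counts = {}, {}
    for label, signal in labelled_signals:
        x = extract_features(signal)
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x[0]
        s[1] += x[1]
        counts[label] = counts.get(label, 0) + 1
    return {label: (s[0] / counts[label], s[1] / counts[label])
            for label, s in sums.items()}

def recognize(signal, centroids):
    """Step 3: return the label whose centroid is closest to the input."""
    x = extract_features(signal)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

def tone(freq_hz, sample_rate=8000, duration_s=0.25):
    """Synthetic sine tone used here in place of a real recording."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]

# Tiny labelled "training set" of synthetic signals (assumption, not real speech).
training = [("low", tone(200)), ("low", tone(250)),
            ("high", tone(1500)), ("high", tone(1800))]
centroids = train_centroids(training)

prediction_low = recognize(tone(220), centroids)
prediction_high = recognize(tone(1600), centroids)
print(prediction_low, prediction_high)
```

In a realistic setup the feature step would typically compute MFCCs with a signal-processing library, and the model would be a trained classifier or clustering algorithm; the structure of the pipeline stays the same.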