wav2vec2-huggingface-sagemaker

Fine-tune and deploy Wav2Vec2 model for speech recognition with HuggingFace and SageMaker

In this repository, we use SUPERB dataset that available from Hugging Face Datasets library, and fine-tune the Wav2Vec2 model and deploy it as SageMaker endpoint for real-time inference for an ASR task.

First of all, we show how to load and preprocess the SUPERB dataset in SageMaker environment in order to obtain tokenizer and feature extractor, which are required for fine-tuning the Wav2Vec2 model. Then we use SageMaker Script Mode for training and inference steps, that allows you to define and use custom training and inference scripts and SageMaker provides supported Hugging Face framework Docker containers. For more information about training and serving Hugging Face models on SageMaker, see Use Hugging Face with Amazon SageMaker. This functionality is available through the development of Hugging Face AWS Deep Learning Container (DLC).

This notebook is tested in both SageMaker Studio and SageMaker Notebook environments. Below shows detailed setup.

  • SageMaker Studio: ml.m5.xlarge instance with Data Science kernel.
  • SageMaker Notebook: ml.m5.xlarge instance with conda_python3 kernel.

Requirements

  • sagemaker version: 2.191.0
  • transformers version: 4.34.0
  • datasets version: 2.8.0
  • s3fs version: 2023.9.2
  • torch version: 2.1.0
  • jiwer
  • soundfile

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.