- https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048
- https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-pytorch
- https://docs.nvidia.com/deeplearning/frameworks/install-pytorch-jetson-platform/index.html --> Installing PyTorch for Jetson Platform
- https://forums.developer.nvidia.com/t/manually-installing-cuda-11-0-2-on-jetson-xavier-nx-help/191909/4 --> installing cuda toolkit manually
- https://repo.download.nvidia.com/jetson/
- https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#upgradable-package-for-jetson --> cuda compatible version
- https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions --> to verify if device is cuda compatible
- https://developer.nvidia.com/cuda-12-0-0-download-archive --> cuda toolkit download
- https://forums.developer.nvidia.com/t/having-problems-updating-cmake-on-xavier-nx/169265 --> install cmake
- https://onnxruntime.ai/docs/build/eps.html#nvidia-jetson-tx1tx2nanoxavier --> Install onnxruntime for Jetson Devices
- https://elinux.org/Jetson_Zoo#ONNX_Runtime --> install this for onnxruntime-gpu for Python 3.8 --> the Python version our PyTorch wheel requires
- https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html --> CUDA EP install/usage
- https://developer.nvidia.com/cudnn-downloads --> CuDNN library install
- https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html --> to find the correct versions for onnxruntime + cuda + cudnn
- https://elenacliu-pytorch-cuda-driver.streamlit.app/ --> Checking version compatibilities
- https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-890/install-guide/index.html --> CuDNN installation
- https://pytorch.org/audio/main/build.linux.html --> how to build torchaudio from source
- pytorch/audio#658 --> install command for torchaudio on Jetson
- OS: Ubuntu 20.04.6 LTS (Focal Fossa)
- Kernel: 5.10.104-tegra
- Architecture: aarch64 (arm)
- Jetpack 5.0.2 (rev2)
- Conda 24.3.0 aarch64
- Curl 7.68.0
- Python 3.8.10
- numpy 1.21.6
- PyTorch 2.2.2, supposed to be 1.13 [torch-1.13.0a0+410ce96a.nv22.12-cp38-cp38-linux_aarch64.whl]
- torchaudio 2.2.2
- CUDA 12.0.0
- Install the wheel from the NVIDIA Jetson page for the appropriate JetPack version using pip install
- These wheels need Python 3.8 (create/activate a Python 3.8 virtual environment if your system default is newer)
- Kernel version 5.10.104-tegra
- CUDA 12.0 is compatible with t186 (Xavier AGX) and JetPack 5.0.2
- $LD_LIBRARY_PATH is where CUDA/python related library paths are found [need to validate this statement]
- sudo apt-get update && sudo apt-get check
- If software updater prompts, update that as well
- install curl
- sudo apt-get install libcurl4=7.68.0-1ubuntu2
- sudo apt-get install curl
- Continue by cloning the repo; git-lfs reports an error, but all the files appear to be present.
- install dependencies:
sudo apt-get install python3.8-dev python3.8-venv
cd
mkdir virtual_env
/usr/bin/python3.8 -m venv ~/virtual_env/venv_with_python3.8
source ~/virtual_env/venv_with_python3.8/bin/activate
python --version
- Miniconda (latest): get the aarch64 (ARM) installer [py312_24.3.0-0] from https://docs.anaconda.com/free/miniconda/
- Install pytorch
export TORCH_INSTALL=./torch-1.13.0a0+410ce96a.nv22.12-cp38-cp38-linux_aarch64.whl
python3 -m pip install --upgrade pip
pip install numpy==1.21.6
pip install --no-cache $TORCH_INSTALL
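- Optional sanity check from the Python interpreter that the wheel installed and can see the GPU (a minimal sketch; the version string assumes the NVIDIA wheel above):
import torch
print(torch.__version__)              # should report something like 1.13.0a0+410ce96a.nv22.12
print(torch.cuda.is_available())      # True if the build can see the Jetson GPU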
- Install CUDA Toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/arm64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda-tegra-repo-ubuntu2004-12-0-local_12.0.0-1_arm64.deb
sudo dpkg -i cuda-tegra-repo-ubuntu2004-12-0-local_12.0.0-1_arm64.deb
sudo cp /var/cuda-tegra-repo-ubuntu2004-12-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
You can verify installation with nvcc --version
- pip install torchaudio
- Also install libhdf5-dev [make sure the dev version is installed]
- For installing conda into a custom location:
conda create --prefix /work/mydir/mypath
Package cache directory: $HOME/.conda/pkgs [default]
You can add pkgs_dirs to $HOME/.condarc
OR set the CONDA_PKGS_DIRS environment variable
- conda list --> gives the list of libraries and their versions
- [MAYBE] typeguard==2.13.3
- cuda toolkit is installed [12.0]
- pytorch is installed
- install cmake 3.26 or greater [3.26.6] (download the linux-aarch64 tarball from the CMake releases page, then:)
tar -zxvf cmake-3.26.6-linux-aarch64.tar.gz
cd cmake-3.26.6-linux-aarch64/
sudo cp -rf bin/ doc/ share/ /usr/local/
sudo cp -rf man/* /usr/local/man
sync
cmake --version
- Install onnxruntime for Jetson
git clone --recursive -b rel-1.12.0 https://github.com/microsoft/onnxruntime
export PATH="/usr/local/cuda/bin:${PATH}"
export CUDACXX="/usr/local/cuda/bin/nvcc"
sudo apt install -y --no-install-recommends build-essential software-properties-common libopenblas-dev libpython3.8-dev python3-pip python3-dev python3-setuptools python3-wheel
./build.sh --config Release --update --build --parallel 2 --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu
sudo pip install build/Linux/Release/dist/onnxruntime_gpu-1.12.0-cp38-cp38-linux_aarch64.whl
OR
- Install the onnxruntime wheel found at https://elinux.org/Jetson_Zoo#ONNX_Runtime [Python 3.8, JetPack 5.0, onnxruntime 1.12.1]
pip install <downloaded-wheel>.whl
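- Optional check from Python that onnxruntime-gpu is importable and its GPU execution providers are registered (a minimal sketch; the exact provider list depends on the build options used above):
import onnxruntime as ort
print(ort.__version__)                # expect 1.12.x
print(ort.get_available_providers())  # look for CUDAExecutionProvider / TensorrtExecutionProvider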
- Install the cuDNN library [version 8.5.0]. Download the 8.5.0, Ubuntu 20.04, arm64 (sbsa) local repo package from the NVIDIA cuDNN archive: cudnn-local-repo-ubuntu2004-8.5.0.96_1.0-1_arm64.deb (you may need to log in to download). Direct link: https://developer.nvidia.com/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-local-repo-ubuntu2004-8.5.0.96_1.0-1_arm64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2004-8.5.0.96_1.0-1_arm64.deb
sudo cp /var/cudnn-local-repo-ubuntu2004-8.5.0.96/cudnn-local-0CCB36B3-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install libcudnn8
sudo apt-get install libcudnn8-dev
sudo apt-get install libcudnn8-samples
sudo apt-get install zlib1g
Verify installation by searching for "libcudnn" in /usr/lib/aarch64-linux-gnu
Also verify the installation by: a. Installing the freeimage library
sudo apt-get update
sudo apt-get install -y libfreeimage-dev
b. Now run their sample code
cp -r /usr/src/cudnn_samples_v8/ $HOME
cd $HOME/cudnn_samples_v8/mnistCUDNN
make clean && make
./mnistCUDNN
Test passed!
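- Optionally also confirm that the PyTorch build picks up cuDNN (a minimal sketch; assumes the torch wheel installed earlier is in the active environment):
import torch
print(torch.backends.cudnn.is_available())   # True if cuDNN can be loaded
print(torch.backends.cudnn.version())        # e.g. 8500 for cuDNN 8.5.0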
- Install Jetpack 5.0.2 using
sdkmanager --archived-versions
sudo apt-get update && sudo apt-get check
On the device, ensure you install the SDK components as well
- Validate Cuda using sample code at /usr/local/cuda/samples/1_Utilities
- Validate CuDNN using
cp -r /usr/src/cudnn_samples_v8/ $HOME
cd $HOME/cudnn_samples_v8/mnistCUDNN
make clean && make
./mnistCUDNN
Test passed!
- Mount the SD card under /usr/local. This is so we have enough space for everything.
lsblk (identify the SD-card partition path; let's assume /dev/mmcblk1p1)
sudo mkdir /usr/local/sd
sudo mount /dev/mmcblk1p1 /usr/local/sd
df -H -T /usr/local/sd --> to verify the mount
Add an entry in /etc/fstab:
UUID=<find this out from sudo blkid> /usr/local/sd ext4 defaults,user,owner,nofail,exec 0 2
- install onnxruntime-gpu from https://elinux.org/Jetson_Zoo#ONNX_Runtime
pip install <onnxruntime-gpu-file>.whl
- Install openblas(for pyTorch), pandas, soundfile
sudo apt install libopenblas-dev
pip install pandas
pip install soundfile
- set include-system-site-packages key to true in pyvenv.cfg
- pip list --local
- The PYTHONPATH variable lists additional directories this environment searches for modules
- ldd $(which python) --> lists the shared libraries used by the Python interpreter
- To include the system site packages, set the include-system-site-packages key to true in pyvenv.cfg
- In a venv, sys.prefix != sys.base_prefix; check this to confirm the environment's paths are correct
- To add a custom path, edit venv/bin/activate and append export PATH=$PATH:/my/custom/path, then source venv/bin/activate and verify with echo $PATH
- pip show <package> --> shows the install location
- python -m site --user-site --> shows where site-packages are looked up for the current environment
- Open python interpreter:
import site
print(site.getsitepackages())
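- Putting the venv checks above together (a minimal sketch, run inside the interpreter of the environment you want to inspect):
import sys, site
print(sys.prefix)                      # points at the venv when it is active
print(sys.base_prefix)                 # the base interpreter
print(sys.prefix != sys.base_prefix)   # True means a venv is active
print(site.getsitepackages())          # where this environment looks up site-packages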
- dpkg -L <package> --> lists the files installed by an already-installed package
- dpkg-deb -c <package.deb> --> lists the files a .deb package would install
- strace -o log.txt head --> traces all the system calls made by a process (here, head) into log.txt
- printenv
- tar -ztf <file.tar.gz> --> lists the contents of a gzipped tarball
- ldd <binary> --> gives the list of shared libraries used by the app
- /etc/ld.so.cache --> the cache the dynamic linker consults to find available shared objects to load
- /etc/ld.so.conf includes /etc/ld.so.conf.d/, which holds the configuration files for the corresponding shared-library paths
- ldconfig -p | grep <library_name>
- apt search <library_name>
- readelf -d /bin/curl or readelf -d libshared.so --> shows the dynamic section (e.g. the needed shared libraries) of a binary or shared object
- readelf -d /path/to/executable | grep -E 'RPATH|RUNPATH'
- CUDA, cuDNN, and PyTorch are now installed. Next, create a conda environment for Fastspeech HS.
- Install conda in your sd card slot by
Get the latest script from: https://docs.anaconda.com/free/miniconda/
sh ~/Downloads/Miniconda3-latest-Linux-aarch64.sh -p /home/nvidia/sd/miniConda
export PATH="/home/nvidia/sd/miniConda/bin:$PATH"
source ~/.bashrc
- Create your conda environment with just python=3.8 & pip
- Now install torch
# [Jetpack 5.0.2, Python 3.8, Torch 1.13.0]
wget https://download.pytorch.org/whl/torchaudio-0.13.0-cp38-cp38-manylinux2014_aarch64.whl
pip install torch-1.13.0a0+d0d6b1f2.nv22.10-cp38-cp38-linux_aarch64.whl
- Build & install torchaudio from source using
conda install cmake ninja
git clone https://github.com/pytorch/audio
cd audio
BUILD_SOX=1 python setup.py develop
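- Optional check that the torchaudio build works, by round-tripping a short silent clip (a minimal sketch; test.wav is an arbitrary filename and assumes the sox/soundfile backend built above is usable):
import torch, torchaudio
print(torchaudio.__version__)
waveform = torch.zeros(1, 16000)               # 1 second of silence at 16 kHz
torchaudio.save('test.wav', waveform, 16000)
print(torchaudio.info('test.wav'))             # should report sample_rate=16000, num_channels=1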
- Create & activate either one of the two conda environments
conda env create -f env_jets.yml
conda activate tts-jets
[OR]
conda env create -f env_reqs.yml
conda activate tts-reqs
- Inference code should now work
- CUDA, cuDNN, and conda (on the SD card) are now installed. Next, create a conda environment for Fastspeech HS.
- For a specific location:
conda create --prefix path/to/location/<env_name>
conda install python=3.8 pip
pip install /path/to/torch-1.13.0a0+d0d6b1f2.nv22.10-cp38-cp38-linux_aarch64.whl
- Build & install torchaudio from source
conda install cmake ninja
git clone https://github.com/pytorch/audio
cd audio
BUILD_SOX=1 python setup.py develop
conda env update -f <env_name>.yml
- udevadm info /dev/mmcblk1p1 --> queries udev (the kernel's device manager, which reacts to newly connected devices) for information about the SD-card partition
- /etc/udev/rules.d/, /usr/lib/udev/rules.d/ and /lib/udev/rules.d/ are where UFS (SD card) connection rules are specified. These rules take precedence over fstab
- sudo chmod -R ugo+rw /usr/local/sd
- findmnt -o TARGET,VFS-OPTIONS,FS-OPTIONS --> provides the given mount options of your mounted locations
- The above command tells you whether the mounted SD card has exec permission: noexec appears in the options if it does not, and nothing special appears if it does. So use the exec option in your fstab entry.
- modinfo nvidia --> Useful for firmware versions
- NVIDIA offers Nsight Systems, which comes bundled with the CUDA Toolkit
- Run the profiler from /opt/nvidia/nsight-systems/2022.3.3/bin/nsys-ui
- Launches GUI where we can see profiling data
- To profile
sudo nsys profile -o ../nsightProfile/myout -f true -w true /home/nvidia/sd/miniConda/envs/tts-jets/bin/python inference.py --gender male --language hindi --sample_text "अरे भगवान" --output_file this_file.wav --alpha 1
- This generates a report file with path/name mentioned in -o option. This can be imported into the UI for analysis
- We can enable nsys profiler logging by renaming /opt/nvidia/nsight-systems/2022.3.3/target-linux-tegra-armv8/nvlog.config.template to nvlog.config
- This generates a log file for nsys in the folder that nsys command is called from.
- In theory the profiling should have worked. However, for our inference file specifically, if I comment out
# from espnet2.bin.tts_inference import Text2Speech
it reaches the main function; otherwise it does not. This behaviour is unexplained and I'm leaving it at that. I will continue if profiling becomes mandatory for our task.
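- If profiling is picked up again, NVTX ranges can be added inside inference.py so that named regions show up on the nsys timeline (a minimal sketch using torch.cuda.nvtx; the region names here are made up for illustration):
import torch
torch.cuda.nvtx.range_push('text_preprocess')   # hypothetical region name
# ... text preprocessing ...
torch.cuda.nvtx.range_pop()
torch.cuda.nvtx.range_push('tts_forward')       # hypothetical region name
# ... model forward pass ...
torch.cuda.nvtx.range_pop()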
This repository contains a Fastspeech2 model for 16 Indian languages (both male and female voices), implemented using Hybrid Segmentation (HS) for speech synthesis. The model generates mel-spectrograms from text inputs and can be used to synthesize speech.
The repo is large: we use Git LFS due to GitHub's file-size constraints (please install the latest git-lfs from the link; the current install commands are provided below).
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
Language model files are uploaded using Git LFS, so please use:
git lfs fetch --all
git lfs pull
to get the original files in your directory.
The model for each language includes the following files:
config.yaml: Configuration file for the Fastspeech2 model.
energy_stats.npz: Energy statistics for normalization during synthesis.
feats_stats.npz: Feature statistics for normalization during synthesis.
feats_type: Feature type information.
pitch_stats.npz: Pitch statistics for normalization during synthesis.
model.pth: Pre-trained Fastspeech2 model weights.
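- To peek at the normalization statistics shipped with a model (a minimal sketch; the path below is a placeholder for whichever language/gender directory you pulled via LFS):
import numpy as np
stats = np.load('path/to/model_dir/energy_stats.npz')   # placeholder path
for name in stats.files:                                 # arrays stored inside the .npz
    print(name, stats[name].shape)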
- Install Miniconda first. Create a conda environment using the provided environment.yml file:
conda env create -f environment.yml
- Activate the conda environment (the environment name is defined inside environment.yml):
conda activate tts-hs-hifigan
- Install PyTorch separately (you can install the specific version based on your requirements):
conda install pytorch cudatoolkit
pip install torchaudio
pip install numpy==1.23.0
For generating WAV files from mel-spectrograms, you can use a vocoder of your choice. One popular option is the HIFIGAN vocoder (Clone this repo and put it in the current working directory). Please refer to the documentation of the vocoder you choose for installation and usage instructions.
(We have used the HIFIGAN vocoder and have provided vocoders tuned on Aryan and Dravidian languages.)
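A minimal sketch of pushing a mel-spectrogram through HIFIGAN, assuming the standard jik876/hifi-gan repository layout (models.Generator, env.AttrDict, a config.json plus a generator checkpoint); the paths and the dummy mel below are placeholders, and this repo's inference.py may already wrap this step for you:
import json, torch
from env import AttrDict          # from the cloned hifi-gan repo
from models import Generator      # from the cloned hifi-gan repo

with open('hifigan/config.json') as f:                   # placeholder config path
    h = AttrDict(json.load(f))
device = 'cuda' if torch.cuda.is_available() else 'cpu'
generator = Generator(h).to(device)
state = torch.load('hifigan/generator.pth', map_location=device)   # placeholder checkpoint path
generator.load_state_dict(state['generator'])
generator.eval()
generator.remove_weight_norm()
mel = torch.randn(1, 80, 200)                            # stand-in for the Fastspeech2 mel output (n_mels assumed 80)
with torch.no_grad():
    audio = generator(mel.to(device)).squeeze().cpu().numpy()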
The directory paths are relative. (If needed, edit text_preprocess_for_inference.py and inference.py and update folder/file paths wherever required.)
Please give language/gender in lower case and the sample text between quotes. Adjust output speed using the alpha parameter (higher alpha gives slower output and vice versa). The output argument is optional; the provided name will be used for the output file.
Use the inference file to synthesize speech from text inputs:
python inference.py --sample_text "Your input text here" --language <language> --gender <gender> --alpha <alpha> --output_file <file_name.wav OR path/to/file_name.wav>
Example:
python inference.py --sample_text "श्रीलंका और पाकिस्तान में खेला जा रहा एशिया कप अब तक का सबसे विवादित टूर्नामेंट होता जा रहा है।" --language hindi --gender male --alpha 1 --output_file male_hindi_output.wav
The file will be stored as male_hindi_output.wav in the current working directory. If the --output_file argument is not given, it will be stored as <language>_<gender>_output.wav in the current working directory.
If you use this Fastspeech2 Model in your research or work, please consider citing:
“ COPYRIGHT 2023, Speech Technology Consortium,
Bhashini, MeiTY and by Hema A Murthy & S Umesh,
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING and ELECTRICAL ENGINEERING, IIT MADRAS. ALL RIGHTS RESERVED "
This work is licensed under a Creative Commons Attribution 4.0 International License.