Helps transform signal data, such as speech, into different representations using the TensorFlow 2 API.
```bash
pip3 install signal-transformation
```
Supports the following output formats for voice data (see the sketch after this list for how the representations relate to each other):
- PCM
- STFT
- MAGNITUDE
- MFCC
- LOG MFCC
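As a rough illustration of how these representations relate, the sketch below computes an STFT, its magnitude, and MFCC features with plain TensorFlow 2 signal ops. It is only a didactic example, not the library's internal pipeline.

```python
# Illustrative only: the listed representations computed with plain
# TensorFlow 2 signal ops (not the internals of signal-transformation).
import tensorflow as tf

sample_rate = 16000
pcm = tf.random.normal([sample_rate])  # 1 second of fake PCM audio

stft = tf.signal.stft(pcm, frame_length=400, frame_step=160, fft_length=512)
magnitude = tf.abs(stft)  # MAGNITUDE spectrogram

mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=80,
    num_spectrogram_bins=magnitude.shape[-1],
    sample_rate=sample_rate
)
log_mel = tf.math.log(tf.tensordot(magnitude, mel_matrix, 1) + 1e-6)
mfcc = tf.signal.mfccs_from_log_mel_spectrograms(log_mel)[..., :13]  # MFCC
```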
- Transform WAV files to the needed format and save them as TFRecord files
```python
import pandas as pd

from signal_transformation.voice.tftr import wav_to_tf_records, SpectFormat

metadata_path = '/path/to/parquet/file'
output_dir = '/output/directory'

# Metadata is a parquet table describing the WAV files to convert
metadata = pd.read_parquet(metadata_path)

wav_to_tf_records(
    metadata=metadata,
    output_dir=output_dir,
    spec_format=SpectFormat.MAGNITUDE
)
```
Or use it as a script:
```bash
wav_to_tf_records \
    --metadata_path="./vox1_metadata.gzip" \
    --spect_format=3 \
    --sample_rate=16000 \
    --num_mfcc=13 \
    --spect_shape="300, 200, 1" \
    --num_shards=512 \
    --output_dir="/dataset/tf_records"
```
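The produced shards can then be consumed with a standard `tf.data` pipeline. Below is a minimal sketch; the feature keys (`'spectrogram'`, `'label'`) and the flattened shape matching `--spect_shape` are assumptions, so check the actual schema written by `wav_to_tf_records` before relying on them.

```python
import tensorflow as tf

# Assumed feature keys and shape (matching --spect_shape="300, 200, 1");
# inspect the records produced by wav_to_tf_records for the real schema.
feature_spec = {
    'spectrogram': tf.io.FixedLenFeature([300 * 200 * 1], tf.float32),
    'label': tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, feature_spec)
    spectrogram = tf.reshape(example['spectrogram'], [300, 200, 1])
    return spectrogram, example['label']

files = tf.data.Dataset.list_files('/dataset/tf_records/*')
dataset = tf.data.TFRecordDataset(files).map(parse_example).batch(32)
```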
- Example of creating a metadata file for VoxCeleb 1
```python
from signal_transformation import helpers

helpers.voxceleb_files_to_metadata('files_dir', 'dev', 'vox1_metadata.gzip')
```
Or use it as a script:
```bash
vox_to_metadata \
    --files_dir="/vox2/audio/dev/wav/" \
    --part_name="dev" \
    --output_file="/vox2/audio/dev/vox2_wav_metadata.gzip"
```
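The resulting metadata is a parquet table that can be passed to `wav_to_tf_records` (see above). A quick way to inspect it with pandas; the column names are whatever the helper wrote and are not asserted here:

```python
import pandas as pd

# Inspect the generated metadata; column names depend on the helper's output.
metadata = pd.read_parquet('/vox2/audio/dev/vox2_wav_metadata.gzip')
print(metadata.columns.tolist())
print(metadata.head())
```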
- Create a dataset of overlapping signals
```python
from signal_transformation import helpers

input_folder = '/path/to/wav/files'
output_folder = '/output/directory'

result = helpers.prepare_data(
    input_folder,
    output_folder,
    size=100,
    speakers_number=2,
    overlapping=True
)
```
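For intuition, "overlapping" here presumably refers to mixing utterances from several speakers (`speakers_number=2`) into a single waveform. The toy NumPy sketch below only illustrates that idea and is not how `helpers.prepare_data` is implemented:

```python
import numpy as np

# Toy illustration: overlap two speakers by summing their waveforms.
speaker_a = np.random.randn(16000).astype(np.float32)
speaker_b = np.random.randn(16000).astype(np.float32)

length = min(len(speaker_a), len(speaker_b))
mixture = speaker_a[:length] + speaker_b[:length]
mixture /= np.max(np.abs(mixture))  # normalize to avoid clipping
```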
- ImageNet dataset to TensorFlow records
```bash
imagenet_to_tf_records \
    --mode="grayscale" \
    --train_data_dirs "/datasets/images/imagenet/train_1/raw /datasets/images/imagenet/train_2/raw" \
    --validation_data_dir /datasets/images/imagenet/validation \
    --validation_labels_file /datasets/images/imagenet/ILSVRC2012_devkit_t12/data/ILSVRC2012_validation_ground_truth.txt \
    --output_dir /datasets/images/imagenet/tf_records
```
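The `--mode="grayscale"` option suggests images are stored as single-channel. The sketch below shows an equivalent conversion with standard TensorFlow image ops; the file path is hypothetical, and this is not necessarily how the script processes images internally:

```python
import tensorflow as tf

# Decode a JPEG and convert it to grayscale with standard TF ops;
# the path is hypothetical, for illustration only.
raw = tf.io.read_file('/datasets/images/imagenet/train_1/raw/example.JPEG')
image = tf.io.decode_jpeg(raw, channels=3)
gray = tf.image.rgb_to_grayscale(image)  # shape: [height, width, 1]
```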