TensorFlow GPU & TPU compatible operations: MelSpectrogram, TimeFreqMask, CutMix, MixUp, ZScore, and more
For the stable version:
!pip install tensorflow-extra
Or, for the latest version from GitHub:
!pip install git+https://github.com/awsaf49/tensorflow_extra
For a usage example, see the BirdCLEF23: Pretraining is All you Need notebook. It uses this library together with multi-stage transfer learning for the bird call identification task.
Converts audio data to a mel-spectrogram on GPU/TPU.
import tensorflow_extra as tfe
audio2spec = tfe.layers.MelSpectrogram()
spec = audio2spec(audio)
![](https://private-user-images.githubusercontent.com/36858976/271478470-45981a3f-fe32-423b-9a0d-5016b8463bbf.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg4NjUyNDAsIm5iZiI6MTcxODg2NDk0MCwicGF0aCI6Ii8zNjg1ODk3Ni8yNzE0Nzg0NzAtNDU5ODFhM2YtZmUzMi00MjNiLTlhMGQtNTAxNmI4NDYzYmJmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIwVDA2MjkwMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTc0ZTliMDZkNGU2MTk1MTE5MDkxODQyYTE0ZDhkZWU5YjM3MGMyZTM1MTE2ODc3YzRmOGZiZDY3YjVhYTU2MTcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.kC6AStMq2rrd-mu_rqXjsvl_WQi2OdjUmGhzCO8g0gk)
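For readers new to the mel scale, here is a minimal pure-Python sketch of how mel bands are spaced. It assumes the HTK-style formula; which mel variant `MelSpectrogram` uses is not stated here, and the helper names (`hz_to_mel`, `mel_to_hz`, `mel_band_centers`) are hypothetical, not part of `tensorflow_extra`.

```python
import math

def hz_to_mel(freq_hz):
    """Hz -> mel using the HTK-style formula (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(n_mels, fmin, fmax):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)  # n_mels interior points between the two edges
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

centers = mel_band_centers(n_mels=8, fmin=20.0, fmax=16000.0)
```

The bands are evenly spaced in mel but widen in Hz as frequency grows, which is why a mel-spectrogram gives more resolution to low frequencies.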
Applies time and frequency masking to a spectrogram. The number of stripes can also be controlled.
time_freq_mask = tfe.layers.TimeFreqMask()
spec = time_freq_mask(spec)
![](https://private-user-images.githubusercontent.com/36858976/271478647-78bc7007-67e1-4a93-8f26-9d8a2e687edd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg4NjUyNDAsIm5iZiI6MTcxODg2NDk0MCwicGF0aCI6Ii8zNjg1ODk3Ni8yNzE0Nzg2NDctNzhiYzcwMDctNjdlMS00YTkzLThmMjYtOWQ4YTJlNjg3ZWRkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIwVDA2MjkwMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTU2ZTY2NzAyZGQxN2M2ZGQzZWVhZjk4MGY1NWY1ZTdlYjc2YzA2YzNjMDZkMGI4NTRmM2ZhMTlhYzI4Y2U1YTEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.oxPUwuZqlbysweRYFQu4__XtYHlsy9hSwYtoZ2S1C6k)
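The general idea behind time/frequency masking can be sketched in pure Python as below. This is an illustration of the technique with one stripe per axis, not the library's implementation; the function name, its parameters, and the `spec[freq][time]` layout are assumptions made for the example.

```python
import random

def time_freq_mask(spec, freq_width, time_width, rng):
    """Zero one random frequency stripe and one random time stripe.

    spec is a nested list indexed as spec[freq][time] (assumed layout).
    """
    n_freq, n_time = len(spec), len(spec[0])
    f0 = rng.randint(0, n_freq - freq_width)  # first masked frequency row
    t0 = rng.randint(0, n_time - time_width)  # first masked time column
    masked = [row[:] for row in spec]         # copy so the input stays intact
    for f in range(f0, f0 + freq_width):
        masked[f] = [0.0] * n_time            # horizontal (frequency) stripe
    for row in masked:
        row[t0:t0 + time_width] = [0.0] * time_width  # vertical (time) stripe
    return masked

rng = random.Random(0)
spec_in = [[1.0] * 6 for _ in range(4)]       # 4 frequency bins x 6 frames
spec_out = time_freq_mask(spec_in, freq_width=1, time_width=2, rng=rng)
```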
Can be used with audio, spectrograms, and images. For spectrograms, the full frequency resolution can be used by setting full_height=True.
cutmix = tfe.layers.CutMix()
audio = cutmix(audio, training=True) # accepts both audio & spectrogram
![](https://private-user-images.githubusercontent.com/36858976/271479024-35af3140-46ec-4592-8923-4bd21f76cb15.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg4NjUyNDAsIm5iZiI6MTcxODg2NDk0MCwicGF0aCI6Ii8zNjg1ODk3Ni8yNzE0NzkwMjQtMzVhZjMxNDAtNDZlYy00NTkyLTg5MjMtNGJkMjFmNzZjYjE1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIwVDA2MjkwMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRmODgyMzJiYTJkMTA3ZDliYWJlOGI0OGFkYmY2MmVhYTM1NDY5MTU0ZGIxZGJmNGRhMjY2NzMxNWE4ZWQ3ZWEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.pKOb5_1AvPOLaHHao3UzMPlTpd0y5u7udxrCmGMMdi4)
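As a rough illustration of CutMix on 1-D audio, the sketch below pastes a contiguous segment of one clip into another and reports the fraction of the first clip that survives. It is a minimal sketch of the technique, not the library's implementation; `cutmix_1d` and its arguments are hypothetical names.

```python
import random

def cutmix_1d(x1, x2, cut_ratio, rng):
    """Replace a random contiguous segment of x1 with the same segment of x2.

    Returns the mixed clip and lam, the fraction of samples kept from x1,
    which is typically used to weight the two labels.
    """
    n = len(x1)
    cut_len = int(n * cut_ratio)
    start = rng.randint(0, n - cut_len)       # inclusive on both ends
    mixed = x1[:start] + x2[start:start + cut_len] + x1[start + cut_len:]
    lam = 1.0 - cut_len / n
    return mixed, lam

rng = random.Random(0)
clip_a = [0.0] * 8
clip_b = [1.0] * 8
mixed_clip, lam = cutmix_1d(clip_a, clip_b, cut_ratio=0.25, rng=rng)
```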
Can be used with audio, spectrograms, and images. For spectrograms, the full frequency resolution can be used by setting full_height=True.
mixup = tfe.layers.MixUp()
audio = mixup(audio, training=True) # accepts both audio & spectrogram
![](https://private-user-images.githubusercontent.com/36858976/271479176-128de4aa-5295-4655-b00d-1e16b5e06560.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg4NjUyNDAsIm5iZiI6MTcxODg2NDk0MCwicGF0aCI6Ii8zNjg1ODk3Ni8yNzE0NzkxNzYtMTI4ZGU0YWEtNTI5NS00NjU1LWIwMGQtMWUxNmI1ZTA2NTYwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIwVDA2MjkwMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM4OWNjZGY2YWNjZDQyYTcyNDJhNTYwODRiOTc2NDVkZThiNjc2OTg2YjBkODdkMmZjMzgwMTZlMjRkNGNiMzEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0._HftGqu-9gE3A3Ly-uPyXzmPSf_HZjkDdbtSrxKr3g8)
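MixUp in its usual formulation blends two samples and their labels with a weight drawn from a Beta distribution. The pure-Python sketch below shows that idea only; it is not the library's implementation, and the function name, the explicit label arguments, and the default `alpha` are assumptions for illustration.

```python
import random

def mixup(x1, x2, y1, y2, alpha, rng):
    """Blend two samples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

rng = random.Random(0)
mixed_x, mixed_y, lam = mixup(
    x1=[0.0, 0.0, 0.0], x2=[1.0, 1.0, 1.0],
    y1=[1.0, 0.0], y2=[0.0, 1.0],
    alpha=0.2, rng=rng,
)
```

With a small `alpha`, most draws of `lam` fall near 0 or 1, so the mixed sample usually stays close to one of the two inputs.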
Applies standardization and rescaling.
norm = tfe.layers.ZScoreMinMax()
spec = norm(spec)
![](https://private-user-images.githubusercontent.com/36858976/271478797-8a8a4b38-9eb2-4dda-ab09-11887b37c593.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg4NjUyNDAsIm5iZiI6MTcxODg2NDk0MCwicGF0aCI6Ii8zNjg1ODk3Ni8yNzE0Nzg3OTctOGE4YTRiMzgtOWViMi00ZGRhLWFiMDktMTE4ODdiMzdjNTkzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIwVDA2MjkwMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ4NzUyZDkyN2Q4NWRjMGFjMWJmYmJjM2JiMGVkYzJjZjk1NzJiYjgxZTFkODQ2MTQ0ZDI2ZTU5Y2I1OTVjZjImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.zD2h6awejBd4PdD2xOguVxKbLXO5JR9IkG_6irWHLYM)
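Going by the layer name, standardization followed by rescaling can be read as a z-score step and then a min-max step. The sketch below shows that reading in pure Python; the exact axes, epsilon, and output range used by `ZScoreMinMax` are not stated here, so treat `zscore_minmax` and its `eps` as assumptions.

```python
import math

def zscore_minmax(values, eps=1e-8):
    """Standardize to zero mean and unit variance, then rescale to [0, 1]."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / (std + eps) for v in values]      # z-score step
    lo, hi = min(z), max(z)
    return [(v - lo) / (hi - lo + eps) for v in z]      # min-max step

scaled = zscore_minmax([3.0, 5.0, 7.0, 11.0])
```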
import tensorflow as tf
import tensorflow_extra as tfe
a = tf.constant([-2.5, -1.0, 0.5, 1.0, 2.5])
b = tfe.activations.smelu(a) # array([0., 0.04166667, 0.6666667 , 1.0416666 , 2.5])
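The values in the comment above are consistent with the piecewise SmeLU definition using a half-width of beta = 1.5. That beta is inferred from the printed numbers rather than confirmed from the library, and `smelu_reference` is a hypothetical pure-Python helper for checking them.

```python
def smelu_reference(x, beta=1.5):
    """Piecewise SmeLU: 0 below -beta, identity above beta, quadratic in between.

    beta=1.5 is an assumption inferred from the example output.
    """
    if x <= -beta:
        return 0.0
    if x >= beta:
        return x
    return (x + beta) ** 2 / (4.0 * beta)

inputs = [-2.5, -1.0, 0.5, 1.0, 2.5]
outputs = [smelu_reference(x) for x in inputs]
```

The quadratic piece joins the two linear pieces with matching value and slope at -beta and beta, which is what makes the activation smooth.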