Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Jupyter Notebook, BSD-3-Clause
question about normalization
#141 opened by emleeee - 1
Install requirement issue
#138 opened by Sunny-gyx - 0
Installing requirements issues
#123 opened by BehrouzGit - 1
self-contained Google Colab script error
#129 opened by moon-aver - 2
some questions when reproducing your results
#131 opened by ben100118 - 1
training MAP
#126 opened by maxwZJU - 1
One question regarding the linear projection of AST.
#127 opened by poult-lab - 1
Ask for help
#130 opened by Ingram-lin - 2
Inquiry Regarding Audio Spectrogram Transformer
#128 opened by Ingram-lin - 3
how to use my own dataset
#115 opened by wlssyuu - 0
When I download the pretrained model with stride=16, I need to change `fstride` and `tstride` in the source code from 10 to 16. Besides these changes, what else do I need to adjust?
#124 opened by zky-66 - 1
Different audio sample size for fine-tuning the model gives overfitting issue
#125 opened by aarshilpatel - 6
ESC-50-master zip file location has changed
#122 opened by BehrouzGit - 1
RuntimeError: DataLoader worker (pid 39424) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
#112 opened by wuhongsheng - 7
After fine-tuning on a 3-class dataset, how to load the fine-tuned weights to update the pre-trained AST model?
#118 opened by jmren168 - 6
CPU memory increase while training
#119 opened by gudrb - 2
seq2seq classification with AST
#117 opened by YSLCoat - 5
Using pretrained model for embeddings extraction with audio input samples of different durations.
#101 opened by sreenivasaupadhyaya - 2
AST Audioset Training Time and Hardware
#116 opened by justinluong - 1
For own data
#113 opened by ShadowVicky - 5
Regression Task
#110 opened by BaMarcy - 13
pre-processing about AudioSet (resample to 16kHz)
#108 opened by wisekimm - 3
What is the objective when pretraining?
#107 opened by Young973 - 0
Multichannel Audio Input
#106 opened by aaprasad - 0
How to convert fbank tensor back to waveform?
#105 opened by ebagdasa - 2
Missing "esc_class_labels_indices.csv" file
#104 opened by kaiw7 - 0
Application on ASR
#103 opened by chlorane - 3
Epoch: [4][160156/161048] training diverged...
#102 opened by xiaoli1996 - 5
About AST for Speech Enhancement
#100 opened by kaiw7 - 0
ast input audio length
#99 opened by syjunghwang - 2
How to use the pre-trained model on the AudioSet to extract audio features and save them as npy?
#98 opened by ayameyao - 3
SpeechCommands v2
#94 opened by yunzqq - 5
Performance issues with recorded voices
#96 opened by milad-s5 - 1
How to configure the dataset or modify the code if I want to do the one class binary classification
#91 opened by nanyyyyyy - 0
Training with unbatched data!
#93 opened by fallahim - 1
Dealing with different audio lengths
#92 opened by OhadCohen97