YuanGongND/ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

Jupyter Notebook · BSD-3-Clause

Issues

Fine-tuning AST model for Music Emotion Classification overfits
#120 opened a year ago by moonquekes
4
question about normalization
#141 opened 4 months ago by emleeee
0
Issue while trying to execute the Speechcommands V2 Recipe
#140 opened 5 months ago by alexandramarkal
1
Install requirement issue
#138 opened 6 months ago by Sunny-gyx
3
how to reproduce the same result based on my custom dataset?
#137 opened 7 months ago by jmren168
0
How to incrementally fine-tune AST model with the new data?
#136 opened 8 months ago by jmren168
0
AttributeError: module 'numpy.typing' has no attribute 'NDArray'
#135 opened 8 months ago by nikhilsos
3
Installing requirements issues
#123 opened a year ago by BehrouzGit
1
Installing requirements and CUDA on a fresh virtual environment
#114 opened 2 years ago by blackyx35
1
AssertionError: choose a window size 400 that is [2, 1]
#133 opened 9 months ago by GrafKnusprig
2
Discrepancy in Model Performance Using HuggingFace Pipeline Utility
#134 opened 9 months ago by penguinwang96825
5
self-contained Google Colab script error
#129 opened a year ago by moon-aver
2
some questions when reproducing your results
#131 opened 10 months ago by ben100118
2
csv error
#132 opened 10 months ago by mooncv
1
training MAP
#126 opened a year ago by maxwZJU
2
One question regarding the linear projection of AST.
#127 opened 10 months ago by poult-lab
1
Ask for help
#130 opened a year ago by Ingram-lin
1
Inquiry Regarding Audio Spectrogram Transformer
#128 opened a year ago by Ingram-lin
2
how to use my own dataset
#115 opened a year ago by wlssyuu
3
When I download the pretrained model with stride=16, I need to change `fstride` and `tstride` in the source code from 10 to 16. Besides these changes, what else do I need to adjust?
#124 opened a year ago by zky-66
0
Different audio sample size for fine-tuning the model gives overfitting issue
#125 opened a year ago by aarshilpatel
1
How can I adapt the pretrained AST model to fit my own dataset
#121 opened a year ago by zky-66
6
ESC-50-master zip file location has changed
#122 opened a year ago by BehrouzGit
2
RuntimeError: DataLoader worker (pid 39424) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
#112 opened 2 years ago by wuhongsheng
1
After fine-tuning on a 3-class dataset, how to load the fine-tuned weights to update the pre-trained AST model?
#118 opened a year ago by jmren168
7
CPU memory increase while training
#119 opened a year ago by gudrb
6
seq2seq classification with AST
#117 opened a year ago by YSLCoat
2
Using pretrained model for embeddings extraction with audio input samples of different durations.
#101 opened 2 years ago by sreenivasaupadhyaya
5
AST Audioset Training Time and Hardware
#116 opened a year ago by justinluong
2
For own data
#113 opened 2 years ago by ShadowVicky
1
Huggingface-compatible ImageNet pre-trained weights
#109 opened 2 years ago by penguinwang96825
5
ERROR: Cannot install -r requirements.txt (line 10)
#111 opened 2 years ago by wuhongsheng
1
Regression Task
#110 opened 2 years ago by BaMarcy
1
pre-processing about AudioSet (resample to 16kHz)
#108 opened 2 years ago by wisekimm
13
What is the objective when pretraining?
#107 opened 2 years ago by Young973
3
Multichannel Audio Input
#106 opened 2 years ago by aaprasad
0
How to convert fbank tensor back to waveform?
#105 opened 2 years ago by ebagdasa
0
Missing "esc_class_labels_indices.csv" file
#104 opened 2 years ago by kaiw7
2
Application on ASR
#103 opened 2 years ago by chlorane
0
Epoch: [4][160156/161048] training diverged...
#102 opened 2 years ago by xiaoli1996
3
About AST for Speech Enhancement
#100 opened 2 years ago by kaiw7
5
ast input audio length
#99 opened 2 years ago by syjunghwang
0
How to use the pre-trained model on the AudioSet to extract audio features and save them as npy?
#98 opened 2 years ago by ayameyao
2
Colab notebook for inference is only partly usable with CPU
#97 opened 2 years ago by Cycerotki
3
SpeechCommands v2
#94 opened 2 years ago by yunzqq
4
Performance issues with recorded voices
#96 opened 2 years ago by milad-s5
5
In my own dataset, why is the Avg precision always 0.5 in each epoch?
#95 opened 2 years ago by 1244547821
1
How to configure the dataset or modify the code if I want to do the one class binary classification
#91 opened 2 years ago by nanyyyyyy
14
Training with Unbatch data!
#93 opened 2 years ago by fallahim
0
Dealing with different audio lengths
#92 opened 2 years ago by OhadCohen97
1