How to make contributions?
dansuh17 opened this issue · 13 comments
Are there any guidelines for contributions?
It would be very helpful if I could contribute implementations of operations that (in my limited experience) are frequently used, such as audio chunking and pre-emphasis/de-emphasis filters.
> audio chunking

There is `torch.as_strided`. Its documentation is missing, but I would expect it to work the same as NumPy's implementation.
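For what it's worth, a minimal framing sketch with it (the frame and hop lengths here are made-up values for illustration, not a proposed API):

```python
import torch

signal = torch.arange(16, dtype=torch.float32)  # dummy mono signal
frame_length, hop_length = 4, 2
num_frames = 1 + (signal.numel() - frame_length) // hop_length

# Each row is one frame; consecutive rows overlap by
# frame_length - hop_length samples.
frames = signal.as_strided(size=(num_frames, frame_length),
                           stride=(hop_length, 1))
print(frames.shape)  # torch.Size([7, 4])
```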
> pre-emphasis/de-emphasis filters

Sure, that looks helpful, but maybe not a focus right now.
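For context, these are just a first-order FIR/IIR pair. A hedged sketch (the function names are hypothetical, and `coeff=0.97` is only the conventional default):

```python
import torch

def preemphasis(x: torch.Tensor, coeff: float = 0.97) -> torch.Tensor:
    # y[n] = x[n] - coeff * x[n-1]; the first sample passes through.
    return torch.cat([x[:1], x[1:] - coeff * x[:-1]])

def deemphasis(x: torch.Tensor, coeff: float = 0.97) -> torch.Tensor:
    # Inverse IIR filter: y[n] = x[n] + coeff * y[n-1] (inherently sequential).
    y = x.clone()
    for n in range(1, y.numel()):
        y[n] = x[n] + coeff * y[n - 1]
    return y
```

Applying `deemphasis(preemphasis(x))` recovers `x` up to floating-point error.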
- I guess the right way is to ask here, as you did! (ref: #18) But @faroit, it'd be nice to make this explicit in the README.
- audio chunking: I think it's different from `torch.as_strided`, because optimal audio chunking would load only the essential part, for efficiency. That said, it's something we'd like to have after finishing the audio loading module.
- pre-emphasis/de-emphasis: I am aware of it being used in SEGAN, but do you know if it's popular in general?
@dansuh17 we added a contribution section in #21
> audio chunking: I think it's different from `torch.as_strided`, because optimal audio chunking would load only the essential part, for efficiency. That said, it's something we'd like to have after finishing the audio loading module.

@keunwoochoi what do you mean by "essential part"? I guess @dansuh17 meant a framer function like the one that is typically part of the STFT?
I assumed a chunk would be something longer, like 2 seconds; probably one that becomes an input to a model.
- @faroit Nice! That was fast :)
- @keunwoochoi pre-emphasis and de-emphasis are just examples that I could come up with :) Not sure if they're popular in general for training on audio data.
- And yes, the chunking I meant was for something that could be used as a separate entry in `torch.utils.data.Dataset`, maybe ranging from 1 to several seconds long. I think we can make good use of memory mapping while partially loading `.npy` data, but I'm not sure about raw audio files; file seeking can be costly. A rough sketch of the idea follows this list.
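A minimal sketch of that idea, assuming one long recording pre-saved as a `.npy` array and a fixed chunk length (the class name and parameters are hypothetical):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class ChunkedAudioDataset(Dataset):
    """Serve fixed-length chunks of one long recording as separate entries."""

    def __init__(self, npy_path: str, chunk_size: int):
        # mmap_mode="r" keeps the array on disk; slicing reads lazily.
        self.audio = np.load(npy_path, mmap_mode="r")
        self.chunk_size = chunk_size

    def __len__(self):
        return len(self.audio) // self.chunk_size

    def __getitem__(self, idx):
        start = idx * self.chunk_size
        chunk = self.audio[start:start + self.chunk_size]
        # np.array copies the slice, so only this chunk is read from disk.
        return torch.from_numpy(np.array(chunk, dtype=np.float32))
```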
I also have some ideas on how to do the chunking, plus the reconstruction after the ISTFT (involving some overlapping windows); roughly along the lines of the sketch below. Should we open a separate issue?
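A minimal overlap-add sketch, assuming already-windowed frames shaped `(num_frames, frame_length)`; the function name is hypothetical, and window normalization is omitted:

```python
import torch

def overlap_add(frames: torch.Tensor, hop_length: int) -> torch.Tensor:
    # Sum overlapping frames back into a single 1-D signal.
    num_frames, frame_length = frames.shape
    out = torch.zeros(hop_length * (num_frames - 1) + frame_length)
    for i in range(num_frames):
        start = i * hop_length
        out[start:start + frame_length] += frames[i]
    return out
```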
Sure, please go ahead.
> @keunwoochoi pre-emphasis and de-emphasis are just examples that I could come up with :) Not sure if they're popular in general for training on audio data.
At least in ASR, pre-emphasis is falling out of fashion because neural-network acoustic models (AMs) can deal with low energy at high frequencies better than GMM AMs can.

The maintainers of the Kaldi repo have started a new version that will break backward compatibility, and one of the reasons is the amount of audio processing that is not really used. You can look at the Kaldi help forums and developer mailing list to see some of the things they intend to deprecate.
@dresen Hi, thanks for letting us know. Yeah, given that we're focusing on operations that are common across many audio domains and that would (hopefully) stand the test of time, pre-emphasis/de-emphasis won't be prioritized.
> Kaldi help forums and developer mailing list

Right, or... maybe you could help keep us updated from the speech-research point of view? :)
> pre-emphasis/de-emphasis won't be prioritized
I think we should make it easy to reproduce most research papers from most audio domains, so we may want to include this as well, but I'd leave it up to a speech expert to implement it the way it's needed.