keunwoochoi/torchaudio-contrib

How to make contributions?

dansuh17 opened this issue · 13 comments

Are there any guidelines for contributions?
It would be very helpful if I could contribute implementations of audio chunking, pre-emphasis/de-emphasis filters, etc., which are frequently used (in my limited experience).

audio chunking

There is torch.as_strided; its documentation is currently missing, but I would expect it to work the same as NumPy's implementation.
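For illustration, a minimal framing sketch with torch.as_strided, assuming a contiguous 1-D tensor (the frame function and its parameter names are made up for this example, not part of any API):

```python
import torch

def frame(signal, frame_length, hop_length):
    # Overlapping frames as a zero-copy view, mirroring
    # numpy.lib.stride_tricks; assumes a contiguous 1-D tensor.
    num_frames = 1 + (signal.size(0) - frame_length) // hop_length
    stride = signal.stride(0)
    return torch.as_strided(
        signal,
        size=(num_frames, frame_length),
        stride=(hop_length * stride, stride),
    )

x = torch.arange(10.)
print(frame(x, frame_length=4, hop_length=2))
# tensor([[0., 1., 2., 3.],
#         [2., 3., 4., 5.],
#         [4., 5., 6., 7.],
#         [6., 7., 8., 9.]])
```

Note that `x.unfold(0, frame_length, hop_length)` gives the same view in a single call.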

pre-emphasis/de-emphasis filters

Sure, that looks helpful, but maybe not a focus right now.
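For reference, this is roughly what the filter pair looks like in its common textbook form (a 1-D sketch; alpha=0.97 is a conventional choice, and nothing here comes from this repo):

```python
import torch

def preemphasis(x, alpha=0.97):
    # FIR filter: y[n] = x[n] - alpha * x[n-1]
    return torch.cat([x[:1], x[1:] - alpha * x[:-1]])

def deemphasis(y, alpha=0.97):
    # Inverse IIR filter: x[n] = y[n] + alpha * x[n-1];
    # recursive by nature, hence the explicit loop.
    x = torch.empty_like(y)
    x[0] = y[0]
    for n in range(1, y.size(0)):
        x[n] = y[n] + alpha * x[n - 1]
    return x

# The two filters are exact inverses of each other:
x = torch.randn(16000)
assert torch.allclose(deemphasis(preemphasis(x)), x, atol=1e-4)
```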

  • I guess the right way is to ask here, as you did! (ref: #18) But @faroit, it'd be nice to make this explicit in the README.
  • audio chunking: I think it's different from torch.as_strided, because optimal audio chunking would load only the essential part of a file, for efficiency. That said, it'd be something we'd like to have after finishing the audio loading module.
  • pre-emphasis/de-emphasis: I am aware of it being used in SEGAN, but do you know if it's popular in general?

@densuh we added a contribution section in #21

audio chunking: I think it's different from torch.as_strided, because optimal audio chunking would load only the essential part of a file, for efficiency. That said, it'd be something we'd like to have after finishing the audio loading module.

@keunwoochoi what do you mean by "essential part"? I guess @densuh meant a framer function like the one that is typically part of the STFT?

I assumed a chunk would be something longer, like 2 seconds; probably one that becomes an input to a model.

  • @faroit Nice! That was fast :)
  • @keunwoochoi pre-emphasis and de-emphasis are just examples that I could come up with :) I'm not sure whether they're popular in general for training on audio data.
  • And yes, the chunking I meant was for something that could be used as a separate entry in a torch.utils.data.Dataset, maybe ranging from 1 to several seconds long. I think we can make good use of memory mapping to partially load .npy data (see the sketch after this list), but I'm not sure about raw audio files; file seeking can be costly.
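As a rough sketch of that memory-mapped chunking idea (the class and parameter names are hypothetical): each __getitem__ reads only its own slice from disk, so the full file is never loaded up front.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class ChunkDataset(Dataset):
    # Serves fixed-length, non-overlapping chunks of one long
    # recording stored as a .npy file, via numpy memory mapping.

    def __init__(self, npy_path, chunk_length):
        self.data = np.load(npy_path, mmap_mode='r')  # no full read
        self.chunk_length = chunk_length

    def __len__(self):
        return len(self.data) // self.chunk_length

    def __getitem__(self, idx):
        start = idx * self.chunk_length
        chunk = self.data[start:start + self.chunk_length]
        # Copy out of the mmap so the tensor owns its memory.
        return torch.from_numpy(np.array(chunk))
```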

I also have some ideas on how to do the chunking, plus the reconstruction after the ISTFT (involving overlapping windows). Should we open a separate issue?
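For concreteness, a minimal overlap-add sketch of the reconstruction step being referred to (window compensation is omitted; a full ISTFT would additionally divide by the summed squared window):

```python
import torch

def overlap_add(frames, hop_length):
    # Reassemble (num_frames, frame_length) frames into a 1-D
    # signal by summing the overlapping regions.
    num_frames, frame_length = frames.shape
    length = frame_length + hop_length * (num_frames - 1)
    out = frames.new_zeros(length)
    for i in range(num_frames):
        start = i * hop_length
        out[start:start + frame_length] += frames[i]
    return out
```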

Sure, please go ahead.

* @keunwoochoi pre-emphasis and de-emphasis are just examples that I could come up with :) I'm not sure whether they're popular in general for training on audio data.

At least in ASR, pre-emphasis is falling out of fashion, because neural-network acoustic models can deal with low energy at high frequencies better than GMM acoustic models.

The maintainers of the Kaldi repo have started a new version that will break backward compatibility, and one of the reasons is that a lot of the audio processing it includes is not really used. You can look at the Kaldi help forums and developer mailing list to see some of the things they intend to deprecate.

@dresen Hi, thanks for letting us know. Yeah, given that we're focusing on operations that are common across many audio domains and that would (hopefully) stand over time, pre-emphasis/de-emphasis wouldn't be prioritized.

Kaldi help forums and mailing list

Right, or... maybe you could help keep us updated from the speech-research point of view? :)

f0k commented

pre-emphasis/de-emphasis wouldn't be prioritized

I think we should make it easy to reproduce most research papers from most audio domains, so we may want to include this as well, but I'd leave it up to a speech expert to implement it the way it's needed.

@densuh can this be closed?