This repository is the code for the paper
Pagliarini, S., Leblois, A., & Hinaut, X. (2021) Canary Vocal Sensorimotor Model with RNN Decoder and Low-dimensional GAN Generator. In 2021 IEEE International Conference on Development and Learning (ICDL).
ABSTRACT
Songbirds, like humans, learn to imitate sounds produced by adult conspecifics. Similarly, a complete vocal learning model should be able to produce, perceive and imitate realistic sounds. We propose (1) to use a low-dimensional generator model obtained by training WaveGAN on canary vocalizations, and (2) to use an RNN classifier to model sensory processing. In this scenario, can a simple Hebbian learning rule drive the learning of the inverse model linking the perceptual space and the motor space? First, we study how the motor latent space topology affects the learning process. We then investigate the influence of the learning rate and of the motor latent space dimension. We observe that a simple Hebbian rule is able to drive the learning of realistic sounds produced via a low-dimensional GAN.
- Train the GAN using ld = latent space dimension (e.g., ld = 3)
- Generate the motor space (here 16k generations): name it 'motor_ld' (e.g., 'motor_3')
- Create annotations (analysis of the motor space): name it 'sensory_EXT_ld.pkl' (e.g., 'sensory_EXT_3.pkl')
- [OPTIONAL]: use pre_def to pre-define the initial weights. This is useful if one cannot run the simulations for a long time.
- exploration_dir: motor exploration directory, containing the latent vectors and the corresponding generated wav files.
- sensory_dir: sensory response feedback directory, containing the annotations (sensory_EXT_3.pkl) and possibly other related analyses.
- train_dir: train directory, where the trained model (GAN) is saved.
IM_simple_classic: implementation of the classic Hebbian learning rule.
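As a rough orientation, a classic Hebbian update between a sensory response vector and the motor latent vector could look like the sketch below; this is a minimal illustration, and all names, shapes and the update form are assumptions, not the repository's exact code.

```python
import numpy as np

# Hypothetical sizes: 16 sensory classes (ns), latent dimension 3 (ld).
ns, ld = 16, 3
rng = np.random.default_rng(0)

W = rng.uniform(-0.1, 0.1, size=(ns, ld))  # inverse-model weights (sensory -> motor)
eta = 0.01                                 # learning rate

def hebbian_update(W, s, m, eta):
    """Classic Hebbian rule: change each weight in proportion to the
    co-activation of the sensory response s and the motor latent vector m."""
    return W + eta * np.outer(s, m)

s = rng.random(ns)               # sensory response to the produced sound
m = rng.uniform(-1.0, 1.0, ld)   # motor latent vector that produced it
W = hebbian_update(W, s, m, eta)
```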
motor_function_WaveGAN: generator of WaveGAN; only the generative part, since training is done beforehand.
This function generates a sound from a vector with the same dimension as the latent space (this dimension must be fixed before WaveGAN training starts). To use this function, the train directory of the model must be available in order to access the checkpoint.
It returns one vocal generation: for example, to associate one vector with one syllable, the function takes one vector as input (saved as pkl) and generates one syllable.
Partially taken from train_wavegan.py in https://github.com/spagliarini/low-dimensional-canary-GAN (original WaveGAN is from https://github.com/chrisdonahue/wavegan).
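For reference, generation from a trained WaveGAN checkpoint typically follows the pattern below, adapted from the upstream WaveGAN README (TensorFlow 1.x); the paths and the pkl file name are placeholders.

```python
import pickle
import numpy as np
import tensorflow as tf  # TensorFlow 1.x, as used by WaveGAN

# Restore the trained generator graph from the train directory.
tf.reset_default_graph()
saver = tf.train.import_meta_graph('train_dir/infer/infer.meta')
graph = tf.get_default_graph()
sess = tf.InteractiveSession()
saver.restore(sess, 'train_dir/model.ckpt')

# Load one latent vector (dimension = wavegan_latent_dim, e.g., 3) saved as pkl.
with open('motor_3/latent_vector.pkl', 'rb') as f:
    _z = np.asarray(pickle.load(f)).reshape(1, -1)

# Synthesize G(z): one generated syllable as a waveform.
z = graph.get_tensor_by_name('z:0')
G_z = graph.get_tensor_by_name('G_z:0')[:, :, 0]
audio = sess.run(G_z, {z: _z})
```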
sensory_response: implementation of the syllable classifier.
The input directory contains one or more audio files (.wav) of duration 1s.
This function creates a dictionary containing three elements:
- mean: stores the averaged outputs produced by the ESN, for each sample. This gives each sample a unique annotation vector; the mean output links the whole output (the whole syllable) to a 16-component vector (an indicator vector for the 16 classes of syllables).
- raw: stores the raw outputs produced by the ESN, i.e., one annotation vector per timestep of the input audio.
- states: stores the internal states of the ESN over time.
The output is saved in the same directory as the data.
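A minimal sketch of how such an annotation file can be inspected, assuming the file name from the example above and the pickle layout just described:

```python
import pickle

with open('sensory_dir/sensory_EXT_3.pkl', 'rb') as f:
    annotations = pickle.load(f)

mean = annotations['mean']      # one 16-component vector per sample
raw = annotations['raw']        # one annotation vector per timestep
states = annotations['states']  # internal ESN states over time
```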
References for the classifier:
- link to git project: https://github.com/reservoirpy/reservoirpy
- link to ICANN paper: https://github.com/neuronalX/Trouvain2020_ICANN
auditory_activation: to compute the second layer of the sensory response function.
Possibilities:
- softmax
- max scaling
- p95 (we use this one)
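As a sketch, the three options applied to a vector of classifier outputs could look as follows (numpy; the exact normalization used in the code may differ):

```python
import numpy as np

def softmax(x, beta=1.0):
    """Softmax with inverse temperature beta (cf. softmax_beta below)."""
    e = np.exp(beta * (x - np.max(x)))  # shift by max for numerical stability
    return e / e.sum()

def max_scaling(x):
    """Scale the responses by the maximum activation."""
    return x / np.max(x)

def p95_scaling(x):
    """Scale by the 95th percentile (the option used here)."""
    return x / np.percentile(x, 95)
```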
VLM: learning model.
Main parameters (at the end of the code):
- wavegan_latent_dim: latent space dimension
- sampling_rate: to write the syllables (same as the training/exploration data)
- ckpt_n: at which checkpoint it has to be saved; the first line in the checkpoint file has to be changed to model_ckpt=ckpt_n
- learning_rate
- MAX_trial: max number of trials per simulation
- ns: number of syllables to learn
- W_min: min boundary for the weights (used in the motor activation, which is piecewise linear)
- W_max: max boundary for the weights (used in the motor activation, which is piecewise linear)
- W_option: which learning rule
- W_seed: weights initialization
- N_sim: number of simulations to run in a row (with the same initial weights)
- T_names: names of the syllables
- classifier_name: which classifier (currently only EXT is used)
- activation_motor: currently piecewise linear (see the sketch after the example command below)
python InverseModelGAN.py --option learning --output_dir OUTPUT_DIR --wavegan_latent_dim 3 --ckpt_n CKPT --MAX_trial 3001
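The piecewise-linear motor activation bounded by W_min and W_max is not spelled out above; a plausible minimal sketch is a linear ramp clipped at those boundaries (an assumption, not the exact implementation):

```python
import numpy as np

def piecewise_linear(x, w_min=-1.0, w_max=1.0):
    """Identity between w_min and w_max, saturating outside the boundaries."""
    return np.clip(x, w_min, w_max)

motor_vector = piecewise_linear(np.array([-2.0, 0.3, 1.7]))  # -> [-1.0, 0.3, 1.0]
```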
open_pkl: to open annotation (sensory response) files.
softmax_beta: to compute the softmax varying the parameter beta.
auditory_activation_test: to explore different types of auditory activation on the motor space.
exploration_space: representation of the 1D and 3D motor space (cube, slices, etc.).
VLM_test: to test the learning with ALL the auditory activation functions; fine for a limited number of activation functions and iterations.
target: function to select the target (ideally the syllables that activate the classifier the most, as in the sketch below).
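For illustration, assuming the 'mean' annotations from sensory_response are stacked into an (n_samples, 16) array, the most-activating sample per class could be selected as below (a sketch; the actual target function may differ):

```python
import numpy as np

# Placeholder for annotations['mean'] stacked over the 16k motor-space samples.
mean = np.random.rand(16000, 16)

target_idx = np.argmax(mean, axis=0)   # one sample index per syllable class
target_annotations = mean[target_idx]  # annotation vectors of the 16 targets
```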
plotGAN contains several functions to plot the results of the learning model.
Common important parameters:
- data_dir: where the data are
- output_dir: where to save the figures
- wavegan_latent_dim: latent space dimension
- MAX_trial: Max number of time steps (the same as the one used during training)
- ns: number of syllables to learn
- N_sim: number of simulations to run in a row (with the same initial weights)
- classifier_name: which classifier (now we use only EXT)
- learning_rate: list of the learning rates used during training
- T_names: names of the syllables
plot_auditory_activation: plot the results of the different auditory activation functions (results from the test function).
Additional parameters:
- beta: parameter for the softmax function
python plotGAN.py --option activation_aud --data_dir DATA_DIR --output_dir OUTPUT_DIR --MAX_trial 3001
plot_sensory: plots of the results obtained from the learning model (VLM function in InverseModelGAN).
Additional parameters:
- n_points: how many points were saved during training (one every a fixed number of epochs; e.g., 300)
python plotGAN.py --option sensory --data_dir DATA_DIR --output_dir OUTPUT_DIR --MAX_trial 3001 --n_points 300
plot_syll: plot an example syllable at given time steps (example spectrograms); change the names in the syllables variable (at the beginning of the function).
python plotGAN.py --option syll --data_dir DATA_DIR --output_dir OUTPUT_DIR
mean_spectro: compute and plot the mean spectrogram during the plateau.
Additional parameters:
- N: n_fft for the librosa spectrogram
- H: hop length for the librosa spectrogram
- color: colormap
Note: change the parameters (at the bottom of the code) to plot the correct one (it can take some time, so it is better to do it one by one).
python plotGAN.py --option mean_spectro --data_dir DATA_DIR --output_dir OUTPUT_DIR --n_points 300
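A minimal sketch of the underlying spectrogram computation with the N and H parameters above (librosa; the paths, sampling rate and averaging step are assumptions):

```python
import numpy as np
import librosa

N, H = 512, 128  # n_fft and hop length passed to plotGAN

def spectrogram(path, sr=16000):
    y, _ = librosa.load(path, sr=sr)
    S = np.abs(librosa.stft(y, n_fft=N, hop_length=H))
    return librosa.amplitude_to_db(S, ref=np.max)

# Mean spectrogram over syllables produced during the plateau (1 s files,
# so all spectrograms share the same shape and can be stacked).
paths = ['plateau/syll_0.wav', 'plateau/syll_1.wav']
mean_spec = np.mean(np.stack([spectrogram(p) for p in paths]), axis=0)
```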
cfr_dim13: comparison between different latent space sizes. To be updated when more sizes are available; it might be generalized to other comparisons (e.g., between sparse/not sparse).
This function saves its output in the input directory.
python plotGAN.py --option cfr --data_dir DATA_DIR --MAX_trial 3001
plot_sensory_test: plots of the results obtained from the learning model (VLM_test function in InverseModelGAN). Not suitable for long simulations (it takes a lot of time); it is fine for short experiments.
@inproceedings{pagliarini2021canary,
  title={Canary Vocal Sensorimotor Model with RNN Decoder and Low-dimensional GAN Generator},
  author={Pagliarini, Silvia and Leblois, Arthur and Hinaut, Xavier},
  booktitle={2021 IEEE International Conference on Development and Learning (ICDL)},
  pages={1--8},
  year={2021},
  organization={IEEE}
}