speaker_sex_attribute_privacy: A Python repository from Yamagishi and Echizen Laboratories, National Institute of Informatics

This is an implementation of the paper "Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline".

The authors are Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang, Junichi Yamagishi, Jean-François Bonastre, and Driss Matrouf.

Please cite this paper if you use this code.

This code is adapted from SSL-SAS

Generate protected speech
1. Download the English development and evaluation data provided by the VoicePrivacy2020 Challenge: the LibriSpeech subsets (libri_dev and libri_test). Run bash adapted_from_vpc/00_download_testdata.sh. The script asks for a password; contact the VoicePrivacy2020 Challenge organizers to obtain it.
2. Generate anonymized speech: bash scripts/scripts/demo_synth_protect.sh.
To train a HiFi-GAN on LibriTTS-100h yourself, run: bash scripts/scripts/train_hifigan.sh
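
The download and protection steps above can also be driven from Python. The following is a minimal sketch, assuming it is run from the repository root and that the script paths are exactly those given above. The helper names build_commands and run_pipeline and the dry_run flag are hypothetical and not part of the repository.

```python
import subprocess

# Script paths copied from the steps above; run from the repository root.
DOWNLOAD_SCRIPT = "adapted_from_vpc/00_download_testdata.sh"
PROTECT_SCRIPT = "scripts/scripts/demo_synth_protect.sh"


def build_commands():
    """Return the shell commands for the two steps, in order."""
    return [["bash", DOWNLOAD_SCRIPT], ["bash", PROTECT_SCRIPT]]


def run_pipeline(dry_run=True):
    """Run each step in order and stop at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    for cmd in build_commands():
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Calling run_pipeline(dry_run=False) executes the scripts for real. Because subprocess.run inherits the terminal by default, the password prompt of the download script should still appear interactively.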

Pretrained models can be found here: https://zenodo.org/record/7347685#.Y4cS0i8Rp0t

Dependencies

git clone https://github.com/nii-yamagishilab/speaker_sex_attribute_privacy.git

cd speaker_sex_attribute_privacy

bash scripts/install.sh

Make sure sox and parallel are installed.
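
As a quick pre-flight check, the sketch below tests whether sox and parallel are on the PATH before the scripts are run. It is not part of the repository: missing_tools is a hypothetical helper, and only the Python standard library is used.

```python
import shutil

# Command-line tools this README asks for.
REQUIRED_TOOLS = ("sox", "parallel")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("sox and parallel are both available.")
```

If a tool is reported missing, install it with your system's package manager before running the scripts.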

Audio examples

original	proposed	global	TDPSOLA	synthesised but non protected*
ex1	ex1	ex1	ex1	ex1
ex2	ex2	ex2	ex2	ex2
ex3	ex3	ex3	ex3	ex3
ex4	ex4	ex4	ex4	ex4

*The speech was passed through the proposed system without applying the protection (i.e. without x-vector and pitch transformation).

The original audio samples are from LibriSpeech under Attribution 4.0 International (CC BY 4.0) license.

License

The adapted_from_facebookreaserch subfolder is under the Attribution-NonCommercial 4.0 International License. The adapted_from_speechbrain subfolder is under the Apache License. They were created by the facebookresearch and speechbrain organizations, respectively. The scripts subfolder is under the MIT license.

Because this source code was adapted from facebookresearch and speechbrain code, the whole project follows the Attribution-NonCommercial 4.0 International License.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
