Project Page | Video | Paper | Data
Code for the DIFFRIR model presented in Hearing Anything Anywhere. Please contact Mason Wang at masonlwang32 at gmail dot com for any inquiries or issues.
Mason Wang1 | Ryosuke Sawata1,2 | Samuel Clarke1 | Ruohan Gao1,3 | Shangzhe Wu1 | Jiajun Wu1
1Stanford, 2SONY AI, 3University of Maryland, College Park
- `HRIRs` - the SADIE dataset of head-related impulse responses, which are used to render binaural audio.
- `example_trajectories` - three notebooks used for generating the example trajectories shown on the website with `trajectory.py`, including a hallway, a dampened room, and a virtual speaker rotation example. Also contains audio files you can simulate in the room.
- `models` - weights for pretrained models on each of the four base subdatasets.
- `precomputed` - precomputed reflection paths for all datasets, traced up to their default order.
- `rooms` - information on the geometry of each room; also contains `dataset.py`, which is used for loading data.
- `binauralize.py` - tools for binaural rendering.
- `config.py` - lists the paths to the data directories for each subdataset; edit this to point at your local copies (see below).
- `evaluate.py` - tools for evaluating renderings and rendering music.
- `metrics.py` - loss functions and evaluation metrics.
- `render.py` - the DIFFRIR renderer, used to render RIRs.
- `train.py` - training script; trains a DIFFRIR renderer on the specified dataset, saves its outputs, and evaluates it.
- `trajectory.py` - used for rendering trajectories, e.g., simulating walking through a room while audio is playing.
The dataset can be downloaded from Zenodo: https://zenodo.org/records/11195833
`config.py` contains a list of paths to the data directories for the different subdatasets. Each data directory should contain `RIRs.npy`, `xyzs.npy`, and so on. Before using DIFFRIR, you will need to edit `config.py` so that these paths point to the correct directories on your machine.
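Once the paths are set, you can sanity-check a data directory directly with numpy. This is a minimal sketch, not part of the repo, and the path and exact array shapes are assumptions that depend on your machine and the subdataset:

```python
# Quick sanity check that a data directory is wired up correctly.
# The path below is a placeholder; use the one you entered in config.py.
import numpy as np

data_dir = "/path/to/classroomBase"        # hypothetical local path
rirs = np.load(f"{data_dir}/RIRs.npy")     # one impulse response per location
xyzs = np.load(f"{data_dir}/xyzs.npy")     # the corresponding measurement positions
print(rirs.shape, xyzs.shape)              # e.g. (N, T) RIRs and (N, 3) positions
```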
There are three example notebooks in the example_trajectories directory that show you how to generate realistic, immersive audio in a room.
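At their core, these trajectory renderings auralize dry audio by convolving it with RIRs rendered along the listener's path. The sketch below shows the single-position version of that idea using scipy; the input file names are placeholders, and the notebooks handle the full moving-listener case:

```python
# Minimal single-position auralization sketch (not from the repo):
# convolve dry audio with one rendered RIR to simulate playback in the room.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs, dry = wavfile.read("music.wav")        # dry source audio (assumed mono)
rir = np.load("predicted_rir.npy")         # hypothetical rendered RIR at rate fs
wet = fftconvolve(dry.astype(np.float32), rir.astype(np.float32), mode="full")
wet /= np.max(np.abs(wet)) + 1e-9          # normalize to avoid clipping
wavfile.write("auralized.wav", fs, wet.astype(np.float32))
```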
The three required arguments to the training script `train.py` are:

- The path where the model's weights and renderings should be saved.
- The name of the dataset (e.g., `"classroomBase"`), as specified in `rooms/dataset.py`.
- The path to the directory of pretraced reflection paths (included as part of this GitHub repo), which should be `precomputed/<dataset_name>`.
For example, to train and evaluate DIFFRIR on the Classroom Base dataset, simply run:
python train.py models/classroomBase classroomBase precomputed/classroomBase
In the above example:

- The weights and training losses of the model will be saved in `models/classroomBase`.
- The predicted RIRs for the monaural locations in the dataset, the predicted music renderings, and the predicted binaural RIRs and music for the binaural datapoints in the dataset will be saved in `models/classroomBase/predictions`.
- `models/classroomBase/predictions` will also contain `(N,)` numpy arrays specifying the per-datapoint error for monaural RIR rendering, and `(N, K)` numpy arrays specifying the per-datapoint, per-song error for monaural music rendering.
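A quick way to inspect these outputs is to list the saved arrays and their shapes. The exact file names inside the predictions directory depend on the run, so this sketch simply globs for them:

```python
# Hypothetical inspection of the saved per-datapoint error arrays.
from pathlib import Path
import numpy as np

pred_dir = Path("models/classroomBase/predictions")
for f in sorted(pred_dir.glob("*.npy")):
    arr = np.load(f)
    # Expect (N,) arrays for RIR errors and (N, K) arrays for music errors.
    print(f.name, arr.shape)
```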
The `precomputed` directory contains traced paths for all of the subdatasets used, but in case you would like to retrace (perhaps to a different order), you can use `trace.py`:

python trace.py precomputed/classroomBase classroomBase

The above command will trace the classroomBase dataset to its default reflection order(s) and save the results in `precomputed/classroomBase`.
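If you want to retrace every base subdataset in one go, a small driver script works. The dataset names below are assumptions based on the four base subdatasets; check `rooms/dataset.py` for the exact names before running:

```python
# Hypothetical batch retracing driver; the dataset names are assumptions,
# so verify them against rooms/dataset.py first.
import subprocess

for name in ["classroomBase", "dampenedBase", "hallwayBase", "complexBase"]:
    subprocess.run(
        ["python", "trace.py", f"precomputed/{name}", name],
        check=True,  # stop immediately if any trace fails
    )
```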
@InProceedings{hearinganythinganywhere2024,
  title={Hearing Anything Anywhere},
  author={Mason Wang and Ryosuke Sawata and Samuel Clarke and Ruohan Gao and Shangzhe Wu and Jiajun Wu},
  booktitle={CVPR},
  year={2024}
}