Setting HRTF on panner node will freeze for 800 milliseconds
ggadwa opened this issue · 21 comments
Trying out 0.34, there's another oddity with the Panner Nodes (you'll be getting a lot of in a game testing from me!) Again, thank you for this lib, and I'll try my best to shake out bugs for you.
If you do this:
panner.set_panning_model(PanningModelType::HRTF);
You get a pause (on my AMD/windows home, haven't tested it on my mac yet) or about 800 milliseconds. My physics thread was complaining a lot about this :)
If you use ::EqualPower, it works fine. BTW, no crashing on panner nodes anymore, so that issue has been fixed.
Thanks for the report!
A small audio glitch is expected when the panning model is changed for an active panner, because HRTF for example might delay the sound signal for the left ear if the sound is coming from the right. PanningModel equalpower assumes both ears are at the same physical locations so no delays are present.
However, 800ms is an eternity and this is not expected.
When I run cargo run --release --example spatial
and press <Enter>
a few times, I notice a 300ms delay on the control thread. Not the render thread by the way - audio playback is fine.
300 ms is quite a lot of time to load the HRTF data, but since this happens on the control thread it should not really be an issue either. I will have a look if is any speedup possible, but perhaps you should review your application and make sure that any interaction with our library is not happening on a thread with realtime constraints.
So this one I might not be explaining properly. I went back to chrome (I'll use this as reference form now on, as I should have) and did this:
console.info('start='+performance.now()); let panner = ctx.createPanner(); panner.panningModel='HRTF'; panner.distanceModel='linear'; panner.refDistance=100.0; panner.maxDistance=1100.0; panner.rolloffFactor=1.0; panner.coneInnerAngle=360.0; panner.coneOuterAngle=0.0; panner.coneOuterGain=0.0; panner.positionX.value=0.0; panner.positionY.value=0.0; panner.positionZ.value=0.0; panner.orientationX.value=1.0; panner.orientationY.value=0.0; panner.orientationZ.value=0.0; console.info('end='+performance.now());
In the console:
start=1658.9000000357628
test.html:52 end=1659.199999988079
Note this is on the main javascript thread (man that is one of the worst parts about javascript!), next to no time has passed to start a panner node, or to set the panning model. If there is a delay in the actual sound being played from your threads, that's fine, but in my case, setting HRTF freezes the thread creating the panner -- it doesn't return for 800 ms.
I suspect in most games people are going to be spawning panner nodes all over the places as sounds play in their game and any thread is going to be time sensitive. Nodes are going to have to be created as fast as possible.
Hey @ggadwa,
I just recreated your example logging some time elapsed here and there and, while this indeed is a bit long, it's two order of magnitude below what your are reporting, that's really strange
use std::time::{Instant};
use web_audio_api::context::{AudioContext, BaseAudioContext};
use web_audio_api::node::{PanningModelType, DistanceModelType};
fn main() {
let context = AudioContext::default();
let start = Instant::now();
let mut panner = context.create_panner();
panner.set_panning_model(PanningModelType::HRTF);
panner.set_distance_model(DistanceModelType::Linear);
panner.set_ref_distance(100.);
panner.set_max_distance(1100.);
panner.set_rolloff_factor(1.);
panner.set_cone_inner_angle(360.);
panner.set_cone_outer_angle(0.);
panner.set_cone_outer_gain(0.);
panner.position_x().set_value(0.);
panner.position_y().set_value(0.);
panner.position_z().set_value(0.);
panner.orientation_x().set_value(1.);
panner.orientation_y().set_value(0.);
panner.orientation_z().set_value(0.);
let duration = start.elapsed();
println!("Time elapsed: {:?}", duration);
}
I put some log in the set_panner_mode
(with >
prefix) method as well, and what I got is:
> include bytes 24ns
> create hrir sphere: 2.750661ms
> create hrir state: 5.737993ms
> send msg: 1.472µs
Time elapsed: 8.577523ms
@otto I tested in the spatial.rs
example too, and I found around 7ms too (no noticeable glitch in audio neither), how did you got this 300ms value?
If this will help, here is my testing code. Note 2 things (1) I wrapped everything inside a window/event loop as that is how my game runs and (2) I'm in debug mode. In this example, on my machine, the elapsed time is > 5 seconds (!!!)
This code is an absolute mess as I'm testing a bunch of stuff at once but it does show the problem, at least for me. This is an AMD windows machine, BTW.
use winit::{event::*, event_loop::{ControlFlow, EventLoop}, window::{Window, WindowBuilder}};
use web_audio_api::{context::{BaseAudioContext, AudioContext}, node::{AudioNode, AudioScheduledSourceNode, PanningModelType, DistanceModelType}};
use std::f32::consts::PI;
use std::time::Instant;
pub async fn run() {
env_logger::init();
let event_loop = EventLoop::new();
let window =
WindowBuilder::new()
.with_title("atom")
.with_inner_size(winit::dpi::LogicalSize::new(1024.0, 768.0))
.with_resizable(false)
.build(&event_loop).unwrap();
// the audio context
let ctx = AudioContext::default();
// run the window loop
event_loop.run(move |event, _, control_flow| {
match event {
Event::WindowEvent {
ref event,
window_id,
} if window_id == window.id() => {
match event {
WindowEvent::MouseInput {
button,
state,
..
} => {
if *state==ElementState::Pressed {
println!("sound");
ctx.listener().position_x().set_value(0.0);
ctx.listener().position_y().set_value(0.0);
ctx.listener().position_z().set_value(0.0);
ctx.listener().forward_x().set_value(0.0);
ctx.listener().forward_y().set_value(0.0);
ctx.listener().forward_z().set_value(1.0);
ctx.listener().up_x().set_value(0.0);
ctx.listener().up_y().set_value(1.0);
ctx.listener().up_z().set_value(0.0);
let length = ctx.sample_rate() as usize;
let sample_rate = ctx.sample_rate();
let mut src = ctx.create_buffer_source();
let flat_wave = vec![1.0; length];
let mut buffer = ctx.create_buffer(1, length, sample_rate);
buffer.copy_to_channel(&flat_wave, 0);
src.set_buffer(buffer.clone());
src.set_loop(true);
let mut analyser = ctx.create_analyser();
let mut bins = vec![0.; analyser.frequency_bin_count()];
let now = Instant::now();
let mut panner = ctx.create_panner();
panner.set_panning_model(PanningModelType::HRTF);
panner.set_distance_model(DistanceModelType::Linear);
panner.set_ref_distance(100.0);
panner.set_max_distance(1100.0);
panner.set_rolloff_factor(1.0);
panner.set_cone_inner_angle(360.0);
panner.set_cone_outer_angle(0.0);
panner.set_cone_outer_gain(0.0);
panner.position_x().set_value(0.0);
panner.position_y().set_value(0.0);
panner.position_z().set_value(0.0);
panner.orientation_x().set_value(1.0);
panner.orientation_y().set_value(0.0);
panner.orientation_z().set_value(0.0);
println!("panner ms={:?}", now.elapsed());
src.connect(&panner);
panner.connect(&analyser);
src.start();
for i in 0..30 {
let d = i * 50;
panner.position_z().set_value(d as f32);
std::thread::sleep(std::time::Duration::from_millis(100));
analyser.get_float_time_domain_data(&mut bins);
println!("{}>{}", d, bins[0]);
}
}
},
WindowEvent::CloseRequested => *control_flow = ControlFlow::Exit,
_ => {}
}
}
Event::RedrawRequested(window_id) if window_id == window.id() => {
}
Event::RedrawEventsCleared => {
window.request_redraw();
}
_ => {}
}
});
}
One other thing to point out is I did this in chrome on the same machine:
console.info('start='+performance.now());
let panner = ctx.createPanner();
panner.panningModel='HRTF';
panner.distanceModel='linear';
panner.refDistance=100.0;
panner.maxDistance=1100.0;
panner.rolloffFactor=1.0;
panner.coneInnerAngle=360.0;
panner.coneOuterAngle=0.0;
panner.coneOuterGain=0.0;
panner.positionX.value=0.0;
panner.positionY.value=0.0;
panner.positionZ.value=0.0;
panner.orientationX.value=1.0;
panner.orientationY.value=0.0;
panner.orientationZ.value=0.0;
console.info('end='+performance.now());
And that happened in < 1 millisecond.
Tried a release build; must faster but still slow at a little under 1 second.
@otto I tested in the
spatial.rs
example too, and I found around 7ms too (no noticeable glitch in audio neither), how did you got this 300ms value?
Thanks for chiming in. I will have to check again (later today)
The big question is what do we think is acceptable on the control thread? I agree with the sentiment of @ggadwa that large delays (a few ms+ ?) will be disruptive for any kind of application
My measurements are also in --release
mode. I did the following:
diff --git a/examples/spatial.rs b/examples/spatial.rs
index 317c027..03e60c4 100644
--- a/examples/spatial.rs
+++ b/examples/spatial.rs
@@ -16,7 +16,7 @@ use web_audio_api::node::{
//
// `WEB_AUDIO_LATENCY=playback cargo run --release --example spatial`
fn main() {
- env_logger::init();
+ env_logger::Builder::from_default_env().format_timestamp_millis().init();
let latency_hint = match std::env::var("WEB_AUDIO_LATENCY").as_deref() {
Ok("playback") => AudioContextLatencyCategory::Playback,
@@ -91,7 +91,8 @@ fn main() {
} else {
PanningModelType::EqualPower
};
+ log::info!("press");
panner.set_panning_model(p);
- println!("PanningMode: {:?}", panner.panning_model());
+ log::info!("PanningMode: {:?}", panner.panning_model());
});
}
diff --git a/src/node/panner.rs b/src/node/panner.rs
index da12398..30c2f37 100644
--- a/src/node/panner.rs
+++ b/src/node/panner.rs
@@ -551,12 +551,15 @@ impl PannerNode {
PanningModelType::HRTF => {
let resource = include_bytes!("../../resources/IRC_1003_C.bin");
let sample_rate = self.context().sample_rate() as u32;
+ log::info!("load sphere");
let hrir_sphere = HrirSphere::new(&resource[..], sample_rate).unwrap();
+ log::info!("setup state");
Some(HrtfState::new(hrir_sphere))
}
};
self.panning_model = value;
+ log::info!("send msg");
self.registration
.post_message(ControlMessage::PanningModel(Box::new(hrtf_option)));
}
and then measurements are like
RUST_LOG=info cargo run --release --example spatial
....
HRTF enabled, press <Enter> to toggle
[2023-10-16T12:30:42.553Z INFO spatial] press
[2023-10-16T12:30:42.553Z INFO web_audio_api::node::panner] send msg
[2023-10-16T12:30:42.553Z INFO spatial] PanningMode: EqualPower
[2023-10-16T12:30:43.710Z INFO spatial] press
[2023-10-16T12:30:43.710Z INFO web_audio_api::node::panner] load sphere
[2023-10-16T12:30:44.015Z INFO web_audio_api::node::panner] setup state
[2023-10-16T12:30:44.020Z INFO web_audio_api::node::panner] send msg
[2023-10-16T12:30:44.020Z INFO spatial] PanningMode: HRTF
[2023-10-16T12:30:45.360Z INFO spatial] press
[2023-10-16T12:30:45.360Z INFO web_audio_api::node::panner] send msg
[2023-10-16T12:30:45.361Z INFO spatial] PanningMode: EqualPower
[2023-10-16T12:30:46.236Z INFO spatial] press
[2023-10-16T12:30:46.236Z INFO web_audio_api::node::panner] load sphere
[2023-10-16T12:30:46.543Z INFO web_audio_api::node::panner] setup state
[2023-10-16T12:30:46.548Z INFO web_audio_api::node::panner] send msg
[2023-10-16T12:30:46.548Z INFO spatial] PanningMode: HRTF
so 310 ms to load the sphere and 5ms to setup the state. Interesting...
Looking at the source code of our HRTF library, it may be the case that the entire sphere needs to be resampled before it can be used.
My audio context runs at SampleRate(48000), what's yours @b-ma ? We may need to include pre-resampled versions of the database..
Yeah, it's the resampling that takes a lot of time.
I get down to 14ms when the sample rate matches the included HRTF-db (44100 Hertz).
I will try to see if we can resample only on the first load, and then reuse that for subsequent panner node creations.
@otto I tested in the
spatial.rs
example too, and I found around 7ms too (no noticeable glitch in audio neither), how did you got this 300ms value?Thanks for chiming in. I will have to check again (later today)
The big question is what do we think is acceptable on the control thread? I agree with the sentiment of @ggadwa that large delays (a few ms+ ?) will be disruptive for any kind of application
I can only talk for myself, but I've made a few game engines (if you look at my github you can see wsjs which is a complete in javascript 3d game engine, there's a "shooter" demo for it too) and that uses webaudio, and the pattern is when a sound is made, the nodes are created at that moment and if it's a non-looping sound, just forgotten and cleaned up by the GC. I'd assume any other game would have the same pattern, so anything over a couple ms would be very disruptive, for sure.
Yeah, it's the resampling that takes a lot of time. I get down to 14ms when the sample rate matches the included HRTF-db (44100 Hertz). I will try to see if we can resample only on the first load, and then reuse that for subsequent panner node creations.
So I'm taking chrome as my gold standard and I can tell from testing that there is never a pause (not even on the first panner creation.) I have to wonder if it's not caching for some standard sample rates before any panners are called, possibly at audio context creation, but that's just guessing. It's super efficient, for sure.
If it's a game, I think the first time having a pause is still going to be disruptive as I doubt you'd use a panner outside of the actual game loop, but I'm also not against just spawning one when the game boots if that's a decent workaround.
My audio context runs at SampleRate(48000), what's yours @b-ma ?
Yup right, it's 44100 by default on my side.
We may need to include pre-resampled versions of the database..
Probably having 44100 and 48000 could be nice yes
Also, I see in the hrtf
lib that both HrirSphere
and HrtfProcessor
implement Clone
, maybe using these instead of creating brand new instances would be more performant ? i.e. create the one instance of these per AudioContext
and just retrieve clones to the panner nodes?
I get down to 14ms when the sample rate matches the included HRTF-db (44100 Hertz).
This is still a lot compared to my 2-3ms, really strange... particularly knowing that your machine is generally twice faster than mine...
I wonder if it doesn't comes from the usage of log::info!
rather than Instant
(i.e. when does the time tag is retrieved?) On my side I did this:
PanningModelType::HRTF => {
use std::time::{Instant};
let start = Instant::now();
let resource = include_bytes!("../../resources/IRC_1003_C.bin");
let duration = start.elapsed();
println!("include_bytes!: {:?}", duration);
let sample_rate = self.context().sample_rate() as u32;
let start = Instant::now();
let hrir_sphere = HrirSphere::new(&resource[..], sample_rate).unwrap();
let duration = start.elapsed();
println!("HrirSphere::new: {:?}", duration);
let start = Instant::now();
let sphere = Some(HrtfState::new(hrir_sphere));
let duration = start.elapsed();
println!("HrtfState::new: {:?}", duration);
sphere
}
With the following result:
sampleRate 44100.0
include_bytes!: 23ns
HrirSphere::new: 2.608541ms
HrtfState::new: 4.920023ms
Time elapsed: 7.581619ms
Here, the HrtfState::new
is the biggest issue.
I'd assume any other game would have the same pattern, so anything over a couple ms would be very disruptive, for sure.
Yup for sure, at 60fps you have ~16.6ms of budget, but generally lots of physics and rendering to do...
Sorry if my last post was maybe a bit harsh (or "rusty" :) But I really don't understand how we can go from 7ms (which is nonetheless and factually too much) to few hundreds of milliseconds, there is something really weird going on here...
Sorry if my last post was maybe a bit harsh (or "rusty" :) But I really don't understand how we can go from 7ms (which is nonetheless and factually too much) to few hundreds of milliseconds, there is something really weird going on here...
I have a piece of code (it's above somewhere) that always takes about 5 seconds in debug; nobody else is seeing something that bad, so yeah, there's certainly something weird going on, but I think if you guys take a crack at what you are seeing, I can certainly retest a patch (I am new to rust so how to pull from a local repository is something I'll have to figure out, or maybe a test branch.) I can give you a test on both my PC and M2 mac, both on a simple test app and the game itself.
I also have no problem running through one with a ton of println! is you want to do some more timing.
FYI: My sample rate is 48000.
I am new to rust so how to pull from a local repository is something I'll have to figure out, or maybe a test branch.
Importing a lib locally is quite simple actually, you just have to change your Cargo.toml
to give a relative path to the lib, e.g.:
# web-audio-api = "0.34"
web-audio-api = { path = "../web-audio-api-rs" }
I have further improved the caching in #375 to also cache the HRTF-processor. Creation time is now 1ms for subsequent panner nodes. I hope this suffices for now, but I will have a quick look in the upstream hrtf library if we can improve clone
times by optimizing the memory layout.
Now for the initial load (which is thus 300ms on my machine), we have a few options:
- take the hit on the first panner creation - not ideal because this is very unexpected for the application. This is the current implementation
- take the hit on audio context creation. This will increase the AudioContext creation time from 100ms to 300ms and will increase memory consumption a bit for applications that don't need HRTF panning
- include more HRIR spheres for common sample rates (44100, 48000, 96000?). This reduces the initial load time to 15ms on my machine, but it does bloat the binary with a few MBs. This provides no adequate solution for applications running in exotic sample rates
Tangentially, I will have a look in the upstream library if loading can be made more efficient.
@b-ma I'm not sure why our machines have unexpected performances, but I think the general picture still stands
I would say that in term of usage solution 2. is the best one, as the AudioContext
is generally created at application startup
Nice catch!
I would say that in term of usage solution 2. is the best one, as the
AudioContext
is generally created at application startup
I'd second that. My audio context creation is spun up at the same time I'm spinning up my web gpu context, that's all expected time loss at the beginning of any application. 300 ms once really isn't that bad; it's just bad when you have to do it 10,000 times and only have a couple ms to spare every time.
And, again: Thanks for all the work! I picked rust to learn specifically because of the wgpu and this library; I want to be as cross platform as possible and I want my interfaces to things like lower level sound and graphics to be as long lived as possible and being a web standard gives me that option.
Alright thanks for the feedback. I will update #375 with
- preloading the HRTF data at the first online AudioContext creation
- update the upstream hrtf lib with improved performance
- a benchmark to track performance over time
I have released v0.35.0 with the improvements