Julia-Tempering/Pigeons.jl

Not sampling high density areas


Hi all,

I was hoping to get some advice on the model below. The support of the distribution is (-π, π), but in many cases the sampler does not cover the entire range, even where the density is high. A prime example is the second parameter, θU. Do you have any idea why the sampler is behaving this way? Thanks!

[figure: posterior distributions]

Code

using Pigeons
using QuantumEpisodicMemory
using Random
using Turing
using StatsPlots

Random.seed!(6522)

# data-generating parameters
dist = GQEM(;
    θG = -.12,
    θU = -1.54,
    θψO = -.71,
    θψR = -.86,
    θψU = 1.26,
)
n_trials = 10000
responses = rand(dist, n_trials)
data = (n_trials, responses)

# priors: VonMises(0, .1) on each angle parameter
@model function model(data)
    θG ~ VonMises(0, .1)
    θU ~ VonMises(0, .1)
    θψO ~ VonMises(0, .1)
    θψR ~ VonMises(0, .1)
    θψU ~ VonMises(0, .1)
    data ~ GQEM(; θG, θU, θψO, θψR, θψU)
end

# run parallel tempering, record the target-chain traces, and plot the marginals
pt = pigeons(target = TuringLogPotential(model(data)), record = [traces])

chains = Chains(pt)

plot(chains, grid = false)

Output

In the output below, I noticed that min(α) is lower than usual, but I am not sure what that means.

────────────────────────────────────────────────────────────────────────────
  scans        Λ      log(Z₁/Z₀)   min(α)     mean(α)    min(αₑ)   mean(αₑ) 
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2          5  -1.42e+04          0      0.444      0.968      0.996 
        4        3.4  -3.88e+03          0      0.622      0.984      0.996 
        8       3.47       -704  6.83e-321      0.615      0.984      0.996 
       16          5       -254  8.66e-154      0.444      0.992      0.999 
       32       5.87       -101   1.57e-22      0.347          1          1 
       64       6.09      -98.5   9.81e-17      0.323          1          1 
      128       6.26      -92.6   2.01e-11      0.304          1          1 
      256       6.32      -86.4    2.2e-05      0.297      0.999          1 
      512       6.69      -86.2    0.00303      0.257          1          1 
 1.02e+03       6.92      -88.1    0.00357      0.231      0.999          1 
────────────────────────────────────────────────────────────────────────────

Version Info

Status `~/.julia/dev/sandbox/gqem/Project.toml`
  [0eb8d820] Pigeons v0.4.5
  [aeb53088] QuantumEpisodicMemory v0.2.0 `https://github.com/itsdfish/QuantumEpisodicMemory.jl#main`
  [f3b207a7] StatsPlots v0.15.7
  [fce5fe82] Turing v0.34.1
  [9a3f8284] Random

min(α) is the minimum swap acceptance probability between neighbouring chains. Since it is approximately zero, PT does not have enough chains to do its work. I suggest trying

pt = pigeons(target = TuringLogPotential(model(data)), record = [traces], n_chains = 20)

or even more chains.

Thanks for your reply. I actually tried that but forgot to report it above. When I increase the number of chains to 50, some of the modes look different, but the ranges of the distributions look similar.
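
For reference, a sketch of the 50-chain run, assuming it simply mirrors the earlier call with n_chains raised to 50:

pt = pigeons(target = TuringLogPotential(model(data)), record = [traces], n_chains = 50)
chains = Chains(pt)
plot(chains, grid = false)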

Output

────────────────────────────────────────────────────────────────────────────
  scans        Λ      log(Z₁/Z₀)   min(α)     mean(α)    min(αₑ)   mean(αₑ) 
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2         19  -1.12e+04          0      0.612      0.968      0.996 
        4       8.26  -4.45e+03   1.26e-34      0.831      0.984      0.999 
        8       5.56  -4.01e+03      0.246      0.887      0.984          1 
       16       9.42      -99.3      0.416      0.808      0.988          1 
       32       9.83      -89.7      0.636      0.799      0.996          1 
       64       9.25      -89.1      0.659      0.811      0.998          1 
      128       9.58        -89      0.693      0.804      0.998          1 
      256       9.69      -89.1       0.71      0.802      0.999          1 
      512       9.38      -89.2      0.769      0.809      0.999          1 
 1.02e+03       9.63        -89      0.752      0.803      0.999          1 
────────────────────────────────────────────────────────────────────────────

[figure: posterior distributions]

The log likelihood with respect to the parameters looks irregular due to periodicity in the model. However, the large negative deflection occurs well beyond 1.75, which is where the sampler stops.
[figure: LL_θU, log likelihood profile for θU]

Thanks for the update! As a first diagnostic, let's check whether samples from the prior (or near the prior) are able to move more globally. To do so, use extended_traces = true as described in https://pigeons.run/dev/output-extended/

This way we might be able to see if the problem comes from the prior or something else.
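
As a minimal sketch of that diagnostic (assuming the same model(data) and plotting setup as above; the only new ingredient is the extended_traces flag from the linked docs, which records the traces of every chain rather than only the target):

pt = pigeons(
    target = TuringLogPotential(model(data)),
    record = [traces],
    n_chains = 20,
    extended_traces = true,  # keep samples from all chains, not just the target
)
chains = Chains(pt)  # assumption: this gathers the per-chain traces recorded above
plot(chains, grid = false)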

Hi @itsdfish -- quick question: you're worried about the range of θU, but you showed us the LL profile for θG. Could you provide the one for θU too? Because the density plot for θG from Pigeons agrees very well with that likelihood profile.


Sorry for the noise. The plot is correct, but I used the incorrect label. I updated the post above. Here is the code I used:

parms = (
    θG = -.12,
    θU = -1.54,
    θψO = -.71,
    θψR = -.86,
    θψU = 1.26,
)
# profile the log likelihood over θU with the other parameters held at their true values
# (the trailing θU keyword overrides the θU unpacked from parms)
θUs = range(1, 3, length = 300)
LLs = map(θU -> logpdf(GQEM(; parms..., θU), n_trials, responses), θUs)
plot(θUs, LLs, xlabel = "θU", ylabel = "LL", leg = false, grid = false)
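
As a hedged variation on the snippet above (the names θUs_full and LLs_full are just illustrative), the same profile can be evaluated over the full (-π, π) support to make the periodicity easier to see:

# same profile as above, over the full support of θU
θUs_full = range(-π, π, length = 600)
LLs_full = map(θU -> logpdf(GQEM(; parms..., θU), n_trials, responses), θUs_full)
plot(θUs_full, LLs_full, xlabel = "θU", ylabel = "LL", leg = false, grid = false)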


Thanks for the suggestion to use extended_traces. It looks like some chains cover more of the range, while others do not.
[figure: extended traces]

That extended traces plot looks great; it shows a smooth transition between the prior and the posterior.

I made a pairs plot (code below) that seems to suggest that it is the (multivariate) geometry of the posterior that puts essentially zero probability on values beyond 1.75. For example, look at the (θU, θG) pair. Notice that the periodicity pattern of the little blobs suggests that the next bump for θG should happen around ±4.5, and not before that. So it makes sense that you don't see samples beyond ±1.75.

The period of the blobs in the other variables is shorter, so you do get to see samples around ±3.

And of course, the true parameters have decently high probability under the posterior.

[figure: pairs plot, 14 rounds, 20 chains]

Details

using CairoMakie
using PairPlots

# same model/data as above; also record the energy autocorrelation diagnostic
pt = pigeons(
    target = TuringLogPotential(model(data)),
    record = [traces, Pigeons.energy_ac1],
    n_chains = 20,
    n_rounds = 14,
    multithreaded = true
)
chains = Chains(pt)

# true (data-generating) parameter values from above, used for the truth lines
true_pars = (θG = -.12, θU = -1.54, θψO = -.71, θψR = -.86, θψU = 1.26)
p = pairplot(chains, PairPlots.Truth(true_pars))

Edit: added truth lines

Thank you both for helping me understand the model. I did wonder whether another dimension was affecting the ranges of the marginals, but I was unsure about how to investigate that hypothesis. I was not aware of PairPlots.jl. This will come in handy. I will close this issue now because it was a misunderstanding on my part.

You're welcome! And we definitely can't recommend our collaborator @sefffal's PairPlots package enough!