Sampling problem not indicated by any of the diagnostics I found in the manual
Opened this issue · 11 comments
I'm new to Pigeons.jl, and a simple test target distribution I wrote to familiarize myself with the sampler doesn't seem to return correct results (the returned samples don't match the known shape of the test distribution).
I've read through all the documentation pages about sampler diagnostics (that I could find) and as far as I can see all the diagnostics indicate that the sampler worked correctly.
I noticed that while there is a page on pt-diagnostics, I could not find any mention of a way to access diagnostics of the explorers (I'm assuming they have diagnostics that need checking as well).
The sampler was run at default settings.
The test target distribution is 2-dimensional. It's a mixture of two 1D Gaussians in each dimension, resulting in four 2D Gaussian modes of varying sizes and shapes. All four modes carry the same mass.
Here's the model-code:
@model function testModel( # a mixture of two 1d-gaussians with different mean and sd, but same mass, resulting in 4 2d-gaussians
    sidePeakPosition::Real,
    sidePeakSd::Real,
)
    x ~ Normal( 0, 1000, ) # this is so the sampler knows that x is a variable (and it probably also uses it to initialise and as reference distribution (from its perspective it looks like a prior))
    y ~ Normal( 0, 1000, ) # same role for y
    @addlogprob! logsumexp( [
        logpdf( Normal( 0, 1, ), x, ),
        logpdf( Normal( sidePeakPosition, sidePeakSd, ), x, ),
    ], )
    @addlogprob! logsumexp( [
        logpdf( Normal( 0, 1, ), y, ),
        logpdf( Normal( sidePeakPosition, sidePeakSd, ), y, ),
    ], )
end
The log-probability output of the pigeons sampler looks correct, so I'm reasonably sure that I specified the target distribution correctly.
I also wrote a simple rejection sampler (~20 lines of code) and sampled the model with that just to be sure. It returned what I think is the correct distribution.
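The original rejection sampler is not included in the thread, but a minimal sketch of one might look like the following. Assumptions of mine (not from the original file): the 1D version of the target (an equal-mass mixture of N(0, 1) and N(30, 0.1), matching position 30 and size 0.1), a uniform box proposal, and an envelope constant taken from the tallest mode; all names are mine.

```julia
using Distributions, Random, Statistics

# Unnormalized 1D target: equal-mass mixture of N(0, 1) and N(30, 0.1)
target_pdf(x) = 0.5 * pdf(Normal(0, 1), x) + 0.5 * pdf(Normal(30, 0.1), x)

# Rejection sampling with a uniform proposal on [lo, hi].
# Accept x with probability target_pdf(x) / (M * g), where M * g bounds the target.
function rejection_sample(n; lo = -10.0, hi = 35.0)
    g = 1 / (hi - lo)                        # proposal density
    M = 0.5 * pdf(Normal(30, 0.1), 30) / g   # envelope constant from the tallest mode
    samples = Float64[]
    while length(samples) < n
        x = rand(Uniform(lo, hi))
        if rand() < target_pdf(x) / (M * g)
            push!(samples, x)
        end
    end
    samples
end

s = rejection_sample(20_000)
mean(>(15), s)  # should be close to 0.5, since both modes carry equal mass
```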
The Pigeons output seems wrong in that it assigns unequal mass to the modes. The model has two parameters that change the size and position of the side modes. I ran the test with a position of 30 and a size of 0.1. I reran the test several times; the problem is always there and always of similar magnitude.
Pigeons returns good results when the position is 3 (size 0.1), however.
My questions are:

- How can I access explorer diagnostics?
- Is there a diagnostic that I missed somewhere that should have told me about a sampling problem? If so, I suggest making this diagnostic more prominent in the documentation. Ideally the documentation would have some kind of checklist of which diagnostics need to be checked to fully verify the sampler output.
Here's a minimal reprex:
test.zip
(it's a zipped julia file)
(I found that the model can be reduced to one dimension without making the problem disappear, so the reprex uses a slightly different model from the one in the opening post.)
Here's the julia environment:
environment.zip
(zip file contains the Manifest.toml and Project.toml)
I don't know if it's related, but when I set the number of chains to 2, the global barrier is estimated as zero, most (if not all) swaps are accepted, and the returned samples roughly look like a mix of prior and posterior (for lack of a more precise description). I would have expected that in a model where the prior is so different from the posterior, most swaps would be rejected when the number of chains is 2.
Maybe this is just my lack of understanding of the algorithm, as I'm still new to Pigeons?
I suspect this might be the same issue as #201, i.e. @addlogprob!
not getting picked up when building the annealing sequence of distributions. Would you mind showing the standard output when running pigeons?
Ah! ah! Yes, this is it. The fact that the column with the global communication barrier (Λ) is zero means that the reference and target are equal. PT is only able to do its job when the reference is tractable.
The fix is simple, see Miguel's post at #201 (comment)
Ah, returned samples look good now. Thanks!
Two things still don't quite make sense to me though:
First:
Before the fix, Pigeons seemed like it had already correctly figured out what is prior and what is posterior: chain 1 looked like prior samples (rough visual inspection; I didn't check the exact standard deviation) and the last chain looked like posterior samples, except for the wrong weights of the modes. Maybe the explorer had picked up the model in the way I had intended, whereas the swapping algorithm hadn't? This is why it wasn't clear to me that there was a problem with the model specification. If the different parts of the algorithm interpret the model in different ways, it makes diagnosis more difficult.
Second:
Is the local barrier plot supposed to look like this (after the fix)?:
Global barrier was estimated as 5.9, alpha was 0.33
For posterity:
here's the official documentation about the thing that fixed my problem: https://turing.ml/v0.22/docs/using-turing/advanced
(see the section about using @addlogprob! and the note that by default it is also applied to the prior)
So the only change needed in my code was to add an if-block around my @addlogprob! statements that makes sure they only apply to the posterior.
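For reference, the guard is this fragment from inside the `@model` body (matching the DynamicPPL context check shown in the full fixed reprex later in this thread; it is not runnable on its own since `__context__` only exists inside a `@model`):

```julia
# Only add the extra log-density term when the likelihood/joint is evaluated,
# not when DynamicPPL evaluates the prior (which Pigeons uses as the reference):
if !(DynamicPPL.leafcontext(__context__) isa DynamicPPL.PriorContext)
    @addlogprob! logsumexp( [
        logpdf( Normal( 0, 1, ), x, ),
        logpdf( Normal( sidePeakPosition, sidePeakSd, ), x, ),
    ], )
end
```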
It might be worth increasing the number of chains to say 20 to ensure it is greater than 2Λ...
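In Pigeons.jl the number of chains is set via the `n_chains` keyword of `Inputs`; a sketch reusing the model from the reprex (parameter values are the ones from this thread):

```julia
samplerInputs = Inputs(
    target = TuringLogPotential( multimodalPhaseChangeModel( 30, .1, ) ),
    n_chains = 20,   # comfortably above 2Λ ≈ 11.8 for the estimated Λ ≈ 5.9
    n_rounds = 12,
)
samplerInterface = pigeons( samplerInputs, )
```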
I'm not sure if you're responding to the second question in my previous post, or if that's more of a general tip.
I set chains to 40 and ran it again.
Didn't seem to change anything. The local barrier plot still looks that way.
I was suggesting it since the local barrier is very steep (plot above), thinking maybe it was not estimated properly. But if you get the same result with 40 chains, then it is possible it has a very large peak close to the prior. Looking at the vague prior, i.e. N(0, 1000), I can see this is possible, I think.
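A quick numerical illustration of this point (my own, not from the thread): a typical draw from the N(0, 1000) reference lands hundreds of likelihood standard deviations away from both modes, so the log-likelihood there is astronomically negative, which is what produces the huge local barrier right next to the reference:

```julia
using Distributions, LogExpFunctions

# 1D log-likelihood from the reprex: equal-mass mixture of N(0,1) and N(30, 0.1)
loglik(x) = logsumexp([
    logpdf(Normal(0, 1), x),
    logpdf(Normal(30, 0.1), x),
])

loglik(0.0)    # ≈ -0.92, near a mode
loglik(500.0)  # ≈ -1.25e5 at a typical draw from the N(0, 1000) reference
```

Annealed densities are proportional to exp(β · loglik), so β must be tiny before such values become manageable; this matches the schedule grid points clustering near 0.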
OK, thanks.
What about my comment about the diagnosability of model problems, in the context of the explorer maybe interpreting the bugged model differently from the PT algorithm?
So with the fix, and cranking up the number of rounds to 12, I get perfectly reasonable results:
using Pigeons
using DynamicPPL, Distributions, DistributionsAD
using MCMCChains
using Plots
using StatsPlots
using LogExpFunctions
# define model
@model function multimodalPhaseChangeModel( # a mixture of two 1d-gaussians with different mean and sd, but same mass
sidePeakPosition::Real,
sidePeakSd::Real,
)
x ~ Normal( 0, 1000, ) # this is so the sampler knows that x is a variable (and it probably also uses it to initialise and as reference distribution (from its perspective it looks like a prior))
if !(DynamicPPL.leafcontext(__context__) isa DynamicPPL.PriorContext)
@addlogprob! logsumexp( [
logpdf(
Normal( 0, 1, ),
x,
),
logpdf(
Normal( sidePeakPosition, sidePeakSd, ),
x,
),
], )
end
end
samplerInputs =
Inputs(
target =
TuringLogPotential(
multimodalPhaseChangeModel( 30, .1, ),
),
record = [
traces;
record_default();
],
n_rounds=12)
samplerInterface = pigeons( samplerInputs, );
resultAsArray = sample_array( samplerInterface, );
mean(>(15), resultAsArray[:,1,1]) # 0.526611328125
The local barrier plot looks like that because the log-likelihood is almost non-integrable at the reference, which translates into a huge local barrier for the first grid point. But this is perfectly compensated by the fact that the grid point is very close to 0.
julia> samplerInterface.shared.tempering.schedule.grids
10-element Vector{Float64}:
0.0
3.829411589085135e-6
2.2578556719318923e-5
0.00012487925733740035
0.0006184214800105778
0.003258617478349858
0.016962566014898484
0.08532820176044159
0.3923040131645543
1.0