brian-j-smith/Mamba.jl

likelihood using sufficient statistics

tpapp opened this issue · 6 comments

tpapp commented

(This is not a bug report; I hope it is OK to ask questions in issues, as I found no other forum.)
I have a hidden Markov model where a set of parameters is transformed into state-transition and observation stochastic matrices (there is some structure on them), which in turn define the likelihood for a collection of observed sequences from the same chain run many times.

Counting the types of observed sequences is a sufficient statistic of the data, and it makes the calculation faster too. So I have

  1. a counts::Dict{Vector{Int},Int} counting each observed sequence type, with typical elements like [1,2,2,1] => 137,
  2. and a function loglikelihood(counts, θ) which I already coded.

This, together with my prior on parameters θ, defines the posterior.
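Schematically, the data structure and the generic shape of the likelihood look like this (toy values; sequence_logprob is just a dummy standing in for the actual per-sequence log-probability in my model):

counts = Dict([1,2,2,1] => 137, [1,1,2,2] => 42)   # counts of observed sequence types
sequence_logprob(seq, θ) = length(seq)*log(0.5)    # dummy placeholder for log P(seq | θ)
loglikelihood(counts, θ) = sum(n*sequence_logprob(seq, θ) for (seq, n) in counts)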

The primary question is: what's the correct way to wrap this up in Mamba, so I can run it with mcmc? Studying the tutorial and some examples, I thought that I should define a distribution for θ given the counts, which just wraps loglikelihood above, with length and insupport thrown in, and put counts in the dictionary for inputs.

The linear regression example in the manual has it the other way around, defining the distribution for y, but I thought it is symmetric anyway, and in any case Distributions has nothing that fits my sufficient statistics.

To make things concrete, here is an MWE for a plain vanilla binary Markov chain (not my actual problem, but that's way too long to post here; hopefully this gets the idea across). Anything I should do differently? Also, should I just incorporate the prior into the distribution, too?
[note that I skipped the incantations for parallelization for brevity].

using Distributions
using Mamba

## simulate N steps of a two-state chain; π[s] is the probability of
## switching out of state s
function simulate_bmc(π::AbstractVector, N::Int)
    s = 1
    [(s = (rand(Float64) ≤ π[s] ? 3-s : s); s) for _ in 1:N]
end

## tally (non-overlapping) state pairs into a 2×2 count matrix
function bmc_transition_count(bmc_sequence)
    c = zeros(Int, 2, 2)
    [c[tt...] += 1 for tt in Base.partition(bmc_sequence, 2)]
    c
end

## log-likelihood of the pair counts given the switch probabilities π
function bmc_loglikelihood(π, transition_count)
    ll(count, p) = count == 0 ? count : count*log(p)  # treats 0*log(0) as 0
    sum(ll.(transition_count, [1-π[1] π[1]; π[2] 1-π[2]]))
end

## a "distribution" on π that just wraps the log-likelihood of the fixed counts
type BMCDist <: ContinuousMultivariateDistribution
    transitions::Matrix{Int}
end

Distributions.length(::BMCDist) = 2

function Distributions.insupport{T<:Real}(d::BMCDist, π::AbstractVector{T})
    length(π) == 2 && all(π .≥ 0.0) && all(π .≤ 1.0)
end

function Distributions._logpdf{T<:Real}(d::BMCDist, π::AbstractVector{T})
    bmc_loglikelihood(π, d.transitions)
end

true_π = [0.1,0.2]
tc = bmc_transition_count(simulate_bmc(true_π, 1000))

model = Model(π = Stochastic(1, (y) -> BMCDist(y), true))
scheme = [NUTS(:π)]
setsamplers!(model, scheme)
data = Dict{Symbol, Any}(:y => tc)
inits = [Dict{Symbol, Any}(:π => fill(0.5,2)) for i in 1:3]
sim = mcmc(model, data, inits, 2000;
           burnin=500, chains=length(inits), verbose=true)
describe(sim)

bdeonovic commented

I only skimmed your question (I will take a more in-depth look later), but my first idea is a common trick used in MCMC software (e.g. WinBUGS) called the Poisson zeros trick.

If you have some data Y ~ Poisson(\theta) and the observed value of Y is 0, then the likelihood is

L(\theta | Y = 0) = \theta^0 e^{-\theta} / 0! = e^{-\theta}

so if you define \theta to be the negative of your desired log-likelihood and introduce auxiliary data Y = 0, you can force your model to include the desired likelihood (the Poisson contribution e^{-\theta} is then exactly e^{log-likelihood}).
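For instance, the identity is easy to check directly (a quick sanity check, nothing Mamba-specific):

using Distributions
θ = 3.2                 # stands in for -loglikelihood
logpdf(Poisson(θ), 0)   # returns -3.2, i.e. exactly -θ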

Again, I haven't looked into this in depth, so there may be a more direct way in Mamba.

tpapp commented

@bdeonovic: Thanks for the answer, but I am not sure I understand how this relates to what I was asking; maybe my question was not clear.

To reiterate: I basically have a log-likelihood function ll(theta, S(y)), where S is a sufficient statistic for the likelihood and can be calculated just once. So it is (in principle) much faster to use S instead of y.

From the perspective of MCMC, ll(theta) can be seen as a black box as far as Mamba is concerned. But it would still be nice if I could pass S without a closure, so I put it in the data Dict given as the second parameter of mcmc. The question was whether this is the recommended method.

I have another question: suppose some elements of theta are constrained, e.g. to an interval, or to be > 0. Should I represent the parameters by some unconstrained γ and make Mamba work with that, doing the transformations myself (see the sketch below), or is there a way to tell Mamba that my space is constrained?
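To make the first option concrete, here is the kind of manual transformation I have in mind (just a sketch with a toy density, not my actual model):

logpost_σ(σ) = -σ + log(σ)             # toy log-density on σ > 0
## same target on the unconstrained γ = log(σ); the extra +γ term is the
## log-Jacobian log|dσ/dγ| = log(exp(γ)) = γ
logpost_γ(γ) = logpost_σ(exp(γ)) + γ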

bdeonovic commented

Sorry for not getting back to this in a more timely fashion. There are a few issues with your example; after clarifying these, I hope you will be able to solve your more complex problem.

First of all, your example is a bit confusing. Here is how I understand the model you presented; please correct me if I am wrong. You have some data (y) which looks like some kind of time series that depends on some parameter π. You have some method of evaluating the log-likelihood of y that depends only on a summary statistic Sy.

If that is the case, you need to define a custom distribution on y (or, if it is more convenient, on Sy). For example:

  using Distributions
  import Distributions: size, insupport, _logpdf
  function simulate_bmc(π::AbstractVector, N::Int)
    s = 1
    [(s = (rand(Float64) ≤ π[s] ? 3-s : s); s) for _ in 1:N]
  end

  function bmc_transition_count(bmc_sequence)
    c = zeros(Int, 2, 2)
    [c[tt...] += 1 for tt in Base.partition(bmc_sequence, 2)]
    c
  end

  function bmc_loglikelihood(π, transition_count)
    if all(π .≥ 0.0) && all(π .≤ 1.0)
      ll(count, p) = count == 0 ? count : count*log(p)
      return sum(ll.(transition_count, [1-π[1] π[1]; π[2] 1-π[2]]))
    else
      return -Inf
    end
  end

  ## The distribution of the summary statistic Sy
  ## Note: since Sy is a matrix, the type needs to be a ContinuousMatrixDistribution
  type SummaryStatDist <: ContinuousMatrixDistribution
    π::Vector{Float64}
  end

  ## Matrix distributions need to implement size, insupport, and _logpdf
  size(::SummaryStatDist) = (2,2)
  insupport{T<:Real}(d::SummaryStatDist, S::AbstractMatrix{T}) = true

  function _logpdf{T<:Real}(d::SummaryStatDist, x::AbstractMatrix{T})
    bmc_loglikelihood(d.π, x)
  end

With this in place you can simulate some data

true_π = [0.1,0.2]
tc = bmc_transition_count(simulate_bmc(true_π, 1000))

and we can now evaluate logpdfs properly:

julia> logpdf(SummaryStatDist([0.5,0.5]), tc)
-346.5735902799727

julia> logpdf(SummaryStatDist([0.1,0.2]), tc)
-181.9350344788923

Note the logpdf of the data evaluated at the true π is higher than at (0.5,0.5).

Next you need to let Mamba have access to your custom distribution by placing it in a quote block and evaluating it inside of Mamba.

Here is the rest of the code:

using Mamba

extensions = quote
  using Distributions
  import Distributions: size, insupport, _logpdf
  function simulate_bmc(π::AbstractVector, N::Int)
    s = 1
    [(s = (rand(Float64) ≤ π[s] ? 3-s : s); s) for _ in 1:N]
  end

  function bmc_transition_count(bmc_sequence)
    c = zeros(Int, 2, 2)
    [c[tt...] += 1 for tt in Base.partition(bmc_sequence, 2)]
    c
  end

  function bmc_loglikelihood(π, transition_count)
    if all(π .≥ 0.0) && all(π .≤ 1.0)
      ll(count, p) = count == 0 ? count : count*log(p)
      return sum(ll.(transition_count, [1-π[1] π[1]; π[2] 1-π[2]]))
    else
      return -Inf
    end
  end

  type SummaryStatDist <: ContinuousMatrixDistribution
    π::Vector{Float64}
  end

  size(::SummaryStatDist) = (2,2)
  insupport{T<:Real}(d::SummaryStatDist, S::AbstractMatrix{T}) = true

  function _logpdf{T<:Real}(d::SummaryStatDist, x::AbstractMatrix{T})
    bmc_loglikelihood(d.π, x)
  end
  export SummaryStatDist, bmc_transition_count, simulate_bmc
end

## add custom distribution to Mamba
eval(Mamba, extensions)

## Simulate data
true_π = [0.1,0.2]
tc = bmc_transition_count(simulate_bmc(true_π, 1000))

## Mamba model
## note the distribution on the summary statistic is a matrix distribution
## (hence the 2 in Stochastic); I also added a prior on π
model = Model(Sy = Stochastic(2, π -> SummaryStatDist(π), false),
              π = Stochastic(1, () -> Dirichlet([1,1]), true))

## for sampling π we should use a sampler that is appropriate for
## a vector that sums to 1
scheme = [SliceSimplex(:π)]
setsamplers!(model, scheme)

## the data
data = Dict{Symbol, Any}(:Sy => tc)

## initial values (all stochastic nodes need initial values)
inits = [Dict{Symbol, Any}(:Sy => tc, :π => fill(0.5,2)) for i in 1:3]

## run MCMC
sim = mcmc(model, data, inits, 2000, burnin=500, chains=length(inits), verbose=true)
describe(sim)

Iterations = 501:2000
Thinning interval = 1
Chains = 1,2,3
Samples per chain = 1500

Empirical Posterior Estimates:
        Mean       SD        Naive SE        MCSE      ESS
π[1] 0.3404566 0.02120903 0.00031616553 0.00034636178 1500
π[2] 0.6595434 0.02120903 0.00031616553 0.00034636178 1500

Quantiles:
        2.5%       25.0%     50.0%      75.0%      97.5%  
π[1] 0.29992232 0.32594111 0.3403533 0.35487939 0.38274701
π[2] 0.61725299 0.64512061 0.6596467 0.67405889 0.70007768

tpapp commented

Thanks for your time. I have read the manual and the examples, and I see where the code above is coming from. Again, my primary problem is that my summary statistics are not regular enough to map them to a matrix; the most natural data type for them is a Dict{Vector{Int},Int}. I understand that Mamba.jl prefers to work through Distributions, but not all summary statistics can be cast in this framework.

bdeonovic commented

In that case the Poisson trick I described in my original post will work. Define your log-likelihood to take in your "strange object", define an auxiliary zero count, and specify the model as follows:

## data
data = Dict{Symbol, Any}(:Sy => ...this is that Dict structure you described..., :zero => 0)

## Model
model = Model(zero = Stochastic((Sy, π) -> Poisson(-bmc_loglikelihood(π, Sy)), false),
              π = Stochastic(1, () -> Dirichlet([1,1]), true))

EDIT: forgot a negative sign

tpapp commented

Thanks, I just saw your edit about the negative sign, and it works fine. For future readers of this issue, here is an MWE estimating the scale parameter of a normal distribution using sufficient statistics:

using Mamba
using Distributions

# EXAMPLE: fitting the scale of a normal using sufficient statistics
y = randn(100)*2                # μ=0 (assumed known), σ=2 (estimate)
# sufficient statistic: number of observations, sum of squares
y_ss = (length(y), sum(abs2, y))

# log-likelihood in terms of the sufficient statistic, up to an additive constant
loglikelihood_ss(ss, σ) = -ss[2]/(2*σ^2) - ss[1]*log(σ)

data = Dict(:zero => 0, :ss => y_ss)
inits = [Dict(:zero => 0, :σ => 1.0) for _ in 1:3]
model = Model(zero = Stochastic((σ, ss) ->
                                Poisson(-loglikelihood_ss(ss,σ)), false),
              σ = Stochastic(() -> Uniform(0,10)))
setsamplers!(model, [NUTS([:σ])])
sim = mcmc(model, data, inits, 1000; burnin=500, chains=length(inits))
describe(sim)
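One caveat for anyone adapting this: Poisson needs a strictly positive rate, so the trick only works as written if -loglikelihood stays positive over the prior support (it does with these data, but that is not guaranteed in general). The standard WinBUGS-style fix is to add a large constant C to the rate, which only shifts the log-posterior by a constant; sketched below with a C I picked arbitrarily:

const C = 10000.0   # any constant large enough to keep the rate positive
## inside the Model call, the zero node would then become
zero = Stochastic((σ, ss) -> Poisson(C - loglikelihood_ss(ss, σ)), false)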