likelihood using sufficient statistics
tpapp opened this issue · 6 comments
(This is not a bug report; I hope it is OK to ask questions in issues, as I found no other forum.)
I have a hidden Markov chain model where a set of parameters is transformed into state-evolution and observation stochastic matrices (there is some structure on them), which in turn define the likelihood for a collection of observed sequences from the same chain, run many times.
Counting the types of observed sequences is a sufficient statistic of the data, and it makes the calculation faster too. So I have
- a `counts::Dict{Vector{Int},Int}` counting each sequence type, with typical elements like `[1,2,2,1] => 137`,
- and a function `loglikelihood(counts, θ)`, which I already coded.

This, together with my prior on the parameters `θ`, defines the posterior.
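To make the shape of this concrete, here is a minimal sketch of these pieces (`sequence_logprob` is a hypothetical stand-in for the actual HMM sequence probability, just so the snippet runs):

```julia
# Sketch only: `sequence_logprob` is a hypothetical placeholder for the
# real HMM sequence log-probability under the parameters θ.
sequence_logprob(seq, θ) = sum(log(θ[s]) for s in seq)

function loglikelihood(counts::Dict{Vector{Int},Int}, θ)
    ll = 0.0
    for (seq, n) in counts                   # each distinct observed sequence type
        ll += n * sequence_logprob(seq, θ)   # contributes n times
    end
    ll
end

loglikelihood(Dict([1,2,2,1] => 137, [1,1] => 12), [0.3, 0.7])
```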
The primary question is: what's the correct way to wrap this up in Mamba, so I can run it with `mcmc`? Studying the tutorial and some examples, I thought that I should define a distribution for `θ` given the `counts`, which just wraps `loglikelihood` above, with `length` and `insupport` thrown in, and put `counts` in the dictionary for inputs.
The linear regression example in the manual has it the other way around, defining the distribution for `y`, but I thought the two are symmetric anyway, and in any case `Distributions` has nothing that can fit my sufficient statistics.
To make things concrete, here is an MWE for a plain vanilla binary Markov chain (not my actual problem, which is far too long to post here, but hopefully this gets the idea across). Anything I should do differently? Also, should I just incorporate the prior into the distribution, too? (Note that I skipped the incantations for parallelization for brevity.)
```julia
using Distributions
using Mamba

## simulate N steps of a two-state Markov chain, where π[s] is the
## probability of switching out of state s
function simulate_bmc(π::AbstractVector, N::Int)
    s = 1
    [(s = (rand(Float64) ≤ π[s] ? 3-s : s); s) for _ in 1:N]
end

## count transitions over non-overlapping pairs of states
## (assumes an even-length sequence)
function bmc_transition_count(bmc_sequence)
    c = zeros(Int, 2, 2)
    for tt in Base.partition(bmc_sequence, 2)
        c[tt...] += 1
    end
    c
end

function bmc_loglikelihood(π, transition_count)
    ll(count, p) = count == 0 ? count : count*log(p)  # avoid 0*log(0) = NaN
    sum(ll.(transition_count, [1-π[1] π[1]; π[2] 1-π[2]]))
end

## wrap the likelihood of π given the counts as a "distribution" for π
type BMCDist <: ContinuousMultivariateDistribution
    transitions::Matrix{Int}
end

Distributions.length(::BMCDist) = 2

function Distributions.insupport{T<:Real}(d::BMCDist, π::AbstractVector{T})
    length(π) == 2 && all(π .≥ 0.0) && all(π .≤ 1.0)
end

function Distributions._logpdf{T<:Real}(d::BMCDist, π::AbstractVector{T})
    bmc_loglikelihood(π, d.transitions)
end

true_π = [0.1, 0.2]
tc = bmc_transition_count(simulate_bmc(true_π, 1000))

model = Model(π = Stochastic(1, (y) -> BMCDist(y), true))
scheme = [NUTS(:π)]
setsamplers!(model, scheme)

data = Dict{Symbol, Any}(:y => tc)
inits = [Dict{Symbol, Any}(:π => fill(0.5, 2)) for i in 1:3]

sim = mcmc(model, data, inits, 2000;
           burnin=500, chains=length(inits), verbose=true)
describe(sim)
```
I only skimmed your question (I will take a more in-depth look later), but my first idea is a common trick used in MCMC software (e.g. WinBUGS) called the Poisson zeros trick.

If you have some data Y ~ Poisson(θ) and the observed value of Y is 0, then the likelihood is

L(θ | Y = 0) = θ^0 e^{-θ} / 0! = e^{-θ},

so if you define θ to be the negative of your desired log-likelihood and introduce auxiliary data Y = 0, you can force your model to include the desired likelihood.
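As a quick sanity check of the identity (using `Distributions` directly; the value of θ here is arbitrary, standing in for a negative log-likelihood):

```julia
using Distributions

θ = 3.7                 # pretend this is -loglikelihood(data, parameters)
logpdf(Poisson(θ), 0)   # exactly -θ, so the node contributes exp(-θ) = exp(loglik)
```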
Again, I haven't looked at this in depth, so there may be a more direct way in Mamba.
@bdeonovic: Thanks for the answer, but I am not sure I understand how this relates to what I was asking; maybe my question was not clear.

To reiterate: I basically have a log-likelihood function `ll(theta, S(y))`, where `S` is a sufficient statistic for the likelihood and can be calculated just once. In other words, the likelihood factorizes as p(y | θ) = h(y) g(S(y), θ), so for MCMC purposes only the g part matters; it is therefore (in principle) much faster to use `S` instead of `y`.

From the perspective of MCMC, `ll(theta)` can be seen as a black box as far as Mamba is concerned. But it would still be nice if I could pass `S` without a closure, so I used the data `Dict`, the second argument of `mcmc`. The question was about whether this is the recommended method.
I have another question: suppose some elements of `theta` are constrained, e.g. to an interval, or to be positive. Should I reparametrize with some `gamma` that is unconstrained and make Mamba work with that, doing the transformations myself, or is there a way to tell Mamba that my space is constrained?
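To illustrate what I mean by doing the transformations myself, a sketch for a positive parameter (using `exp`; the log-Jacobian term has to be added by hand):

```julia
# Sketch only: sample an unconstrained γ ∈ ℝ and map it to σ > 0 manually.
σ(γ) = exp(γ)       # the constrained parameter
logjac(γ) = γ       # log|dσ/dγ| = log(exp(γ)) = γ for this transform

# The log-posterior in γ would then be
#   loglikelihood(σ(γ)) + logprior(σ(γ)) + logjac(γ)
# instead of working with σ directly.
```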
Sorry for not getting back to this in a more timely fashion. So there are a few issues with your example. After clarifying these I hope you will be able to solve your more complex problem:
First of all, your example is a bit confusing. Here is how I understand the model you presented; please correct me if I am wrong. You have some data (`y`) which looks like some kind of time series that depends on some parameter `π`, and you have some method of evaluating the log-likelihood of `y` that only depends on a summary statistic `Sy`.

If that is the case, you need to define a custom distribution on `y` (or, if it is more convenient, on `Sy`). For example:
```julia
using Distributions
import Distributions: size, insupport, _logpdf

function simulate_bmc(π::AbstractVector, N::Int)
    s = 1
    [(s = (rand(Float64) ≤ π[s] ? 3-s : s); s) for _ in 1:N]
end

function bmc_transition_count(bmc_sequence)
    c = zeros(Int, 2, 2)
    for tt in Base.partition(bmc_sequence, 2)
        c[tt...] += 1
    end
    c
end

function bmc_loglikelihood(π, transition_count)
    if all(π .≥ 0.0) && all(π .≤ 1.0)
        ll(count, p) = count == 0 ? count : count*log(p)
        return sum(ll.(transition_count, [1-π[1] π[1]; π[2] 1-π[2]]))
    else
        return -Inf
    end
end

## The distribution of the summary statistic Sy.
## Note: since Sy is a matrix, the type needs to be a ContinuousMatrixDistribution.
type SummaryStatDist <: ContinuousMatrixDistribution
    π::Vector{Float64}
end

## Matrix distributions need to implement size, insupport, and _logpdf.
size(::SummaryStatDist) = (2, 2)
insupport{T<:Real}(d::SummaryStatDist, S::AbstractMatrix{T}) = true
function _logpdf{T<:Real}(d::SummaryStatDist, x::AbstractMatrix{T})
    bmc_loglikelihood(d.π, x)
end
```
With this in place you can simulate some data:

```julia
true_π = [0.1, 0.2]
tc = bmc_transition_count(simulate_bmc(true_π, 1000))
```

and we can now evaluate logpdfs properly:
```julia
julia> logpdf(SummaryStatDist([0.5, 0.5]), tc)
-346.5735902799727

julia> logpdf(SummaryStatDist([0.1, 0.2]), tc)
-181.9350344788923
```
Note the logpdf of the data evaluated at the true π is higher than at (0.5,0.5).
Next you need to let Mamba have access to your custom distribution by placing it in a `quote` block and evaluating it inside the `Mamba` module. Here is the rest of the code:
```julia
using Mamba

extensions = quote
    using Distributions
    import Distributions: size, insupport, _logpdf

    function simulate_bmc(π::AbstractVector, N::Int)
        s = 1
        [(s = (rand(Float64) ≤ π[s] ? 3-s : s); s) for _ in 1:N]
    end

    function bmc_transition_count(bmc_sequence)
        c = zeros(Int, 2, 2)
        for tt in Base.partition(bmc_sequence, 2)
            c[tt...] += 1
        end
        c
    end

    function bmc_loglikelihood(π, transition_count)
        if all(π .≥ 0.0) && all(π .≤ 1.0)
            ll(count, p) = count == 0 ? count : count*log(p)
            return sum(ll.(transition_count, [1-π[1] π[1]; π[2] 1-π[2]]))
        else
            return -Inf
        end
    end

    type SummaryStatDist <: ContinuousMatrixDistribution
        π::Vector{Float64}
    end

    size(::SummaryStatDist) = (2, 2)
    insupport{T<:Real}(d::SummaryStatDist, S::AbstractMatrix{T}) = true
    function _logpdf{T<:Real}(d::SummaryStatDist, x::AbstractMatrix{T})
        bmc_loglikelihood(d.π, x)
    end

    export SummaryStatDist, bmc_transition_count, simulate_bmc
end

## add the custom distribution to Mamba
eval(Mamba, extensions)

## simulate data
true_π = [0.1, 0.2]
tc = bmc_transition_count(simulate_bmc(true_π, 1000))

## Mamba model
## note: the distribution on the summary statistic is a matrix (hence the 2 in Stochastic);
## I also added a prior on π
model = Model(Sy = Stochastic(2, π -> SummaryStatDist(π), false),
              π = Stochastic(1, () -> Dirichlet([1, 1]), true))

## for sampling π we should use a sampler that is more appropriate for
## a vector that sums to 1
scheme = [SliceSimplex(:π)]
setsamplers!(model, scheme)

## the data
data = Dict{Symbol, Any}(:Sy => tc)

## initial values (all stochastic nodes need initial values)
inits = [Dict{Symbol, Any}(:Sy => tc, :π => fill(0.5, 2)) for i in 1:3]

## run MCMC
sim = mcmc(model, data, inits, 2000, burnin=500, chains=length(inits), verbose=true)
describe(sim)
```
```
Iterations = 501:2000
Thinning interval = 1
Chains = 1,2,3
Samples per chain = 1500

Empirical Posterior Estimates:
        Mean        SD        Naive SE       MCSE      ESS
π[1] 0.3404566 0.02120903 0.00031616553 0.00034636178 1500
π[2] 0.6595434 0.02120903 0.00031616553 0.00034636178 1500

Quantiles:
        2.5%      25.0%      50.0%      75.0%      97.5%
π[1] 0.29992232 0.32594111 0.3403533 0.35487939 0.38274701
π[2] 0.61725299 0.64512061 0.6596467 0.67405889 0.70007768
```
Thanks for your time. I have read the manual and the examples, and I see where the code above is coming from. Again, my primary problem is that my summary statistics are not regular enough to map them to a matrix; the most natural data type for them is a `Dict{Vector{Int},Int}`. I understand that `Mamba.jl` prefers to work with `Distributions`, but not all summary statistics can be cast in this framework.
In that case the Poisson trick I described in my original post will work. Define your loglikelihood to take in your "strange object", define an auxiliary zero count, and write the model as follows:
```julia
## data
data = Dict{Symbol, Any}(:Sy => ...this is that Dict structure you described..., :zero => 0)

## model
model = Model(zero = Stochastic((Sy, π) -> Poisson(-bmc_loglikelihood(π, Sy)), false),
              π = Stochastic(1, () -> Dirichlet([1, 1]), true))
```
EDIT: forgot a negative sign
Thanks, I just saw your edit about the negative sign, and it works fine. For future readers of this issue, here is an MWE estimating the scale parameter of a normal distribution using sufficient statistics:
```julia
using Mamba
using Distributions

## EXAMPLE: fitting the scale of a normal using sufficient statistics
y = randn(100)*2              # μ=0 (assumed known), σ=2 (to be estimated)

## sufficient statistic: number of observations and sum of squares
y_ss = (length(y), sum(abs2, y))

## log-likelihood, up to an additive constant
loglikelihood_ss(ss, σ) = -ss[2]/(2*σ^2) - ss[1]*log(σ)

data = Dict(:zero => 0, :ss => y_ss)
inits = [Dict(:zero => 0, :σ => 1.0) for _ in 1:3]

## zeros trick: auxiliary Poisson observation at 0 with rate -loglikelihood
## (the rate must remain nonnegative; add a large constant if necessary)
model = Model(zero = Stochastic((σ, ss) ->
                  Poisson(-loglikelihood_ss(ss, σ)), false),
              σ = Stochastic(() -> Uniform(0, 10)))

setsamplers!(model, [NUTS([:σ])])
sim = mcmc(model, data, inits, 1000; burnin=500, chains=length(inits))
describe(sim)
```