biaslab/ForneyLab.jl

Naive beginner's question - categorical variable input


Hi there, I'm just getting started with ForneyLab, and it's fantastic! I'm also new to Julia, so apologies if this question is obvious...

I'm starting with a classic Bayesian disease-diagnosis example. I have two random variables: the presence or absence of disease, θ, which determines the probability of a positive or negative test result, y. My first model is:

using ForneyLab

pr = [0.05, 0.95]              # Prior: 5% probability of having the disease
g = FactorGraph()
@RV θ ~ Categorical(pr)        # Infection θ
@RV y ~ Transition(θ, A)       # Test result y; A is my 2x2 confusion matrix with A[i,j] = P(y=i | θ=j)
placeholder(y, :y, dims=(2,)); # Placeholder for observation
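
For completeness, here is (roughly) how I compile and run sum-product inference on this model; the observation is one-hot encoded:

algo = messagePassingAlgorithm(θ)        # derive a sum-product schedule for θ
source_code = algorithmSourceCode(algo)  # generate Julia source for the schedule
eval(Meta.parse(source_code))            # compile; this defines step!()
data = Dict(:y => [1.0, 0.0])            # observe a positive test result (one-hot)
marginals = step!(data)                  # run the schedule; marginals[:θ] is the posterior over infection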

That works perfectly. I now want to replace the fixed prior pr on the disease with a Beta distribution over the probability of having the disease. I tried:

g = FactorGraph()
@RV p_θ ~ Beta(2,5)              # Prior probability of infection being positive
@RV p_nve = 1 - p_θ              # Prior probability of infection being negative
@RV θ ~ Categorical([p_θ, p_nve]) # Infection θ
@RV y ~ Transition(θ,A)           # Test result y
placeholder(y, :y, dims=(2,));   # Placeholder for observation

This fails with the error "LoadError: KeyError: key nothing not found", and I get the split factor graph shown below. How can I fix this so that I supply the required length-2 probability vector to the Categorical variable θ, please?

Many thanks
Peter

[Screenshot: the resulting split factor graph]

Hi @pzeidman! Thanks for trying out ForneyLab.

Instead of using a Beta prior, you should use a Dirichlet, which is the conjugate prior for the Categorical.
Technically, you could employ a Beta-Bernoulli pairing since you have only two classes, but because your model needs the transition matrix A, that pairing is not suitable here (a minimal sketch of it is shown below for reference).
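
For illustration only, a sketch of what the Beta-Bernoulli pairing would look like; note that there is no Transition-style node to place your 2x2 confusion matrix between θ and y in this parameterization:

g = FactorGraph()
@RV p_θ ~ Beta(2.0, 5.0) # prior on the probability of infection
@RV θ ~ Bernoulli(p_θ)   # binary infection state
# A noisy test result y would require a confusion matrix between θ and y,
# which is exactly what Transition provides in the Categorical formulation.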

Here is a code snippet for constructing the FFG (Forney-style factor graph):

using ForneyLab

A = [1 0; 0 1] # arbitrary transition matrix; A[i,j] = P(y=i | θ=j), so columns sum to one
g = FactorGraph()
@RV p_θ ~ Dirichlet(placeholder(:αθ, dims=(2,))) # multivariate Dirichlet node (prior for the Categorical)
@RV θ ~ Categorical(p_θ)  # Infection θ
@RV y ~ Transition(θ, A)  # Test result y
placeholder(y, :y, dims=(2,))

ForneyLab.draw() # visualize the generated factor graph

Unlike your first model, this one cannot be handled by plain sum-product message passing; you will need Variational Message Passing (VMP).
Here is an example of learning the parameters in an online fashion (the posterior from the previous time step is used as the prior for the current time step):

# Specify a mean-field posterior factorization around the Categorical node
q = PosteriorFactorization(p_θ, θ, ids=[:Pθ, :θ])
algo = messagePassingAlgorithm(free_energy=true)          # figure out a message-passing schedule
source_code = algorithmSourceCode(algo, free_energy=true) # generate Julia source code
eval(Meta.parse(source_code))                             # compile; defines stepPθ!(), stepθ!() and freeEnergy()

observations = map(_ -> rand([[1.0, 0.0], [0.0, 1.0]]), 1:100) # random one-hot observations for demonstration purposes
vmp_iter = 5 # number of variational iterations

fe = Array{Float64}(undef, size(observations, 1), vmp_iter)     # free-energy storage (row = observation, column = VMP iteration)
p_θ_est = Vector{Vector{Float64}}(undef, size(observations, 1)) # estimates of p_θ
θ_est = Vector{Vector{Float64}}(undef, size(observations, 1))   # estimates of θ

p_θ_min = [1.0, 1.0] # parameters of the Dirichlet prior (updated after each observation)
θ_min = [0.5, 0.5]   # initial marginal for θ

marginals = Dict()
for t in 1:size(observations, 1)
    marginals = Dict(:θ => ProbabilityDistribution(Categorical, p=θ_min),
                     :p_θ => ProbabilityDistribution(Multivariate, Dirichlet, a=p_θ_min)) # initialize marginals (NOTE: vague marginals are also fine)

    # marginals = Dict(:θ => vague(Categorical, 2), :p_θ => vague(Dirichlet, (2, )))

    # Provide observation :y and prior for Dirichlet :αθ
    data = Dict(:y => observations[t], :αθ => p_θ_min)

    for i in 1:vmp_iter
        stepθ!(data, marginals)
        stepPθ!(data, marginals)
        fe[t, i] = freeEnergy(data, marginals)
    end

    # Extract computed marginals
    θ_min = marginals[:θ].params[:p]
    p_θ_min = marginals[:p_θ].params[:a] # this marginal is used as the prior for the next observation

    # Store computed marginals
    p_θ_est[t] = p_θ_min
    θ_est[t] = θ_min
end

# Estimates (averaging the vectors requires the Statistics standard library)
using Statistics
pθ̂ = mean(p_θ_est)
θ̂ = mean(θ_est)

You might also consider building a full (batch) model in which the parameter p_θ is shared across all observations; this can improve the inference results and shorten the inference code. A rough sketch follows.
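
A rough sketch of such a batch model, reusing A from above; the batch size N and the indexed :y placeholder are illustrative assumptions:

N = 100
g = FactorGraph()
@RV p_θ ~ Dirichlet(placeholder(:αθ, dims=(2,))) # single prior shared by all observations
θ = Vector{Variable}(undef, N)
y = Vector{Variable}(undef, N)
for i in 1:N
    @RV θ[i] ~ Categorical(p_θ)               # per-observation infection state
    @RV y[i] ~ Transition(θ[i], A)            # per-observation test result
    placeholder(y[i], :y, index=i, dims=(2,)) # indexed observation placeholder
end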
You can use the free-energy (fe) score as a performance metric when comparing models, and also as a sanity check on the VMP iterations; see the snippet below.
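
For example (a sketch, assuming the fe matrix filled in the loop above): in exact arithmetic the free energy is non-increasing across VMP iterations, so the following should typically return true:

# free energy should be non-increasing across VMP iterations for every observation
all(t -> issorted(fe[t, :], rev=true), 1:size(fe, 1))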
If you have any other questions, we'll be happy to answer them. Good luck!

That's very clear, thank you! (And yes, a Bernoulli would have sufficed here - I wanted to use a transition matrix for scaling up to larger problems.)

Many thanks for your help
Peter