adrn/thejoker

Strange behavior

zhexingli opened this issue · 9 comments

Hello, I was trying to use thejoker to fit a sparse sample of data that could indicate the presence of a massive companion. Attached is RVSearch result. Although it doesn't find any signal, it's apparent the long period curvature is there.
PeriodSearch

However, when fitting the data through thejoker something doesn't seem right. The process quickly finishes on one core with 2^28 prior samples but returning only a few accepted samples after the rejection process. And the plot doesn't look right either.
jokerplot
The plot of accepted samples looks like just connecting the data points with straight lines. When examining the returned accpeted samples, the parameters all look fine.

I'm not sure what's going on here since I'm using the same code that works for other targets. I'm attaching my code here:
JokerFit.py.zip

From my experience thejoker can deal with incomplete orbits with no problem at all. Is this possibly due to orbit being very well constrained despite being incomplete?

Sorry, nvm, I think the straightlines are due to the t_grid I specified within the plotting function. But for this case, should I inflate the error bars to allow more samples to be returned? Here I used 2^26 prior samples and only a few returned...

This is starting to look silly. Inflating the error bars helped.. Close this now...

adrn commented

Just taking a look at this now! Yes I was going to suggest increasing the number of points you used in t_grid. For the first issue, I'm surprised that thejoker doesn't return any samples with periods ~3500 days or so! What are you using for your period prior here?

I specified minimum period of 2000 days and max 10000 days within the defaul prior setting. I think the issue is that the error bars (all 2 or 3 m/s level) for each data point are too small compared to the actual signal so that it falls under the category of underestimated error described in the paper I think. This returned only a few accepted samples. However, I later manually inflated the error bars by 20 m/s and thejoker is now able to return a lot more useful samples (thousands) and returned a period value of around 3000 days and K of 400 m/s based on the corner plot, which seems correct :)

adrn commented

Interesting - I would have said the opposite! To me, it looks like there is a clear trend in the radial velocity measurements, so the error bars look like they could be correct, and the data are just pretty constraining. I would allow more flexibility in the mean velocity and in the semi-amplitude, and if you want to allow for inflated error bars I would try a prior more like:

import exoplanet.units as xu
import pymc3 as pm

with pm.Model():
    s = xu.with_unit(pm.Lognormal('s', 1, 1),
                     u.m/u.s)
                     
    prior = tj.JokerPrior.default(
        P_min=2000*u.day,
        P_max=10000*u.day,
        sigma_K0=1*u.km/u.s,
        sigma_v=1*u.km/u.s,
        P0=5*u.yr,
        s=s
    )

Thank you! That's helpful. I wondered how to let the jitter term vary during the fitting. I'll keep that in mind for my other fitting processes.

It appears when I fit incomplete orbits, the larger the sigma_0 I give to the prior, the longer and larger the orbital period and semi-amplitude will be returned by thejoker. Is this because of the larger parameter space we're allowing the semi-amplitude to explore? What would be the best value of sigma_0 to be used? The 1 km/s you recommended above or something different?

I have a friend that uses thejoker but with a very old version, 0.x version. That version doesn't require this sigma prior input. What default sigma was used in that version?

adrn commented

It appears when I fit incomplete orbits, the larger the sigma_0 I give to the prior, the longer and larger the orbital period and semi-amplitude will be returned by thejoker.

That is expected behavior! Changing sigma_0 effectively changes the largest mass companion that can exist at a given orbital period. So, making sigma_0 larger means you are allowing solutions with larger and larger masses, which at fixed semi-amplitude, means you could have a larger mass object at longer period match the data.

What would be the best value of sigma_0 to be used? The 1 km/s you recommended above or something different?

This depends on your prior beliefs about the companion mass :). If you think it must be a star, you want to set it to something more like 10's of km/s. If you think low-mass star, more like a few km/s. For planets, probably something like 10s or 100s of m/s. If you don't have other data to give you a prior belief on the mass, your best bet is to try a few values and describe how the results depend on the assumption.

I have a friend that uses thejoker but with a very old version, 0.x version. That version doesn't require this sigma prior input. What default sigma was used in that version?

This was actually a bug in that version, so your friend should update to a newer version! There is no escaping the need for the sigma_K prior, and it was a math error in the original implementation.

Thanks for the detailed explanation. This is very helpful!