SHERLOCK

Easy and versatile open-source code to explore Kepler, K2 and TESS data in the search for exoplanets

Primary language: Python. License: MIT.

SHERLOCK is an end-to-end pipeline that allows users to explore data from space-based missions in the search for planetary candidates. It can be used to recover candidates alerted by automatic pipelines such as SPOC and QLP, the so-called Kepler objects of interest (KOIs) and TESS objects of interest (TOIs), and to search for candidates that remained unnoticed due to detection thresholds, lack of data exploration, or poor photometric quality. To this end, SHERLOCK has six different modules to (1) acquire and prepare the light curves from their repositories, (2) search for planetary candidates, (3) vet the interesting signals, (4) perform a statistical validation, (5) model the signals to refine their ephemerides, and (6) compute the observational windows from ground-based observatories to trigger a follow-up campaign. To execute all these modules, the user only needs to fill in an initial YAML file with some basic information, such as the star ID (KIC-ID, EPIC-ID, TIC-ID) and the cadence to be used, and then run a few lines of code sequentially to move from one step to the next. Alternatively, the user may provide the light curve as a CSV file with three comma-separated columns: time, normalized flux, and flux error.
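If you already have a light curve from another source, a CSV file in the three-column layout described above can be written with a few lines of NumPy. This is a minimal sketch that assumes a plain file with no header row; whether your SHERLOCK version expects a header, and the helper name `write_lightcurve_csv`, are assumptions, so check the documentation for the exact format. Normalizing by the median flux is one common choice and not necessarily what SHERLOCK does internally.

```python
import numpy as np


def write_lightcurve_csv(path, time, flux, flux_err):
    """Write time, normalized flux and normalized flux error as CSV columns.

    Hypothetical helper. Assumption: no header row is written.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    flux_err = np.asarray(flux_err, dtype=float)
    # Drop cadences where any of the three columns is NaN or infinite
    good = np.isfinite(time) & np.isfinite(flux) & np.isfinite(flux_err)
    time, flux, flux_err = time[good], flux[good], flux_err[good]
    # Normalize by the median so the out-of-transit baseline sits near 1
    median = np.median(flux)
    data = np.column_stack([time, flux / median, flux_err / median])
    np.savetxt(path, data, delimiter=",", fmt="%.8f")
    return data
```

For example, `write_lightcurve_csv("my_lightcurve.csv", t, f, e)` drops invalid cadences and scales the errors by the same median as the flux, so the relative uncertainties are preserved.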

Citation

We have published a dedicated paper presenting SHERLOCK! Hence, the best way to cite the software is to reference Dévora-Pajares et al. (2024):

@article{10.1093/mnras/stae1740,
    author = {Dévora-Pajares, Martín and Pozuelos, Francisco J and Thuillier, Antoine and Timmermans, Mathilde and Van Grootel, Valérie and Bonidie, Victoria and Mota, Luis Cerdeño and Suárez, Juan C},
    title = "{The sherlock pipeline: new exoplanet candidates in the WASP-16, HAT-P-27, HAT-P-26, and TOI-2411 systems}",
    journal = {Monthly Notices of the Royal Astronomical Society},
    volume = {532},
    number = {4},
    pages = {4752-4773},
    year = {2024},
    month = {07},
    abstract = "{The launches of NASA Kepler and Transiting Exoplanet Survey Satellite (TESS) missions have significantly enhanced the interest in the exoplanet field during the last 15 yr, providing a vast amount of public data that are being exploited by the community thanks to the continuous development of new analysis tools. However, using these tools is not straightforward, and users must dive into different codes, input–output formats, and methodologies, hindering an efficient and robust exploration of the available data. We present the sherlock pipeline, an end-to-end public software that allows the users to easily explore observations from space-based missions such as TESS or Kepler to recover known planets and candidates issued by the official pipelines and search for new planetary candidates that remained unnoticed. The pipeline incorporates all the steps to search for transit-like features, vet potential candidates, provide statistical validation, conduct a Bayesian fitting, and compute observational windows from ground-based observatories. Its performance is tested against a catalogue of known and confirmed planets from the TESS mission, trying to recover the official TESS Objects of Interest (TOIs), explore the existence of companions that have been missed, and release them as new planetary candidates. sherlock demonstrated an excellent performance, recovering 98 per cent of the TOIs and confirmed planets in our test sample and finding new candidates. Specifically, we release four new planetary candidates around the systems WASP-16 (with P $\sim$ 10.46 d and R $\sim$ 2.20 $\mathrm{R}_{\oplus}$), HAT-P-27 (with P $\sim$ 1.20 d and R $\sim$ 4.33 $\mathrm{R}_{\oplus}$), HAT-P-26 (with P $\sim$ 6.59 d and R $\sim$ 1.97 $\mathrm{R}_{\oplus}$), and TOI-2411 (with P $\sim$ 18.75 d and R $\sim$ 2.88 $\mathrm{R}_{\oplus}$).}",
    issn = {0035-8711},
    doi = {10.1093/mnras/stae1740},
    url = {https://doi.org/10.1093/mnras/stae1740},
    eprint = {https://academic.oup.com/mnras/article-pdf/532/4/4752/58747111/stae1740.pdf},
}

Additionally, we encourage citing Pozuelos et al. (2020), because it was the first work to use a preliminary version of SHERLOCK:

@ARTICLE{2020A&A...641A..23P,
       author = {{Pozuelos}, Francisco J. and {Su{\'a}rez}, Juan C. and {de El{\'\i}a}, Gonzalo C. and {Berdi{\~n}as}, Zaira M. and {Bonfanti}, Andrea and {Dugaro}, Agust{\'\i}n and {Gillon}, Micha{\"e}l and {Jehin}, Emmanu{\"e}l and {G{\"u}nther}, Maximilian N. and {Van Grootel}, Val{\'e}rie and {Garcia}, Lionel J. and {Thuillier}, Antoine and {Delrez}, Laetitia and {Rod{\'o}n}, Jose R.},
        title = "{GJ 273: on the formation, dynamical evolution, and habitability of a planetary system hosted by an M dwarf at 3.75 parsec}",
      journal = {\aap},
     keywords = {planets and satellites: dynamical evolution and stability, planets and satellites: formation, Astrophysics - Earth and Planetary Astrophysics, Astrophysics - Solar and Stellar Astrophysics},
         year = 2020,
        month = sep,
       volume = {641},
          eid = {A23},
        pages = {A23},
          doi = {10.1051/0004-6361/202038047},
archivePrefix = {arXiv},
       eprint = {2006.09403},
 primaryClass = {astro-ph.EP},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2020A&A...641A..23P},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

Also, you may be interested in having a look at recent papers that used SHERLOCK:
Pozuelos et al. (2023)
Delrez et al. (2022)
Dransfield et al. (2022)
Luque et al. (2022)
Schanche et al. (2022)
Wells et al. (2021)
Benni et al. (2021)
Van Grootel et al. (2021)
Demory et al. (2020)

Full Tutorials

We have conducted dedicated workshops to teach SHERLOCK's usage and best practices. The most recent one was held in June 2023 at the Instituto de Astrofísica de Andalucía-CSIC. You can find all the material used (Jupyter notebooks, full examples, presentations, etc.) at this link: SHERLOCK Workshop IAA-CSIC. Let us know if you or your lab are interested in the SHERLOCK package! We might organize an introduction and a hands-on session to help you get familiar with the code and/or implement new functionalities.

Main Developers

Active: F.J. Pozuelos, M. Dévora

Additional contributors

A. Thuillier, L. García, and Luis Cerdeño Mota

Documentation

Please visit https://sherlock-ph.readthedocs.io to get a complete set of explanations and tutorials to get started with SHERLOCK.

Launch

You can run SHERLOCK PIPEline as a standalone package by using:

python3 -m sherlockpipe --properties my_properties.yaml

You only need to provide a YAML file with any of the properties contained in the internal properties.yaml provided by the pipeline. The most important keys to define in your YAML file are those under the GLOBAL OBJECTS RUN SETUP and SECTOR OBJECTS RUN SETUP sections, because they contain the object IDs or files to be analysed. You need to fill in at least one of those keys for the pipeline to do anything. If you still have doubts, please refer to the examples/properties directory.

You may also want to inspect only SHERLOCK's preparation stage. In that case, you can run it without the analysis phase and review the light curve, the periodogram, and the initial report to make better decisions when tuning the execution parameters. Just launch SHERLOCK with:

python3 -m sherlockpipe --properties my_properties.yaml --explore

and it will stop as soon as it has completed the preparation stage for each object.
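If you prefer to drive SHERLOCK from a Python script or notebook instead of a shell, the two commands above can be wrapped with the standard subprocess module. This is a minimal sketch: the helper names `sherlock_command` and `run_sherlock` are hypothetical, and only the `--properties` and `--explore` flags documented here are used.

```python
import subprocess
import sys


def sherlock_command(properties_file, explore=False):
    """Build the argument list for the sherlockpipe launch command."""
    # sys.executable runs SHERLOCK with the same interpreter/environment
    cmd = [sys.executable, "-m", "sherlockpipe", "--properties", properties_file]
    if explore:
        # Stop after the preparation stage for each object
        cmd.append("--explore")
    return cmd


def run_sherlock(properties_file, explore=False):
    """Run SHERLOCK; raises CalledProcessError if it exits with an error."""
    return subprocess.run(sherlock_command(properties_file, explore), check=True)
```

For instance, `run_sherlock("my_properties.yaml", explore=True)` is equivalent to the `--explore` command shown above.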

Updates

SHERLOCK uses third-party data to identify TOIs, KOIs, and EPICs and to handle FFIs and the vetting process. These data are frequently updated by the active missions, so SHERLOCK performs better when its metadata is refreshed. You can simply run:

python3 -m sherlockpipe.update

and SHERLOCK will download the dependencies. It stores a timestamp of the last refresh to avoid unnecessary repeated downloads. However, if newer updates are available and you need them immediately, you can call:

python3 -m sherlockpipe.update --force

and SHERLOCK will ignore the timestamp and perform the update. If you want to wipe all the metadata and rebuild it from scratch, you can execute:

python3 -m sherlockpipe.update --clean

This last command implies --force, so the time of the last update is ignored as well.
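The timestamp behaviour described above follows a common throttling pattern: skip the refresh unless enough time has passed or the user forces it. The sketch below illustrates that pattern only; it is not SHERLOCK's implementation, and both the one-day minimum interval and the helper `needs_update` are assumptions.

```python
from datetime import datetime, timedelta


def needs_update(last_update, now, min_interval=timedelta(days=1),
                 force=False, clean=False):
    """Decide whether the metadata should be refreshed.

    last_update: datetime of the previous refresh, or None if never refreshed.
    clean implies force, mirroring the --clean behaviour described above.
    """
    if force or clean:
        return True
    if last_update is None:
        # Nothing has been downloaded yet
        return True
    return now - last_update >= min_interval
```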

You can also let SHERLOCK refresh the OI lists before each execution by adding the following line to your YAML file:

UPDATE_OIS: True

Vetting

SHERLOCK PIPEline comes with a submodule to examine the most promising transit candidates found by any of its executions. This is done with WATSON, which can vet TESS and Kepler targets. You should be able to execute the vetting by calling:

python3 -m sherlockpipe.vet --properties my_properties.yaml

This command runs the vetting process with the parameters given in your YAML file. You can find the generated results under the $your_sherlock_object_results_dir/vetting directory. See examples/vetting/ to learn how to define the proper properties for the vetting process.

A simplified option is also available to run the vetting. If you are sure that a candidate from the SHERLOCK results matches your desired parameters, you can run

python3 -m sherlockpipe.vet --candidate ${theCandidateNumber}

from the SHERLOCK results directory. This execution automatically reads the transit parameters from the files generated by SHERLOCK.

Fitting

SHERLOCK PIPEline comes with another submodule to fit the most promising transit candidates found by any of its executions. The fit is performed with the ALLESFITTER code. By calling:

python3 -m sherlockpipe.fit --properties my_properties.yaml

you will run the fitting process with the parameters given in your YAML file. You can find the generated results under the $your_sherlock_object_results_dir/fit directory. See examples/fitting/ to learn how to define the proper properties for the fitting process.

A simplified option is also available to run the fit. If you are sure that a candidate from the SHERLOCK results matches your desired parameters, you can run

python3 -m sherlockpipe.fit --candidate ${theCandidateNumber}

from the SHERLOCK results directory. This execution automatically reads the transit and star parameters from the files generated by SHERLOCK.

Validation

SHERLOCK PIPEline implements a module to perform the statistical validation of a candidate using TRICERATOPS. By calling:

python3 -m sherlockpipe.validate --candidate ${theCandidateNumber}

you will run the validation for one of the SHERLOCK candidates.
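The candidate-based commands from the vetting, fitting, and validation sections can be chained from Python. This is a minimal sketch assuming all three are launched from the SHERLOCK results directory; the text above states this for vetting and fitting, while for validation it is an assumption. The helpers `candidate_commands` and `run_candidate_steps` are hypothetical.

```python
import subprocess
import sys

# Submodules documented above that accept --candidate
STEPS = ("vet", "fit", "validate")


def candidate_commands(candidate, steps=STEPS):
    """Build one command per SHERLOCK submodule for a candidate number."""
    return [
        [sys.executable, "-m", f"sherlockpipe.{step}", "--candidate", str(candidate)]
        for step in steps
    ]


def run_candidate_steps(candidate, results_dir, steps=STEPS):
    """Run the submodules in order from results_dir.

    check=True stops the chain at the first step that fails.
    """
    for cmd in candidate_commands(candidate, steps):
        subprocess.run(cmd, cwd=results_dir, check=True)
```

Running the steps in order matters because the fit refines the ephemerides that later steps, such as observation planning, rely on.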

Stability

SHERLOCK PIPEline also implements a module to compute the dynamical stability of a system using REBOUND and SPOCK. By calling:

python3 -m sherlockpipe.stability --bodies 1,2,4

where the --bodies parameter is a comma-separated list of the accepted SHERLOCK signals to be used in the simulated scenarios. You can also provide a stability properties file to run a custom stability simulation:

python3 -m sherlockpipe.stability --properties stability.yaml

and you can even combine SHERLOCK accepted signals with some additional bodies provided by the properties file:

python3 -m sherlockpipe.stability --bodies 1,2,4 --properties stability.yaml

The results are stored in a stability directory containing the execution log and a stability.csv file with one line per simulated scenario, sorted by score with the best results first.

Observation plan

SHERLOCK PIPEline also includes a tool to plan your observations from ground-based observatories using astropy and astroplan. By calling:

python3 -m sherlockpipe.plan --candidate ${theCandidateNumber} --observatories observatories.csv

from the directory produced by sherlockpipe.fit, where the refined candidate ephemerides are stored. The observatories.csv file should list the observatories available for your candidate follow-up. As an example, you can look at this file.

SHERLOCK PIPEline Workflow

Note that SHERLOCK PIPEline uses several CSV files listing TOIs, KOIs, and EPIC IDs from the TESS, Kepler, and K2 missions. Your first execution of the pipeline may therefore take longer, because this information needs to be downloaded.

Provisioning of light curve

The light curve for every input object needs to be obtained from its mission database. For this we use the high-level API of Lightkurve, which enables downloading the desired light curves for the TESS, Kepler, and K2 missions. We also support Full Frame Images from the TESS mission through ELEANOR. In all cases we use the PDCSAP signal among those provided by these two packages.

Pre-processing of light curve

Light curves often contain systematics such as noise, high dispersion near the borders of the data segments, and high-amplitude periodicities caused by pulsators, fast rotators, etc. SHERLOCK PIPEline provides several methods to reduce the most important of these systematics.

Local noise reduction

For local noise, where neighbouring measurements deviate strongly from the local trend, we apply a Savitzky-Golay filter. This has been shown to significantly increase the SNR of the detected transits. This feature can be disabled with a flag.
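A Savitzky-Golay filter fits a low-order polynomial in a sliding window and replaces each point by the fitted value. The sketch below uses SciPy's `savgol_filter`; the window length of 11 points and polynomial order of 3 are illustrative assumptions, since SHERLOCK's actual settings are not given here, and `smooth_local_noise` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_local_noise(flux, window_length=11, polyorder=3):
    """Reduce point-to-point scatter with a Savitzky-Golay filter.

    window_length must be odd and greater than polyorder.
    These values are illustrative, not SHERLOCK defaults.
    """
    return savgol_filter(np.asarray(flux, dtype=float), window_length, polyorder)
```

The window should span much less time than the shortest transit you are looking for; otherwise the filter starts to smooth away the transit itself along with the noise.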

High RMS areas masking

Spacecraft sometimes have to perform reaction-wheel momentum dumps by firing thrusters; at other times there is strong scattered light, or the spacecraft introduces jitter into the signal. For all of these systematics, we found that in many cases the data from the affected regions should be discarded. Thus, SHERLOCK PIPEline computes a binned RMS and automatically masks the bins whose RMS exceeds a configurable factor multiplied by the median RMS. This feature can be disabled with a flag.
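The binned RMS masking described above can be sketched with NumPy. The bin width of 0.5 days, the factor of 1.5, and the use of the scatter about each bin's mean as its RMS are illustrative assumptions rather than SHERLOCK's defaults; `high_rms_mask` is a hypothetical helper.

```python
import numpy as np


def high_rms_mask(time, flux, bin_width=0.5, factor=1.5):
    """Return a boolean mask that is True for points inside high-RMS bins.

    A bin is flagged when its RMS exceeds `factor` times the median bin RMS.
    bin_width uses the same units as time (e.g. days).
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    # Assign each point to a bin index
    bin_idx = np.floor((time - time.min()) / bin_width).astype(int)
    bins = np.unique(bin_idx)
    rms = np.empty(bins.size)
    for i, b in enumerate(bins):
        values = flux[bin_idx == b]
        # Scatter about the bin mean, used here as the bin RMS
        rms[i] = np.sqrt(np.mean((values - values.mean()) ** 2))
    threshold = factor * np.median(rms)
    bad_bins = bins[rms > threshold]
    return np.isin(bin_idx, bad_bins)
```

Points where the mask is True would then be dropped, for example with `time[~mask]` and `flux[~mask]`. Using the median as the reference keeps a few very noisy bins from inflating the threshold.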

Input time ranges masking

The user can provide an array of time ranges to be masked in the original signal. If enabled, this feature automatically disables High RMS areas masking for that object.
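Masking user-supplied time ranges reduces to building a boolean mask from (start, end) pairs. This sketch assumes inclusive bounds and ranges expressed in the same time units as the light curve; how the ranges are written in the SHERLOCK YAML file is not shown here, and `mask_time_ranges` is a hypothetical helper.

```python
import numpy as np


def mask_time_ranges(time, ranges):
    """Return a boolean mask that is True for points inside any [start, end].

    `ranges` is a sequence of (start, end) pairs in the units of `time`.
    """
    time = np.asarray(time, dtype=float)
    mask = np.zeros(time.shape, dtype=bool)
    for start, end in ranges:
        # Inclusive bounds (an assumption of this sketch)
        mask |= (time >= start) & (time <= end)
    return mask
```

The retained light curve is then `time[~mask]`, `flux[~mask]`.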

Detrend of high-amplitude periodicities

The most common sources of high-amplitude periodicities are fast rotators, which induce a strong sinusoidal-like trend in the PDCSAP signal. This is why SHERLOCK PIPEline includes automatic detection and detrending of high-amplitude periodicities during its preparation stage. This feature can be disabled with a flag.

Input period detrend

If enabled, this feature automatically disables Detrend of high-amplitude periodicities for the assigned object. The user can provide a period to be used for an initial detrend of the original signal.
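The method SHERLOCK uses for this detrending is not described here. As an illustration of the idea only, the sketch below fits a single sinusoid by least squares, either at a user-supplied period or at the best period found by scanning a grid of trial periods (similar in spirit to a Lomb-Scargle periodogram), and divides it out. Real stellar variability is rarely a pure sinusoid, so this is not SHERLOCK's algorithm; all function names are hypothetical.

```python
import numpy as np


def fit_sinusoid(time, flux, period):
    """Least-squares fit of flux = c + a*sin(wt) + b*cos(wt) at a fixed period.

    Returns the model evaluated at `time` and the residual sum of squares.
    """
    w = 2.0 * np.pi / period
    design = np.column_stack([np.ones_like(time), np.sin(w * time), np.cos(w * time)])
    coeffs, *_ = np.linalg.lstsq(design, flux, rcond=None)
    model = design @ coeffs
    return model, np.sum((flux - model) ** 2)


def dominant_period(time, flux, periods):
    """Return the trial period whose sinusoid leaves the smallest residuals."""
    rss = [fit_sinusoid(time, flux, p)[1] for p in periods]
    return periods[int(np.argmin(rss))]


def detrend_periodicity(time, flux, period=None, periods=None):
    """Divide out a sinusoidal trend.

    If `period` is given (user-supplied case) it is used directly;
    otherwise the dominant period is searched over the `periods` grid.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    if period is None:
        if periods is None:
            raise ValueError("provide either period or a grid of trial periods")
        period = dominant_period(time, flux, np.asarray(periods, dtype=float))
    model, _ = fit_sinusoid(time, flux, period)
    return flux / model, period
```

Dividing rather than subtracting keeps the detrended flux normalized around 1, so transit depths remain relative depths.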

Custom user code

You can even inject your own Python code to perform:

  • A custom signal preparation task by implementing the CurvePreparer class that we provide. Then, inject your Python file into the CUSTOM_PREPARER property and let SHERLOCK use your code.
  • A custom best signal selection algorithm by implementing the SignalSelector class that we provide. Then, inject your Python file into the CUSTOM_ALGORITHM property and let SHERLOCK use your code.
  • A custom search zone definition by implementing the SearchZone class that we provide. Then, inject your Python file into the CUSTOM_SEARCH_ZONE property and let SHERLOCK use your code.
  • Custom search modes: 'tls', 'bls', 'grazing', 'comet' or 'custom'. You can search for transits by using TLS, BLS, TLS for a grazing template, TLS for a comet template or even inject your custom transit template (this is currently included as an experimental feature).

For a better understanding of their usage, please see the examples, which reference custom implementations that you can inspect in our custom algorithms directory.
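This section does not document how SHERLOCK loads these files or which methods CurvePreparer, SignalSelector, and SearchZone require, so refer to the examples for the real interfaces. As a generic illustration of how a class can be loaded from a user-supplied file path, such as the one set in CUSTOM_PREPARER, Python's standard importlib machinery can be used as below. This is not SHERLOCK's code, and `load_class_from_file` is a hypothetical helper.

```python
import importlib.util
import inspect


def load_class_from_file(path, base_class=None):
    """Import a Python file and return the first matching class defined in it.

    Classes are inspected in alphabetical order. If `base_class` is given,
    only its subclasses are considered.
    """
    spec = importlib.util.spec_from_file_location("user_plugin", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    for _, obj in inspect.getmembers(module, inspect.isclass):
        # Skip classes that were merely imported into the user's file
        if obj.__module__ != module.__name__:
            continue
        if base_class is None or (issubclass(obj, base_class) and obj is not base_class):
            return obj
    raise LookupError(f"no suitable class found in {path}")
```

Filtering on `__module__` matters because a user file typically imports the base class it extends, and that imported class should not be picked up as the plug-in.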