Machine Learning 4 the Search for Extra Terrestrial Intelligence (http://www.seti.org/ml4seti)
The SETI Institute is hosting a public hackathon and a global, online code challenge to find a robust signal classification algorithm for use in our mission to find E.T. radio communication. By framing the radio signal data as a spectrogram (a 2D visual representation), we can convert the problem into something akin to an image classification problem. We are looking for participants to use machine learning and deep learning / AI techniques to construct highly accurate classification systems that will be used in the data analysis pipeline at our telescope array.
This page documents the details for the SETI Institute's Hackathon (June 10-11, 2017) and code challenge (June 1 to July 31, 2017), as generally described on our main page, http://www.seti.org/ml4seti
Each night, using the Allen Telescope Array (ATA) in northern California, we scan the sky at various radio frequencies, observing star systems with known exoplanets and searching for faint but persistent signals. The current signal detection system is programmed to search only for particular kinds of signals: narrow-band carrier waves. However, the detection system sometimes triggers, with unknown efficiency, on signals that are neither narrow-band nor explicitly known radio frequency interference (RFI). There seem to be various categories of these events that have been observed in the past.
Our goal is to classify these accurately in real-time. This would allow the signal detection system to make better informed observational decisions, increase the efficiency of the nightly scans, and allow for explicit detection of these other signal types.
The standard approach to SETI signal detection and classification is to transform the observed radio signals, which are time-series data, into a 2-dimensional representation called a spectrogram. A spectrogram is a measure of the power of the signal across a range of frequencies, as a function of time. From this, our software searches for narrow-band signals. One can also think of a spectrogram as a 2D image and transform the problem into a visual recognition problem. This is likely to be the primary approach for most solutions, though other methods are certainly possible.
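To make the time-series-to-spectrogram step concrete, here is a minimal sketch using only NumPy. It is not the SETI Institute's pipeline code: the `spectrogram` function, the block length `n_fft`, the Hann taper, and the synthetic tone-plus-noise input are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(samples, n_fft=256):
    """Power spectrogram: rows are time steps, columns are frequency bins."""
    n_rows = len(samples) // n_fft
    # Split the time series into consecutive, non-overlapping blocks.
    blocks = np.reshape(samples[: n_rows * n_fft], (n_rows, n_fft))
    # Taper each block to reduce spectral leakage, then FFT along the block.
    spectra = np.fft.fft(blocks * np.hanning(n_fft), axis=1)
    # Centre zero frequency and convert complex amplitudes to power.
    return np.abs(np.fft.fftshift(spectra, axes=1)) ** 2

# Synthetic stand-in for real data: a weak complex tone buried in noise.
rng = np.random.default_rng(0)
n_samples = 256 * 64
t = np.arange(n_samples)
tone_freq = 0.125  # cycles per sample (hypothetical value)
signal = 0.5 * np.exp(2j * np.pi * tone_freq * t)
noise = (rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples)) / np.sqrt(2)

spec = spectrogram(signal + noise)
# Averaging over time makes the persistent narrow-band tone stand out.
peak_bin = int(np.argmax(spec.mean(axis=0)))
print(spec.shape, peak_bin)
```

The resulting 2D array can be displayed as an image (often on a log scale) or passed directly to an image classifier.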
For example, here is a classic narrow-band signal observed from the ISEE3 explorer. These are the kinds of signals our software is tuned to identify.
But things are never that pretty unless we're looking at a spacecraft. Here's another example: a mysterious squiggle observed in 2014 (the color scale is different because the power amplitude, coming out of the page, is on a log scale).
Similar to the signal above, we often see various signal types that our software is not specifically designed to detect. These have various names like "squiggles", "pulsed", and "bright pixels".
We want to build classification models that are designed to find these "other" types of signals, which is what this hackathon and code challenge are all about. We hope to draw on the expertise of the data science community and, at the same time, give citizen scientists a way to get involved in research that is normally out of their reach. We want to increase the number of large cups in the water, as Dr. Jill Tarter might describe it.
If you have participated in the SETI@IBMCloud project (blog post, github repo), these spectrograms should be familiar to you.
The code challenge will ask participants to build a classification system based on a large body of simulated data that we are now constructing. You will receive the raw data for hundreds of thousands of signals, like the ones you see above. The most accurate classifier submitted by the end of the code challenge will be installed into the SETI Institute's data analysis pipeline to work on the latest observational data.
The hackathon will be a mini-version of the code challenge, with its own set of prizes: a self-contained event that you can participate in on its own. It will also serve as the kick-off for the larger code challenge.
We are excited to offer significant computing resources for participants of the hackathon.
Attendees will receive
- weekend access to IBM Watson Visual Recognition
- weekend access to an IBM Apache Spark Enterprise cluster
- weekend access to IBM PowerAI Deep Learning platform on Nimbix Cloud
- tutorials covering IBM Watson VR, Tensorflow, and Skymind's DL4J
- extended trial account on IBM Bluemix
- extended trial account on IBM Data Science Experience
You’ll work directly with top SETI researchers. There will be talks by Dr. Jill Tarter and Dr. Gerry Harp from the SETI Institute and Dr. Danny Price of UC Berkeley / Breakthrough Listen.
The hackathon will be at the new IBM Innovation Space, which occupies the 5th floor of the SOMA Galvanize office in San Francisco. In addition, we will also have access to the Galvanize rooftop for getting some fresh air and breaks. A light breakfast, snacks, lunch for both days and a pizza dinner on Saturday evening will be provided.
Registration for the Hackathon has closed but you can still sign up for the code challenge.
We are excited to be able to offer some amazing and extremely unique prizes. You will not find prizes like this anywhere else.
The SETI Institute reserves the right to alter the awards and prizes at any time. Of course, we will do our best, but cannot make any guarantees.
The prize for the best classifier submitted by the end of the code-challenge will be:
- Installation of code at ATA data acquisition pipeline.
- Co-authorship with SETI Institute researchers on a paper to be submitted in a peer-reviewed scientific journal
- Assistance presenting work at a SETI research conference or meetup.
At the end of the hackathon, the judging panel will listen to brief presentations by participating teams and offer awards in the following categories:
- Best Classification
- Winner gets a tour of the ATA with SETI Institute scientists (up to 6 team members).
- Best Signal Processing
- Best Classifier without a Neural Network or IBM Watson
- Most Interesting / Surprising Analysis
We are working on more awards and prizes for both the hackathon and code challenge.
Dr. Gerry Harp and one of his friends have managed to cut apart some old antennas and fashion them into trophies that will be handed out as awards at the hackathon.
We are planning to judge the main code challenge entries by the following metrics:
- Log-Loss function based on confusion matrix results
- Speed of single-event classification (We have a very weak speed requirement of classification within 30 seconds, which allows for cloud-based solutions.)
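The exact scoring procedure has not been defined here, so the following is only a sketch of the standard multiclass log loss together with a confusion matrix, written with NumPy. The function names, the three-class setup, and the example probabilities are hypothetical and are not the official scoring code.

```python
import numpy as np

def multiclass_log_loss(true_labels, probabilities, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    probs = np.clip(np.asarray(probabilities, dtype=float), eps, 1.0)
    # Renormalise so each row still sums to one after clipping.
    probs = probs / probs.sum(axis=1, keepdims=True)
    rows = np.arange(len(true_labels))
    return float(-np.mean(np.log(probs[rows, true_labels])))

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Entry [i, j] counts events of true class i predicted as class j."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

# Hypothetical three-class example; the class indices are placeholders.
y_true = np.array([0, 1, 2, 1])
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],  # true class 1, but class 0 gets the most probability
])

loss = multiclass_log_loss(y_true, y_prob)
cm = confusion_matrix(y_true, y_prob.argmax(axis=1), n_classes=3)
print(loss)
print(cm)
```

Log loss rewards well-calibrated probabilities rather than only the top prediction, so a classifier that is confidently wrong is penalised more heavily than one that is uncertain.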
Before the code challenge, we will have a better definition of how you will submit your entry.
You will most likely need to form a team of 4 to 6 people in order to accomplish your goals in a timely manner. To facilitate team-building, among other things, we have created a Slack team for communication.
Sign up for the Slack team here.
List of analysis ideas and concepts that may be useful:
- "standard" machine-learning feature extraction (see ibmseti)
- Watson Visual Recognition
- Deep Learning (CNN)
- on fourier-space representation
- on raw time-series
- Basic Deep Learning tutorial
- Deep Forest
- Principal Component Analysis
- Decision Trees
- Support Vector Machines
- k-Nearest Neighbors
- Wavelet decomposition
- De-chirping
- KLT (Karhunen-Loève transform)
- Time-series preprocessing: windowing, or Welch periodogram estimation
If you wish to contribute a separate idea to this list, you can
- issue a Pull Request to this repo
- talk to an organizer on our Slack team (@gadamc or @gerryharp)
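As one illustration of the ideas listed above, here is a minimal de-chirping sketch in NumPy. A signal whose frequency drifts linearly in time smears across many frequency bins; multiplying by the conjugate quadratic phase concentrates it back into a single bin. The `dechirp` function, the start frequency, and the drift rate are hypothetical. In practice the drift rate is unknown and would have to be searched over.

```python
import numpy as np

def dechirp(samples, drift_rate):
    """Remove a linear frequency drift (cycles per sample squared)."""
    t = np.arange(len(samples))
    # A drifting tone has phase 2*pi*(f0*t + 0.5*drift*t^2);
    # multiplying by the conjugate quadratic term cancels the drift.
    return samples * np.exp(-2j * np.pi * 0.5 * drift_rate * t ** 2)

n = 4096
t = np.arange(n)
f0 = 0.125     # start frequency in cycles per sample (hypothetical)
drift = 2e-5   # drift rate in cycles per sample squared (hypothetical)
chirp = np.exp(2j * np.pi * (f0 * t + 0.5 * drift * t ** 2))

power_before = np.abs(np.fft.fft(chirp)) ** 2
power_after = np.abs(np.fft.fft(dechirp(chirp, drift))) ** 2

# After de-chirping, the power collapses into the bin at f0 * n.
peak_bin = int(np.argmax(power_after))
gain = power_after.max() / power_before.max()
print(peak_bin, gain)
```

A larger `gain` means more of the signal power ends up in one bin, which makes a weak drifting signal easier to separate from noise.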
In order to participate in the code challenge and hackathon, your code must be open-source and licensed under the Apache License 2.0.
Can I participate in the hackathon remotely?
All material and information that you will need to do the work will be available online. You can participate with teams at the hackathon (and communicate with your team via Slack). However, we cannot ship any prizes or event swag to participants that are remote and we cannot support presentations of work for remote teams. At least one team member must be at the hackathon in order to present your work and win a prize.
The SETI Institute has been partnering with IBM for almost two years now. We've done amazing work together and have written about it in various places. Please check it out.
- SETI Talk at Seattle Galvanize by Adam Cox
- SETI@IBMCloud: SETI data, publicly available, from IBM
- SETI sparks Machine Learning to sift Big Data
- Types of Big Data from the Allen Telescope Array
- Signal Classification: Powerful Patterns from Simple Features
- IBM and Stanford University team up for a new perspective on SETI signal analysis
- Status Update from the SETI Institute
- The SETI Project Team