tallamjr/astronet

Broker integration

tallamjr opened this issue · 1 comments

For the final phase of astronet it is hoped to put the best performing model/architecture in production in a real-time setting.

This will be as a science module within the FINK broker. There are tutorials how to incorporate custom code into fink here.

On the face of things, this should be "straight forward", with the majoirty of code re-writing coming in to form of migrating processing from pandas to spark, which with pyspark's pandas api should be trivial.

From the avro schema, it seems like the columns of interest might be (see here for human readable description):

# utility from fink-science
from fink_science.utilities import concat_col

# user-defined function from the current folder
from processor import deltamaglatest
# Fink receives data as Avro. However, the internal processing makes use of Parquet files. We provide here alert data as Parquet: it contains original alert data from ZTF and some added values from Fink:
# Load the data into a Spark DataFrame
df = spark.read.format('parquet').load('sample.parquet')
df.printSchema()
root
 |-- candid: long (nullable = true)
 |-- schemavsn: string (nullable = true)
 |-- publisher: string (nullable = true)
 |-- objectId: string (nullable = true)
 |-- candidate: struct (nullable = true)
 |    |-- jd: double (nullable = true)
 |    |-- sgmag1: float (nullable = true)
 |    |-- srmag1: float (nullable = true)
 |    |-- simag1: float (nullable = true)
 |    |-- szmag1: float (nullable = true)

See the avro spec for details (helpful notebook about the alerts here)

Closing for now since further development will take place in https://github.com/tallamjr/fink-science