QutEcoacoustics/baw-server

New report generation routes

Opened this issue · 9 comments

As part of the audio event summary reports, the client needs information to create the accumulation data plot, species composition plot, analysis coverage plots, and false colour spectrograms.


"accumulationData": [
    { "date": "2020-01-03T00:00:00.000Z", "countOfSpecies": 1, "error": 0.5 },
    { "date": "2020-01-04T00:00:00.000Z", "countOfSpecies": 2, "error": 0.5 }
],

(image: example species accumulation plot)


"speciesCompositionData": [
    // each data point covers one bin
    {
        "date": "2020-01-03T00:00:00.000Z",
        "values": [
            { "tagId": 2, "ratio": 0.7 },
            { "tagId": 1, "ratio": 0.3 }
        ]
    },
    {
        "date": "2020-01-04T00:00:00.000Z",
        "values": [
            { "tagId": 1, "ratio": 0.5 },
            { "tagId": 4, "ratio": 0.5 }
        ]
    }
    // etc...
],
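For reference, the per-bin ratios could be derived from raw detection counts along these lines. This is a TypeScript sketch; `toCompositionPoint` and its parameter types are illustrative, not part of the API:

```typescript
// Hypothetical helper: derive a species composition data point for one bin
// from a map of tagId -> detection count. Field names follow the payload above.
interface CompositionPoint {
  date: string;
  values: { tagId: number; ratio: number }[];
}

function toCompositionPoint(
  date: string,
  countsByTag: Map<number, number>
): CompositionPoint {
  const total = [...countsByTag.values()].reduce((a, b) => a + b, 0);
  return {
    date,
    values: [...countsByTag.entries()]
      .map(([tagId, count]) => ({ tagId, ratio: count / total }))
      // largest contributors first, matching the example ordering
      .sort((a, b) => b.ratio - a.ratio),
  };
}
```

By construction the ratios in each bin sum to 1, which is what the stacked composition plot expects.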

(image: example species composition plot)


False colour spectrograms should be returned as a URL, or a collection of URLs.
A collection is needed because false colour spectrograms can only be created for 24 hour periods; a report spanning multiple days should therefore return a sorted collection of URLs to false colour spectrograms.

"falseColorSpectrograms": [ "https://api.ecosounds.org/false_color123.png" ],

(image: example false colour spectrogram)

We should either

  • a: Expose each of these graphs under different atomic routes
  • b: Expose these graphs under a common route (e.g. /{project,region,site}/:id/audio_events/graphs and have filter conditions)

It is also worth considering generating the accumulationData and speciesCompositionData client side. However, this may be poor practice and could slow down the client.
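To illustrate what the client-side option would involve: a species accumulation curve can be computed from a flat list of detection events along these lines (the `Detection` type and `accumulate` helper are hypothetical names, not API shapes):

```typescript
// Hypothetical sketch: cumulative count of distinct species (tags) seen
// by each date, computed client side from raw detection events.
interface Detection {
  date: string; // ISO date of the detection
  tagId: number;
}

function accumulate(events: Detection[]): { date: string; countOfSpecies: number }[] {
  const seen = new Set<number>();
  const out: { date: string; countOfSpecies: number }[] = [];
  for (const e of [...events].sort((a, b) => a.date.localeCompare(b.date))) {
    seen.add(e.tagId);
    const last = out[out.length - 1];
    if (last && last.date === e.date) {
      last.countOfSpecies = seen.size; // update the point for this date
    } else {
      out.push({ date: e.date, countOfSpecies: seen.size });
    }
  }
  return out;
}
```

Note this requires shipping every matching event to the client and scanning all of them, which is the performance concern with option "generate client side".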

OK, a couple of notes:

In lieu of a generalized model of endpoint aggregation, each report will probably have its own endpoint.

This means you need to include the full payload you want me to represent in this report.

Also: the false colour image generation will be handled by different means.


I'll be updating the ticket with the full payload, routes, and request structure needed for the "Ecoacoustic event summary report"

I've attached the full request including filter parameters the client will be sending and the expected response in the drop downs below

POST /reports/event_summary/filter

Request (Filter)
"filters": {
        "project": {
            "id": { "eq": 123 }
        },
        "region": {
            "id": { "eq": 321 }
        },
        "sites": {
            "in": [ 1, 2, 3, 4, 5, 6 ]
        },
        "startDate": "2020-01-01T00:00:00.000Z", // a new virtual column needs to be added to the audio event api
        "endDate": "2020-01-03T00:00:00.000Z", // a new virtual column needs to be added to the audio event api
        "startDate": {
            "gt": {
                "value": "12:12",
                "expression": ["time_of_day", "local_tz"]
            }
        },
        "endDate": {
            "lte": {
                "value": "12:12",
                "expression": ["time_of_day", "local_tz"]
            }
        },
        "or": [
            {
                "provenance.id": {
                    "eq": 2 // e.g. BirdNet
                }
            },
            {
                "provenance.id": {
                    "eq": 5 // e.g. Lances recogniser
                }
            }
        ],
        "score": {
            "gteq": 0.7
        },
        "tags.id": {
            "in": [ 1, 2, 3, 4, 5, 6 ]
        },
        "analysisJob.id": {
            "eq": "system" // this can be a user defined analysis job
        }
    },
Response
{
	"generatedDate": "2023-01-03T00:00:00.000Z", // not required for on-the-fly report generation at the moment, but for future saved reports this will be useful
	"eventGroups": [
        {
            "provenanceId": 5, // provenance id, we need to pull the full model
            "tagId": 1, // tag id, we need to get the tag text and tag name from the api pulled model
            "detections": 23,
            "binsWithDetections": 10,
            "binsWithInterference": [
                {
                    "name": "wind",
                    "value": 4
                },
                {
                    "name": "rain",
                    "value": 2
                }
            ],
            // this is the confidence plot for the event
            "score": {
                // the histogram will have 100 bins
                "histogram": [ 0.1, 0.2, 0.3, 0.25 ],
                "standardDeviation": 0.05,
                "mean": 0.25,
                "min": 0.1,
                "max": 0.3
            }
        },
        {
            "provenanceId": 6,
            "tagId": 1, // this same tag id links it to the identified event above from a different provenanceId
            "detections": 11,
            "binsWithDetections": 1,
            "binsWithInterference": [
                {
                    "name": "wind",
                    "value": 4
                },
                {
                    "name": "rain",
                    "value": 2
                }
            ],
            // this is the confidence plot for the event
            "score": {
                // the histogram will have 100 bins
                "histogram": [ 0.1, 0.2, 0.3, 0.25 ],
                "standardDeviation": 0.05,
                "mean": 0.25,
                "min": 0.1,
                "max": 0.3
            }
        }
    ],
    // requested through the FILTER /{project,region,site}/:id/audio_events/reports api route
    "graphs": {
        "accumulationData": [
            // each of these object data points will be of a bin size
            { "date": "2020-01-03T00:00:00.000Z", "countOfSpecies": 1, "error": 0.5 },
            { "date": "2020-01-04T00:00:00.000Z", "countOfSpecies": 2, "error": 0.5 }
        ],
        "speciesCompositionData": [
            // each of these object data points will be of a bin size
            {
                "date": "2020-01-03T00:00:00.000Z",
                "values": [
                    { "tagId": 2, "ratio": 0.7 },
                    { "tagId": 1, "ratio": 0.3 }
                ]
            },
            {
                "date": "2020-01-04T00:00:00.000Z",
                "values": [
                    { "tagId": 1, "ratio": 0.5 },
                    { "tagId": 4, "ratio": 0.5 }
                ]
            }
            // etc...
        ],
        "analysisCoverage": [
            // each of these object data points will be of a bin size
            {
                "date": "2020-01-02T00:00:00.000Z",
                "audioCoverage": 0.5,
                "analysisCoverage": 0.4
            }
        ]
    },
    "statistics": {
        "totalSearchSpan": 2592000, // 1 month in seconds
        "audioCoverageOverSearchSpan": 2092000, // there are a few missing recordings in this month
        "analysisCoverageOverSearchSpan": 2002000, // a few of the audio recordings are being analysed or could not be processed
        "countOfRecordingsAnalyzed": 100,
        "coverageStartDay": "2020-01-01T00:00:00.000Z",
        "coverageEndDay": "2020-01-31T00:00:00.000Z"
    },
    "locations": [ 1, 2, 3, 4 ] // these ids are site ids
}
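For context on the `score` block in the event groups above, here is a sketch of how the histogram and summary statistics could be computed from raw confidence values. The 100-bin count comes from the comments in the payload; normalising to per-bin proportions (rather than raw counts) is an assumption:

```typescript
// Sketch: compute the per-event-group score block from raw confidence values.
function scoreStats(scores: number[], bins = 100) {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  const variance =
    scores.reduce((a, b) => a + (b - mean) ** 2, 0) / scores.length;
  const histogram = new Array(bins).fill(0);
  const width = (max - min) / bins || 1; // avoid divide-by-zero when min === max
  for (const s of scores) {
    // clamp so the maximum value lands in the last bin
    const i = Math.min(bins - 1, Math.floor((s - min) / width));
    histogram[i] += 1 / scores.length; // proportion of events in this bin
  }
  return { histogram, standardDeviation: Math.sqrt(variance), mean, min, max };
}
```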

Additionally, there will be a new provenance route for recognisers / event creators most likely under
GET /provenance/:id

//! GET /provenance/:id
//? response
{
    "id": 123,
    "name": "BirdNet",
    "version": "1.0.0",
    "description": "An avian event detector",
    "score": 0.5, //* stretch goal
    "score_minimum": 0.1,
    "score_maximum": 0.8
}

In the request spec above, there is no field for binSize. How will we send bin size in the requests?

My assumption is it will be one of the following:

  1. A number of seconds representing the duration, OR
{
	"binSize": 23123124
}
  2. Hard-coded values ("day", "month", "year", "season") that can be derived server side
{
	"binSize": "season"
}

Great question, currently undecided.

If we can imagine an argument for variable sized bins then maybe seconds as a number.

We'll probably go with an enum, mainly because time intervals (e.g. months) can be irregularly sized
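A quick sketch of the enum approach (names assumed, "season" omitted for brevity), showing why irregular intervals fall out naturally when the server truncates a timestamp to the start of its bin rather than dividing by a fixed number of seconds:

```typescript
// Hypothetical: truncate a UTC timestamp to the start of its bin.
type BinSize = "day" | "month" | "year";

function binStart(date: Date, bin: BinSize): Date {
  switch (bin) {
    case "day":
      return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
    case "month":
      // months vary between 28 and 31 days, which a seconds-based
      // bin size cannot represent
      return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), 1));
    case "year":
      return new Date(Date.UTC(date.getUTCFullYear(), 0, 1));
  }
}
```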

Since time series is now in scope for the acoustic event report, the response should include a document for recording coverage and analysis coverage.

After discussion, I've done a mock-up of what the data structure will be client side (image below)

(image: client-side data structure mock-up)

The ITimeSeriesGraph will probably be implemented as an additional document under graphs.coverageData

You've mentioned that false colour spectrograms will be handled by different means:

Also: the false colour image generation will be handled by different means.

I'm assuming that spectrograms will be fetched through either:
a. A new endpoint to which we can provide a date range and get a list of spectrogram URLs returned
or
b. The spectrograms will be analysis result items, so we should just request the analysis results and perform a filter-map client side to extract and collate the spectrograms
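Option (b) would look something like the following sketch. The `AnalysisResultItem` shape and the name-matching rule are assumptions, not the real analysis results API:

```typescript
// Hypothetical: extract and collate false colour spectrogram URLs from a
// list of analysis result items client side.
interface AnalysisResultItem {
  name: string; // result file name
  path: string; // path relative to the API host
}

function spectrogramUrls(items: AnalysisResultItem[], baseUrl: string): string[] {
  return items
    .filter((i) => /false.?colou?r.*\.png$/i.test(i.name))
    .map((i) => `${baseUrl}${i.path}`)
    .sort(); // sorted so the 24-hour segments render in chronological order
}
```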


https://github.com/QutEcoacoustics/baw-client/blob/master/src/app/visualize/visualize.js

Finalised models & services for api

Event Provenances

Expected server-side model (event provenances)

Client provenance model

id				:bigint		not null, primary key
name			:string
version 		:string 	# (this is a string because we want to support version numbers like "1.0.0-beta0.1")
description		:string
score			:integer

Expected responses

Client provenance service

GET /provenance/:id (show)

Request body:

This field is intentionally left blank

Response body:

{
	"meta": {
		"status": 200,
		"message": "OK"
	},
	"data": model in JSON format
}

Example model for show request:
Client mock response

{
    id: 1,
    name: "Fake Audio Event Provenance",
    version: "1.0",
    description: "Mock Description",
    score: 0.5
}

Standard API implementation of routes

GET /provenance (list)
GET /provenance/filter (filter)
POST /provenance (create)
PATCH /provenance/:id (update)
DELETE /provenance/:id (delete)


Event summary report

Expected server-side model (event provenances)

Client report model

id					:bigint			not null, primary key	# the id field isn't used at the moment, however, it will be used when we add the ability to save reports
name				:string
generated_date		:datetime		not null
statistics			:statistics_sub_model
event_groups		:event_group_sub_model[]
site_ids			:bigint[]
region_ids			:bigint[]
tag_ids				:bigint[]
provenance_ids		:bigint[]
graphs				:graphs_sub_model
Report sub-models

statistics_sub_model
Client side sub-model for report statistics

total_search_span				:integer
audio_coverage_over_span		:integer
analysis_coverage_over_span		:integer
count_of_recordings_analyzed	:integer
coverage_start_day				:datetime	# the date and time of the first audio recording
coverage_end_day				:datetime	# the date and time of the last audio recording

event_group_sub_model
Client side sub-model for report event group

provenance_id			 	:bigint
tag_id						:bigint
detections					:integer
buckets_with_detections		:integer
score						:integer

graphs_sub_model
Client side sub-model for report graphs

accumulation_data			:accumulation_data_sub_model[]
species_composition_data	:composition_data_sub_model[]
analysis_coverage_data		:analysis_coverage_sub_model[]
coverage_data				:coverage_sub_model

accumulation_data_sub_model

date		:datetime
count		:integer
error		:integer

composition_data_sub_model

date		:datetime
tag_id		:bigint

analysis_coverage_sub_model

date				:datetime
audio_coverage		:integer
analysis_coverage	:integer

coverage_sub_model

failed_analysis_coverage		:daterange # new data type (see below)
analysis_coverage				:daterange
missing_analysis_coverage		:daterange
recording_coverage				:daterange

New data type "daterange"

daterange

startDate	:datetime
endDate		:datetime
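As a usage sketch, the `daterange` type and a helper that sums the total covered seconds from a list of ranges (assumed non-overlapping), e.g. to derive `audio_coverage_over_span`. The helper name is illustrative:

```typescript
// The new daterange data type, as specified above.
interface DateRange {
  startDate: string; // ISO datetime
  endDate: string;
}

// Hypothetical helper: total seconds covered by a set of non-overlapping ranges.
function totalCoverageSeconds(ranges: DateRange[]): number {
  return ranges.reduce(
    (sum, r) => sum + (Date.parse(r.endDate) - Date.parse(r.startDate)) / 1000,
    0
  );
}
```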

Expected responses

Client report service

POST /reports/audio_event_summary/filter (filter show)

Example Request Body:

{
	"filters": {
		"and": [
			{ "region.id": { "in": [1, 2, 3, 4] } },
			{ "site.id": { "in": [1, 2, 3, 4] } },
			{ "provenance.id": { "in": [1, 2, 3, 4] } },
			{ "tag.id": { "in": [1, 2, 3, 4] } },
			{ "score": { "gteq": 0.6 } },
			// possible values for this enum can be found here: https://github.com/QutEcoacoustics/workbench-client/blob/master/src/app/components/reports/pages/event-summary/EventSummaryReportParameters.ts#L43-L50
			{ "bucketSize": { "eq": "day" } },
			{ "recordedEndDate": { "greaterThan": "2020-10-10" } },
			{ "recordedDate": { "lessThan": "2020-10-11" } },
			{
				"recordedEndDate": {
					"greaterThan": {
						"expressions": ["local_offset", "time_of_day"],
						"value": "12:12"
					}
				}
			},
			{
				"recordedDate": {
					"lessThan": {
						"expressions": ["local_offset", "time_of_day"],
						"value": "12:13"
					}
				}
			}
		]
	}
}

Client code that creates these filters

Example Response Body:

{
	"meta": {
		"status": 200,
		"message": "OK"
	},
	"data": event summary report model in JSON format
}

Example model for filter show request
Client mock response

{
  site_ids: [3600, 3609, 3332, 3331],
  region_ids: [14, 7],
  tag_ids: [1, 1950, 39, 277],
  provenance_ids: [1],
  name: "Mock Event Summary Report",
  generated_date: "2023-07-07T00:00:00.0000000",
  event_groups: [
    {
      provenance_id: 1,
      tag_id: 1,
      detections: 55,
      buckets_with_detections: 0.7,
      score: {
        histogram: [
          0.91,
          0.82, 0.71, 0.71, 0.62, 0.63, 0.54, 0.52, 0.51, 0.51, 0.41, 0.4,
          0.3, 0.32, 0.22, 0.13,
        ],
        standard_deviation: 0.2,
        mean: 0.5,
        min: 0.1,
        max: 0.9,
      },
    },
    {
      provenance_id: 1,
      tag_id: 1950,
      detections: 55,
      buckets_with_detections: 0.7,
      score: {
        histogram: [
          0.1, 0.2, 0.3, 0.3, 0.6, 0.6, 0.5, 0.2, 0.5, 0.5, 0.4, 0.4, 0.3,
          0.3, 0.5, 0.1, 0.7, 0.7, 0.6, 0.7, 0.8, 0.8, 0.9,
        ],
        standard_deviation: 0.4,
        mean: 0.6,
        min: 0.4,
        max: 0.98,
      },
    },
    {
      provenance_id: 1,
      tag_id: 39,
      detections: 55,
      buckets_with_detections: 0.7,
      score: {
        histogram: [
          0.2, 0.5, 0.4, 0.4, 0.3, 0.3, 0.6, 0.2, 0.4, 0.3, 0.1, 0.4, 0.3,
          0.3, 0.3, 0.1, 0.6, 0.7, 0.8, 0.9,
        ],
        standard_deviation: 0.1,
        mean: 0.3,
        min: 0.1,
        max: 0.3,
      },
    },
    {
      provenance_id: 1,
      tag_id: 277,
      detections: 55,
      buckets_with_detections: 0.7,
      score: {
        histogram: [
          0.9, 0.1, 0.7, 0.7, 0.6, 0.3, 0.5, 0.3, 0.5, 0.2, 0.4, 0.4, 0.3,
          0.3, 1, 0.9,
        ],
        standard_deviation: 0.2,
        mean: 0.5,
        min: 0.1,
        max: 0.9,
      },
    },
  ],
  statistics: {
    total_search_span: 256,
    audio_coverage_over_span: 128,
    analysis_coverage_over_span: 64,
    count_of_recordings_analyzed: 221,
    coverage_start_day: "2023-01-01T00:00:00.0000000",
    coverage_end_day: "2023-12-01T00:00:00.0000000",
  },
  graphs: {
    accumulation_data: [
      { date: "2023-05-22", count: 0, error: 0 },
      { date: "2023-05-23", count: 3, error: 0 },
      { date: "2023-05-24", count: 9, error: 1 },
      { date: "2023-05-25", count: 15, error: 1 },
      { date: "2023-05-26", count: 17, error: 2 },
      { date: "2023-05-27", count: 18, error: 2 },
      { date: "2023-05-28", count: 18, error: 2 },
      { date: "2023-05-29", count: 20, error: 3 },
      { date: "2023-05-30", count: 21, error: 3 },
    ],
    species_composition_data: [
      { date: "2023-05-22", tag_id: 1, ratio: 0.55 },
      { date: "2023-05-22", tag_id: 39, ratio: 0.3 },
      { date: "2023-05-22", tag_id: 277, ratio: 0.15 },
      { date: "2023-05-23", tag_id: 1, ratio: 0.45 },
      { date: "2023-05-23", tag_id: 39, ratio: 0.2 },
      { date: "2023-05-23", tag_id: 277, ratio: 0.35 },
      { date: "2023-05-24", tag_id: 1, ratio: 0.05 },
      { date: "2023-05-24", tag_id: 39, ratio: 0.25 },
      { date: "2023-05-24", tag_id: 277, ratio: 0.7 },
      { date: "2023-05-25", tag_id: 1, ratio: 0.5 },
      { date: "2023-05-25", tag_id: 39, ratio: 0.2 },
      { date: "2023-05-25", tag_id: 277, ratio: 0.3 },
      { date: "2023-05-26", tag_id: 1, ratio: 0.25 },
      { date: "2023-05-26", tag_id: 39, ratio: 0.4 },
      { date: "2023-05-26", tag_id: 277, ratio: 0.35 },
      { date: "2023-05-27", tag_id: 1, ratio: 0.15 },
      { date: "2023-05-27", tag_id: 39, ratio: 0.3 },
      { date: "2023-05-27", tag_id: 277, ratio: 0.55 },
      { date: "2023-05-28", tag_id: 1, ratio: 0.1 },
      { date: "2023-05-28", tag_id: 39, ratio: 0.2 },
      { date: "2023-05-28", tag_id: 277, ratio: 0.7 },
      { date: "2023-05-29", tag_id: 1, ratio: 0.05 },
      { date: "2023-05-29", tag_id: 39, ratio: 0.15 },
      { date: "2023-05-29", tag_id: 277, ratio: 0.8 },
      { date: "2023-05-30", tag_id: 1, ratio: 0.05 },
      { date: "2023-05-30", tag_id: 39, ratio: 0.1 },
      { date: "2023-05-30", tag_id: 277, ratio: 0.85 },
    ],
    coverage_data: {
      recording_coverage: [
        { start_date: "2023-05-22", end_date: "2023-05-24" },
        { start_date: "2023-05-26", end_date: "2023-05-27" },
        { start_date: "2023-05-28", end_date: "2023-05-29" },
      ],
      analysis_coverage: [
        { start_date: "2023-05-22", end_date: "2023-05-23" },
        { start_date: "2023-05-28", end_date: "2023-05-28" },
      ],
      missing_analysis_coverage: [
        { start_date: "2023-05-23", end_date: "2023-05-24" },
        { start_date: "2023-05-28", end_date: "2023-05-29" }
      ],
      failed_analysis_coverage: [
        { start_date: "2023-05-26", end_date: "2023-05-27" },
      ],
    },
    analysis_coverage_data: [
      { date: "2023-01-02", audio_coverage: 0.5, analysis_coverage: 0.5 },
      { date: "2023-01-03", audio_coverage: 0.6, analysis_coverage: 0.5 },
      { date: "2023-01-04", audio_coverage: 0.3, analysis_coverage: 0.2 },
      { date: "2023-01-05", audio_coverage: 0.4, analysis_coverage: 0.1 },
      { date: "2023-01-06", audio_coverage: 0.8, analysis_coverage: 0.5 },
      { date: "2023-01-07", audio_coverage: 0.2, analysis_coverage: 0.1 },
      { date: "2023-01-08", audio_coverage: 0.1, analysis_coverage: 0.0 },
    ],
  }
}

New! Filter Show
Allows you to create a single model based on filter conditions sent in the body of a POST request.
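As a usage sketch, the client would build the POST body for this filter-show endpoint along these lines. The route and the inner filter shape come from the spec above; the builder function itself is hypothetical:

```typescript
// Hypothetical builder for the filter-show request body.
function buildEventSummaryFilter(opts: {
  siteIds: number[];
  provenanceIds: number[];
  bucketSize: "day" | "month" | "year" | "season";
  minScore: number;
}) {
  return {
    filters: {
      and: [
        { "site.id": { in: opts.siteIds } },
        { "provenance.id": { in: opts.provenanceIds } },
        { score: { gteq: opts.minScore } },
        { bucketSize: { eq: opts.bucketSize } },
      ],
    },
  };
}

// Sent as e.g.:
// fetch("/reports/audio_event_summary/filter", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildEventSummaryFilter({ ... })),
// });
```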

Client side tests for this new API endpoint

Routes that are not implemented

These routes can be added when we add the ability to cache/save reports

GET /reports/audio_event_summary/:id (show)
GET /reports/audio_event_summary (list)
GET /reports/audio_event_summary/filter (filter)
POST /reports/audio_event_summary (create)
PATCH /reports/audio_event_summary/:id (update)
DELETE /reports/audio_event_summary/:id (delete)

Let me know if you have any questions about the finalized spec