This repository includes several Orange Data Mining widgets to access data from Minka, Odour Collect, canAIRio, Aire Ciudadano, iNaturalist and Smart Citizen.
MECODA is part of Cos4Cloud, a European Horizon 2020 project to boost citizen science technologies.
To use the MECODA package you need to install the Orange Data Mining platform from https://orangedatamining.com/download
Once Orange is installed, open the Options menu, choose "Add-ons", click "Add more" and search for "mecoda-orange". The latest version of the package will be installed on the Orange platform.
You can also find an "Installation Guide", an "Orange Example of Use" and a "MECODA example of use".
This widget collects observations from Minka API and allows filtering them by:
Argument | Description | Example |
---|---|---|
Taxon | One of the main taxonomies | taxon=Aves |
Taxon URL | Link to a taxon page | https://minka-sdg.org/taxa/254168-Danaus-chrysippus |
Project URL | Link to a project on the Minka website | https://minka-sdg.org/projects/biomarato-2024-catalunya |
Place URL | Link to a place on the Minka website | https://minka-sdg.org/places/barcelona |
Non-native species | Checkbox to select only introduced species | introduced=True |
User name | Name of the user who uploaded the observations | user="zolople" |
Observation date | Filters for observation date | starts_on=2024-06-01 |
Creation date | Filters for upload date | since=2024-06-01 |
Research grade only | Checkbox to select only research-grade observations | research_grade=True |
Max. number of results | Queries of fewer than 10,000 observations are recommended due to time requirements. Set the number to 0 for an unlimited download. | num_max=800 |
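The URL-based filters above take a link copied from the Minka website. As a minimal sketch of how such a link could be reduced to the parts a query needs, the hypothetical helper below (`parse_minka_url` is an illustration, not part of mecoda-minka) splits a URL into its resource kind and slug, and extracts the leading numeric ID when there is one. How the widget itself parses these URLs is not described here, so this is only an assumption about one reasonable approach.

```python
from urllib.parse import urlparse


def parse_minka_url(url):
    """Split a Minka URL into (kind, slug, numeric_id).

    Hypothetical helper for illustration; not part of mecoda-minka.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Unexpected Minka URL: {url}")
    kind, slug = parts[0], parts[1]
    # Taxon slugs start with a numeric ID, e.g. "254168-Danaus-chrysippus".
    head = slug.split("-", 1)[0]
    numeric_id = int(head) if head.isdigit() else None
    return kind, slug, numeric_id


taxon = parse_minka_url("https://minka-sdg.org/taxa/254168-Danaus-chrysippus")
project = parse_minka_url("https://minka-sdg.org/projects/biomarato-2024-catalunya")
place = parse_minka_url("https://minka-sdg.org/places/barcelona")
print(taxon)
print(project)
print(place)
```

Project and place links carry a text slug rather than a number, so the sketch returns `None` for their numeric ID.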
The Minka widget integrates the Python library mecoda-minka into a visual interface. You can make any query and download two outputs: a dataframe with one observation per row and a dataframe with one photo per row. A single observation can have more than one photo.
The Observations output gets a Table with the following fields:
- id: observation id
- captive: True or False
- created_at: date field.
- updated_at: date field.
- observed_on: date field.
- description: open text field.
- latitude / longitude: geo location fields.
- obscured: whether the geographical coordinates are obscured to protect the place where the species was seen. False = 0 / True = 1.
- quality_grade: needs_id / research / casual.
- user_login: user login name.
- num_identification_agreements / num_identification_disagreements: number of identifications that agree or disagree with the taxonomy.
- identifications_count: number of identifications for an observation.
- identifiers: list of Minka users who contributed to the identification.
- iconic_taxon: one of the big taxonomic groups available in Minka.
- taxon_id: species taxon id.
- taxon_name: species name of the observation.
- taxon_rank: taxonomic rank the identification has reached.
- taxon fields: parent taxonomic ranks the species belongs to.
  - kingdom
  - class
  - order
  - superfamily
  - family
  - genus
The observations table allows statistical analysis. The photos table allows image analysis.
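As a small illustration of the kind of statistical analysis the observations table supports, the sketch below tallies observations by `quality_grade` and `iconic_taxon`. The rows are made-up sample data that reuse field names from the list above; inside Orange the same counts would normally come from standard widgets rather than from code.

```python
from collections import Counter

# Made-up rows using field names from the observations table;
# real tables have many more columns.
observations = [
    {"id": 1, "quality_grade": "research", "iconic_taxon": "Aves"},
    {"id": 2, "quality_grade": "needs_id", "iconic_taxon": "Aves"},
    {"id": 3, "quality_grade": "research", "iconic_taxon": "Mollusca"},
    {"id": 4, "quality_grade": "casual", "iconic_taxon": "Plantae"},
]

# Count observations per quality grade and per iconic taxon.
grade_counts = Counter(row["quality_grade"] for row in observations)
taxon_counts = Counter(row["iconic_taxon"] for row in observations)

# Share of observations that reached research grade.
research_share = grade_counts["research"] / len(observations)

print(grade_counts)
print(taxon_counts)
print(f"research-grade share: {research_share:.0%}")
```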
The widget is complemented with other widgets that can take input from it or directly from Minka API:
This widget takes a Table with observations (and a column with IDs from Minka) and gets the photos from all of them. It works with data from the Minka widget.
The output is a Table with an image type feature that can be accessed using Image Viewer.
This widget allows the user to filter Minka observations by different taxonomic levels (from kingdom to species). The levels shown are just the ones with registered observations.
The widget looks like this:
This widget allows the user to filter Minka observations by scientific or common name.
The widget splits the Table of observations into two dataframes: one for marine species and the other for terrestrial ones. It only keeps research-grade observations.
When you process the observations table, by selecting some rows or filtering in some way, you may want to know how much each user contributed to the resulting dataset. Connect the Minka Contributions widget to get the number of observations each user contributes to the dataset.
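The idea behind a per-user contribution count can be sketched in a few lines. The example below is not the widget's code: it assumes a filtered selection of rows with a `user_login` field (sample data is made up) and counts how many observations each user contributed, together with their share of the selection.

```python
from collections import Counter

# Made-up subset of an observations table; only user_login matters here.
selected = [
    {"id": 10, "user_login": "zolople"},
    {"id": 11, "user_login": "user_a"},
    {"id": 12, "user_login": "zolople"},
    {"id": 13, "user_login": "user_b"},
    {"id": 14, "user_login": "zolople"},
]

counts = Counter(row["user_login"] for row in selected)
total = sum(counts.values())

# most_common() orders users from most to fewest observations.
contributions = [
    {"user_login": user, "observations": n, "share": round(n / total, 2)}
    for user, n in counts.most_common()
]

for entry in contributions:
    print(entry)
```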
The Odour Collect widget allows the user to get observations from the Odour Collect API. The widget looks like this:
The widget has several search fields: date, annoy level, intensity level, category and type. The observations can also be complemented with the distance from a Point of Interest, if one is set.
The output is a Table of observations, with this information:
field | description |
---|---|
user | OdourCollect's user ID of the citizen who registered the observation. |
date | Observation date in yyyy-mm-dd format. |
time | Observation time in HH:mm (24h) format, UTC timezone. |
week_day | Observation day of the week. This field is extra data calculated by pyodcollect to help the analyst in finding patterns. Please bear in mind that this calculation is based on UTC, not local time, so it could be misleading in some edge cases. |
category | First tier of odour classification. In OdourCollect webapp, this is called "type". It provides complementary classification nuances that can be safely ignored for basic analysis. See the full table below for better understanding. |
type | Second tier of odour classification. In OdourCollect webapp, this is called "subtype". It provides the richest odour classification criteria. See the full table below for better understanding. |
hedonic_tone_n | Hedonic tone of odour observation (numeric representation). Hedonic tone is the subjective measurement of how annoying an odour is, from -4 (Extremely unpleasant) to +4 (Extremely pleasant). Zero is used to report neither annoyance nor pleasure. This scale is based on the VDI 3940:2006 standard for odour impact assessment. |
hedonic_tone_t | Text description version of the former metric. |
intensity_n | Intensity of odour observation (numeric representation). Intensity is the measurement of how intense and noticeable an odour is, from 1 (Very weak) to 6 (Extremely strong). Zero (Not perceptible) is also used, but only to report the absence of odour in observations. This scale is based on the VDI 3940:2006 standard for odour impact assessment. |
intensity_t | Text description version of the former metric. |
duration | How long the odour was perceived by the reporter. Categorical text data with the following options: (No odour), Punctual, Continuous in the last hour and Continuous throughout the day. |
latitude | GPS coordinates of observation. Latitude. |
longitude | GPS coordinates of observation. Longitude. |
distance | Distance in km (with an accuracy of 0.01 km) between the point of observation and a configurable Point of Interest (POI). This extra data is calculated by pyodcollect when the data analyst provides the coordinates of a suspicious activity that motivates the analysis. If no POI coordinates are provided, this field is missing. |
time_hour | Observation time in HH (24h) format, UTC timezone. |
time_mins | Observation time in mm (0-60') format, UTC timezone. |
time_secs | Observation time in ss (0-60'') format, UTC timezone. |
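Two of the fields above are derived rather than reported: `week_day` (computed from UTC) and `distance` (computed against a POI). The sketch below illustrates both under stated assumptions. It is not pyodcollect's code: the function names are hypothetical, and the haversine great-circle formula is an assumed way to obtain a distance in km rounded to 0.01, since the formula pyodcollect uses is not given here.

```python
from datetime import datetime, timedelta, timezone
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for this sketch)
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def week_day(date_str, time_str, utc_offset_hours=0):
    """Day of the week for a UTC date/time, optionally shifted to a local offset."""
    utc = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
    utc = utc.replace(tzinfo=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return DAYS[local.weekday()]


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km, rounded to 0.01 km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return round(2 * EARTH_RADIUS_KM * asin(sqrt(a)), 2)


# An observation late in the evening (UTC) falls on the next day at UTC+2.
print(week_day("2024-06-01", "23:30"))                      # UTC
print(week_day("2024-06-01", "23:30", utc_offset_hours=2))  # local time

# Distance between an observation and a POI one degree of latitude away.
print(distance_km(0.0, 0.0, 1.0, 0.0))
```

The first two calls show the edge case mentioned for `week_day`: the same observation is a Saturday in UTC and a Sunday in a UTC+2 local time.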
The widget gets observations from fixed stations through the CanAIRio API. The widget looks like this:
The widget filters between the different measurements and gets a dataframe with all data from fixed stations at the requested moment.
When you select data from one of the stations, it can be combined with another widget (Last Hour Fixed Station) to get the most recent data recorded by that station.
The output of the Last Hour Fixed Station widget is a dataframe with the last registered measurements from this station.
The widget gets observations from all the mobile stations registered by CanAIRio API.
The output can be placed on a map and coloured by any parameter:
We can select one device and get the complete track of the route using Track - Mobile Station. This is the result placed on a map:
The points can be coloured by any measurement.
This example can be loaded as a workflow (.ows format) directly in Orange Canvas:
This widget collects observations from the iNaturalist API and allows filtering them by:
Argument | Description | Example |
---|---|---|
Taxon | One of the main taxonomies | taxon=Aves |
Taxon ID | Number of a taxon | taxon_id=14868 |
Project ID | Number of a project | project_id=80406 |
Place ID | Number of a place | place_id=200 |
User name | Name of the user who uploaded the observations | user="zolople" |
Observation date | Filters for observation date | starts_on=2024-06-01 |
Creation date | Filters for upload date | since=2024-06-01 |
Research grade only | Checkbox to select only research-grade observations | research_grade=True |
Max. number of results | The max. number should be under 10,000 (API limit) | num_max=800 |
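To make the filter arguments concrete, the sketch below collects them into a dictionary, drops the ones left empty, and checks `num_max` against the 10,000 limit from the table. `build_query` is a hypothetical helper written for this illustration, not a function of mecoda-inat; the argument names simply follow the Example column.

```python
API_LIMIT = 10_000  # upper bound stated in the table above


def build_query(**filters):
    """Keep only the filters that were set and validate num_max.

    Hypothetical helper for illustration; not part of mecoda-inat.
    """
    query = {key: value for key, value in filters.items() if value is not None}
    num_max = query.get("num_max")
    if num_max is not None and not 1 <= num_max < API_LIMIT:
        raise ValueError(
            f"num_max must be between 1 and {API_LIMIT - 1}, got {num_max}"
        )
    return query


query = build_query(
    taxon_id=14868,
    place_id=200,
    user=None,  # left empty in the widget, so it is dropped
    starts_on="2024-06-01",
    research_grade=True,
    num_max=800,
)
print(query)
```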
The iNaturalist widget integrates the Python library mecoda-inat into a visual interface. You can make any query and download two outputs: a dataframe with one observation per row and a dataframe with one photo per row.
The first widget (Smart Citizen Search) collects data from the Smart Citizen API. It allows you to select the device either via device ID (the number after https://smartcitizen.me/kits/[...]) or by searching the API by city, tags, or device type. The second widget (Smart Citizen Data) uses the data from the first one and collects time-series tabular data from a device, with a defined rollup (i.e. the frequency of the readings), a minimum and maximum date, and resample options.
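To illustrate what a rollup means in practice, the sketch below averages made-up readings into one-hour buckets within a date range. It is only an assumption-laden illustration of the concept: the widget obtains its data from the Smart Citizen API, and `rollup_hourly` is a hypothetical local helper, not part of that API.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Made-up readings as (ISO timestamp, value) pairs.
readings = [
    ("2024-06-01T10:05:00", 20.0),
    ("2024-06-01T10:35:00", 22.0),
    ("2024-06-01T11:10:00", 25.0),
    ("2024-06-01T11:50:00", 27.0),
    ("2024-06-01T12:20:00", 30.0),
]


def rollup_hourly(rows, start, end):
    """Average values per hour for timestamps in the range [start, end)."""
    buckets = defaultdict(list)
    for stamp, value in rows:
        moment = datetime.fromisoformat(stamp)
        if start <= moment < end:
            # Truncate to the start of the hour to pick the bucket.
            hour = moment.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


hourly = rollup_hourly(
    readings,
    start=datetime(2024, 6, 1, 10),
    end=datetime(2024, 6, 1, 12),
)
for hour, value in hourly.items():
    print(hour.isoformat(), value)
```

The reading at 12:20 falls outside the requested range, so only the 10:00 and 11:00 buckets appear in the result.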
An example workflow is available at https://github.com/fablabbcn/smartcitizen-docs/blob/master/docs/assets/ows/example_sc.ows and documentation will be made available at https://docs.smartcitizen.me/Data/.
The widget gets data from Aire Ciudadano air quality stations, either the latest records or those within a given time range.
The output is a table with these columns:
Field | Description |
---|---|
station | Code of the station. |
date | Date of the record in Year-Month-Day format. |
time | Time of the record in Hour:Minute:Second format. |
Latitude | Geographical latitude. |
Longitude | Geographical longitude. |
CO2 | Concentration of carbon dioxide in ppm (parts per million). |
Humidity | Relative humidity in %. |
InOut | Identifies whether the sensor is located outdoors (InOut = 0) or indoors (InOut = 1). |
NOx | NOx index (nitrogen oxides) with a range from 1 to 500, only applicable to Sensirion's SEN55 sensor. |
Noise | Value in dBA (A-weighted decibels). |
NoisePeak | Peak value in dBA reached in the time range (publication time) in which the sensor publishes its data. |
PM10 | Particulate matter PM10 in µg/m³. |
PM25 | Particulate matter PM2.5 in µg/m³. |
PM252 | Particulate matter PM2.5 in µg/m³ measured by an installed secondary sensor (optional). |
PM25raw | Particulate matter PM2.5 in µg/m³ without adjustment; only applies to Plantower brand sensors for which the "Plantower PMS adjust RECOMMENDED" function has been activated. |
Temperature | Temperature in °C. |
VOC | VOC index (volatile organic compounds) with a range from 1 to 500, only applicable to Sensirion SEN55 and SEN54 sensors. |
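As a minimal sketch of how the output columns can be combined, the example below keeps only outdoor sensors (`InOut = 0`) for a given day and averages their PM2.5 readings. The rows and station codes are made up, and `outdoor_pm25_mean` is a hypothetical helper for this illustration; in Orange the same selection would usually be done with standard widgets.

```python
from statistics import mean

# Made-up rows using column names from the table above.
rows = [
    {"station": "A", "date": "2024-06-01", "time": "10:00:00", "InOut": 0, "PM25": 12.0},
    {"station": "B", "date": "2024-06-01", "time": "10:00:00", "InOut": 1, "PM25": 5.0},
    {"station": "A", "date": "2024-06-01", "time": "11:00:00", "InOut": 0, "PM25": 18.0},
    {"station": "C", "date": "2024-06-02", "time": "09:00:00", "InOut": 0, "PM25": 40.0},
]


def outdoor_pm25_mean(table, day):
    """Mean PM2.5 of outdoor sensors (InOut == 0) on a given day, or None."""
    values = [
        row["PM25"]
        for row in table
        if row["InOut"] == 0 and row["date"] == day
    ]
    return mean(values) if values else None


print(outdoor_pm25_mean(rows, "2024-06-01"))
print(outdoor_pm25_mean(rows, "2024-06-03"))
```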
To run the tests locally you'll need to have Python 3.8, pip, virtualenv and git installed.
- Clone the repository and go into the directory:
git clone https://github.com/eosc-cos4cloud/mecoda-orange.git
cd mecoda-orange
- Set up the virtualenv for running tests:
virtualenv -p `which python3.8` env
source env/bin/activate
- Install mecoda-orange:
pip install -e .
- Install development dependencies:
pip install -r requirements-dev.txt
- Run tests from the mecoda-orange directory:
pytest
- To run only one test, use:
pytest -k <name-of-the-test>
MECODA is intended to remain an open-source repository. It will be maintained, at least as part of other existing repositories, and a version will be kept in the CSIC GitLab.
This repository is released under the GPLv3 license. See the license for more details.