How to change from old country score to new country open data indicators?
See #305
The proposal is to remove the country score and replace it with a set of indicators giving, for each country, the number of datasets that are open data, restricted, closed, or unknown.
First, let's look at how the current version works.
Dataset score
The OpenDRI Index uses a set of 10 criteria, formulated as questions and weighted as percentages, to assess to what extent a given dataset is open.
For more information and the weight assigned to each criterion, see https://index.opendri.org/methodology.html#opendata
A dataset is considered fully open when all questions have been answered YES (score = 100%). When a dataset does not exist or has not been submitted, then the score is 0.
Country score
The country score is the average of all dataset scores for a given country. It is also expressed as a percentage.
Note: It is possible to submit more than one entry for a given dataset and a given country. The website stores all of them. However, for comparison purposes, only the entry with the highest score is retained for the country score.
For the country score, only the hazards for which the level is assessed as medium or higher on ThinkHazard! are taken into account. This means that datasets applicable only to hazards with a low or very low level on ThinkHazard! are not considered for assessing a country since the interest in such data is negligible. For instance, data related to tsunami will not be considered when assessing a landlocked country.
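The current country score described above can be sketched as follows. This is a minimal illustration, not the site's actual code: the function name, the data structures, the dataset names and the level labels are all assumptions made for the example.

```python
from statistics import mean

# Assumption: "medium or higher" maps to these level labels.
RELEVANT_LEVELS = {"medium", "high"}


def country_score(key_datasets, submissions, hazard_levels):
    """Average of the best score per key dataset, for relevant hazards only.

    key_datasets:  dict of dataset name -> list of hazards it applies to
    submissions:   dict of dataset name -> list of submitted scores (0-100)
    hazard_levels: dict of hazard -> level for the country
    """
    # Keep a dataset if at least one of its hazards is medium or higher.
    relevant = [
        name
        for name, hazards in key_datasets.items()
        if any(hazard_levels.get(h) in RELEVANT_LEVELS for h in hazards)
    ]
    if not relevant:
        return 0
    # Only the highest-scoring entry counts; no submission means a score of 0.
    best = [max(submissions.get(name, []), default=0) for name in relevant]
    return mean(best)


# Hypothetical country with a negligible tsunami hazard.
key_datasets = {
    "Fault lines": ["earthquake"],
    "Tsunami inundation maps": ["tsunami"],
    "Elevation": ["earthquake", "flood"],
}
hazard_levels = {"earthquake": "high", "tsunami": "very low", "flood": "medium"}
submissions = {"Fault lines": [40, 80], "Tsunami inundation maps": [100]}

score = country_score(key_datasets, submissions, hazard_levels)
```

In this example the tsunami dataset is ignored, "Fault lines" contributes its best entry (80) and "Elevation" contributes 0 because nothing was submitted.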
It is also possible to filter and compare countries by category or hazard. For instance, by selecting Base data, only datasets from this category will be taken into account in the overall openness; by selecting Earthquake, only datasets applicable to this hazard will be taken into account.
For more info see: https://index.opendri.org/methodology.html#score
Here is the proposal for the new set of indicators:
Option 1
Dataset score
We keep the scoring system for a single dataset, with the following weights for the criteria.
Criteria | Open Data | Restricted | Closed | Unknown | % |
---|---|---|---|---|---|
Does the data exist? | YES | YES | Y/N | | +50 |
Is the data publicly available? | YES | YES | NO | | +15 |
Is the data available in digital form? | YES | Y/N | | | +5 |
Is the data available online? | YES | Y/N | | | +5 |
Is the metadata available online? | YES | Y/N | | | +5 |
Is the data available in bulk? | YES | Y/N | | | +5 |
Is the data machine-readable? | YES | Y/N | | | +5 |
Is the data available for free? | YES | Y/N | | | +5 |
Is the data openly licensed? | YES | Y/N | | | +5 |
Is the data provided on a timely and up to date basis? | Y/N | | | | +0 |
The dataset indicator is then determined from the dataset score:
Open Data >= 100%
Restricted < 100 AND >= 65
Closed < 65 AND >= 0
Unknown: no dataset submitted for the key dataset
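As a rough sketch of Option 1, the weights and thresholds above could be applied like this. The dictionary keys and function names are made up for the example; only the weights and cut-offs come from the proposal.

```python
# Weights from the proposal; the key names are hypothetical.
WEIGHTS = {
    "exists": 50,
    "publicly_available": 15,
    "digital": 5,
    "online": 5,
    "metadata_online": 5,
    "bulk": 5,
    "machine_readable": 5,
    "free": 5,
    "openly_licensed": 5,
    "up_to_date": 0,
}


def dataset_score(answers):
    """Sum the weights of every criterion answered YES (True)."""
    return sum(w for key, w in WEIGHTS.items() if answers.get(key, False))


def dataset_indicator(answers):
    """Map a submission (or None when nothing was submitted) to an indicator."""
    if answers is None:
        return "Unknown"
    score = dataset_score(answers)
    if score >= 100:
        return "Open Data"
    if score >= 65:
        return "Restricted"
    return "Closed"


all_yes = {key: True for key in WEIGHTS}
public_only = {"exists": True, "publicly_available": True}
exists_only = {"exists": True}
```

Here `all_yes` scores 100 (Open Data), `public_only` scores 65 (Restricted), and `exists_only` scores 50 (Closed).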
For each dataset, we return the dataset indicator.
Country indicator
- There is no longer an overall score expressed as a percentage per country.
- There is no longer a filter related to ThinkHazard!
- All dataset submissions are taken into account, not only the most open one per key dataset.
For each country, we provide:
Number of open data datasets
Number of restricted datasets
Number of closed datasets
Number of unknown datasets
Note: total number of datasets submitted = open data + restricted + closed
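The four country counts could be computed along these lines. This is only a sketch: the input shapes (a list of key dataset names and a list of `(key_dataset, indicator)` pairs) and the output keys are assumptions, not the current API.

```python
from collections import Counter


def country_indicators(key_datasets, submissions):
    """Count indicators over all submissions for one country.

    key_datasets: iterable of key dataset names
    submissions:  list of (key_dataset_name, indicator) pairs, where the
                  indicator is "Open Data", "Restricted" or "Closed"
    """
    counts = Counter(indicator for _, indicator in submissions)
    submitted = {name for name, _ in submissions}
    # A key dataset with no submission at all counts as unknown.
    unknown = sum(1 for name in key_datasets if name not in submitted)
    return {
        "open_data": counts["Open Data"],
        "restricted": counts["Restricted"],
        "closed": counts["Closed"],
        "unknown": unknown,
    }


# Hypothetical country: two entries for A, one for B, none for C.
result = country_indicators(
    ["A", "B", "C"],
    [("A", "Open Data"), ("A", "Closed"), ("B", "Restricted")],
)
```

Because every submission is counted, the two entries for dataset A both appear in the totals, and open data + restricted + closed equals the number of submissions.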
Option 2
Dataset indicator
We remove the scoring system for a single dataset.
For each dataset, we compute and return the dataset indicator using boolean conditions (see table above).
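One possible reading of the boolean conditions in the table, without any score, is sketched below. The key and function names are hypothetical, and the up-to-date criterion is left out because the table marks it Y/N for every indicator.

```python
# Criteria that must all be YES for Open Data, beyond existence and
# public availability (key names are made up for this example).
OPENNESS_KEYS = [
    "digital",
    "online",
    "metadata_online",
    "bulk",
    "machine_readable",
    "free",
    "openly_licensed",
]


def indicator_from_answers(answers):
    """Return the indicator from YES/NO answers, or Unknown if none given."""
    if answers is None:
        return "Unknown"  # no submission for the key dataset
    if not (answers.get("exists") and answers.get("publicly_available")):
        return "Closed"
    if all(answers.get(key) for key in OPENNESS_KEYS):
        return "Open Data"
    return "Restricted"


fully_open = {"exists": True, "publicly_available": True,
              **{key: True for key in OPENNESS_KEYS}}
not_free = {**fully_open, "free": False}
not_public = {**fully_open, "publicly_available": False}
```

With these inputs, `fully_open` is Open Data, `not_free` is Restricted, and `not_public` is Closed regardless of the other answers.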
Country indicator
Same as above
Hi @oncletom @CIMAManuel @nastasi-oq, see the suggestion above for the new system of indicators.
To be discussed and decided tomorrow.
The main questions are:
- What would be the best option in terms of processing time? (taking into account operations done on the FE and BE sides)
- Which is easiest to implement for both BE and FE? (taking into account the current APIs)
- Am I missing something needed to cover #305?
Many thanks!
@pzwsk this algorithm is wrong IMHO, we must just use a decision tree as described in the table above.
Open Data >= 100%
Restricted < 100 AND >= 65
Closed < 65 AND >= 0
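One way the thresholds can disagree with the table (an illustration only; the thread does not spell out the objection) is when data exists but is not publicly available, while enough of the +5 criteria are answered YES to reach 65. The sketch below compares both approaches, assuming the answers can vary independently; the function names are hypothetical.

```python
from itertools import product

OTHER_CRITERIA = 7  # the seven criteria weighted +5 each


def by_threshold(exists, public, others_yes):
    """Indicator from the weighted score and the thresholds."""
    score = 50 * exists + 15 * public + 5 * others_yes
    if score >= 100:
        return "Open Data"
    if score >= 65:
        return "Restricted"
    return "Closed"


def by_table(exists, public, others_yes):
    """Indicator from the YES/NO conditions in the table."""
    if not (exists and public):
        return "Closed"
    return "Open Data" if others_yes == OTHER_CRITERIA else "Restricted"


# Every (exists, public, number of other YES answers) where the two differ.
mismatches = [
    (exists, public, others_yes)
    for exists, public in product([False, True], repeat=2)
    for others_yes in range(OTHER_CRITERIA + 1)
    if by_threshold(exists, public, others_yes) != by_table(exists, public, others_yes)
]
```

Under these assumptions, every mismatch has data that exists but is not public with at least three other YES answers: the score reaches 65 or more (Restricted), while the table says Closed.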
OK, this was an attempt to keep the scoring system for a single dataset, but I would also prefer to use a decision tree. Let me sketch one quickly.
Hi @nastasi-oq, the decision tree is actually quite simple; see below and let me know what you think.
Note: I am not considering the up-to-date criterion in the evaluation.