IFRCGo/cbs

Add testdata for Analytics based on Tine's new Excel

Closed this issue · 10 comments

See attached file.

testdata (1).xlsx

Coordinate with @gardnk with regard to his and @jakhog's work on the backend

esp0 commented

The line-data from this excel-file would be perfect for the Reporting BC too. The test-data in Reporting right now is very limited. Should we create a separate issue for that @cathinkaw or can this issue include testdata for both BCs?

Very good point, @esp0! I think that should be a separate issue, as they are separate bounded contexts. But it can of course link to this one, and if someone takes it after this one is done they can maybe be inspired by what has been done here. Can you create it?

@agnetedjupvik I just got a new version of the testdata file from @tineml. I updated the one in the issue description 👆

Question: The data format has changed in the new test data (see the old format here). Should I change it so that the new format applies everywhere? There aren't that many changes, and the new format mostly expands on the old one.
Alternatively, we can keep them both for now.

This is a quick overview of the two:

Old format | New format
CaseReportId | -
DataCollectorId | DataCollector
HealthRisk | HealthRisk
Origin | -
Message | -
NumberOfMalesUnder5 | MalesUnder5
NumberOfMalesAged5AndOlder | MalesOver5
NumberOfFemalesUnder5 | Under5
- | Total
- | Male
- | Female
Longitude | LatLong
Latitude | -
Timestamp | Time
- | Date
- | Date2
- | Time
- | ISOWeek
- | Status
- | Region
- | District
- | village

Good point!!! There have probably been some versions of testdata and also of the platform.

The test data is based on an export from the Somaliland version of the platform, with additional columns we wanted. The export should have these columns but without "date2" (this column is there because we needed a formula to change the date format, so the date2 column has the right date format and we should have only one column for date). It should also have two additional columns at the end for "message" and "errors". Also seeing now that the "total" column is before male and female in the testdata. Total should be the last column. Sorry! I focused on getting content in for a dataset and didn't look properly over the titles. They have been updated by Espen after feedback from Mozambique. The columns below have the new titles, which are the correct ones:

Males under 5 | Males 5 or older | Females under 5 | Females 5 or older | Total under 5 | Total 5 or older | Total females | Total males

Long/lat is because the export now puts it in one column. When registering volunteers this info goes into different fields, so I don't know why it ends up in one now. If we want to use this for manual analysis in Excel, it is better if it is in two columns.
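A minimal sketch of splitting the combined column back into two, assuming the export writes the cell as a comma-separated string with latitude first (neither the separator nor the order is confirmed in this thread). `split_lat_long` is a hypothetical helper, and the coordinates are made-up sample values:

```python
def split_lat_long(value, separator=","):
    """Split a combined "lat,long" cell into two floats.

    Assumption: latitude comes first, as the column name LatLong suggests.
    Swap the two names if the export turns out to write longitude first.
    """
    lat_text, long_text = value.split(separator)
    return float(lat_text.strip()), float(long_text.strip())


# Made-up sample value, not taken from the test data file.
latitude, longitude = split_lat_long("9.5624, 44.0770")
```

A cell with a missing or extra part raises `ValueError`, which is probably what we want for malformed rows in test data.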

Time can be time

For casereportID, we don't have this in the exports now, but this is good if it makes us see duplicates..?

Also realising, as I am exporting from Senegal testing now, that date and time are in the same column in Senegal. This has been an issue previously and was solved for the Somaliland version. Senegal also lacks week number (closed as issue 1108 as I thought issue 1084 had solved it. Espen might have done it?) and it says "total people" instead of just total (I see you are working on the last issue). There have been issues for this before.

Does this make sense? In terms of the updates needed for Senegal (time, week, total, long/lat), should I write new issues for this?

If those issues are not fixed you should reopen them instead of creating new ones.

The week number issue is still open #1076 and the total people issue is being fixed #1122

Okay, I see date/time is also in #1076, and has not been fixed before in Somaliland :) That was date as text.

OK, so I'll take out date2, and include "message" and "errors".
It makes sense to only store one gender and the total, e.g. "Males under 5" and "Total under 5", as "Females under 5" is Total-Males. The backend can handle that logic for a request for "Females under 5" and so on.
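A rough sketch of that idea, assuming a storage shape with one gender plus the total per age group. The class and field names here are hypothetical and not taken from the backend:

```python
from dataclasses import dataclass


@dataclass
class StoredCounts:
    # Hypothetical storage shape: males and totals only, per age group.
    males_under_5: int
    total_under_5: int
    males_5_or_older: int
    total_5_or_older: int

    @property
    def females_under_5(self) -> int:
        # Females are derived as Total minus Males.
        return self.total_under_5 - self.males_under_5

    @property
    def females_5_or_older(self) -> int:
        return self.total_5_or_older - self.males_5_or_older

    @property
    def total_males(self) -> int:
        return self.males_under_5 + self.males_5_or_older

    @property
    def total_females(self) -> int:
        return self.females_under_5 + self.females_5_or_older


counts = StoredCounts(males_under_5=2, total_under_5=5,
                      males_5_or_older=1, total_5_or_older=4)
```

With this shape the eight column titles listed earlier can all be answered from four stored numbers, at the cost of trusting that Total is never smaller than Males in the incoming data.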

I think having a CaseReportID is a very good idea, and it would be nice to actually receive it as part of the data rather than generating it when storing in the database, because of duplicate discovery as you say, Cathinka.
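To illustrate the duplicate discovery, here is a small sketch that assumes each row arrives as a dictionary with a `CaseReportId` key. The helper name and the sample IDs are made up:

```python
from collections import Counter


def find_duplicate_ids(rows, id_field="CaseReportId"):
    """Return the IDs that occur more than once, in sorted order."""
    counts = Counter(row[id_field] for row in rows)
    return sorted(case_id for case_id, n in counts.items() if n > 1)


# Made-up rows; only the ID matters for this check.
rows = [
    {"CaseReportId": "a1", "HealthRisk": "Measles"},
    {"CaseReportId": "b2", "HealthRisk": "Cholera"},
    {"CaseReportId": "a1", "HealthRisk": "Measles"},
]
duplicates = find_duplicate_ids(rows)
```

This only works if the ID comes with the data. An ID generated at storage time would be unique per row and would hide the duplicate.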

For the date/time-issue: We are currently storing many kinds of timestamps, such as Date, Time, ISOWeek and so on. Perhaps it would be better to store one timestamp only (e.g. in Unix time) to avoid risk of discrepancy between the different time indicators? And use the backend to generate the relevant data for different time-related queries from this single timestamp?
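As a sketch of what I mean, assuming the single stored value is Unix time in seconds and that we interpret it in UTC. `describe_timestamp` is a hypothetical helper, not something in the backend today:

```python
from datetime import datetime, timezone


def describe_timestamp(unix_seconds):
    """Derive the date/time columns from one stored Unix timestamp."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    iso_year, iso_week, _weekday = moment.isocalendar()
    return {
        "Date": moment.strftime("%Y-%m-%d"),
        "Time": moment.strftime("%H:%M:%S"),
        "ISOWeek": iso_week,
        # The ISO year can differ from the calendar year around New Year.
        "ISOYear": iso_year,
    }


# 1451653509 is 2016-01-01 13:05:09 UTC.
derived = describe_timestamp(1451653509)
```

The stored number is never shown to users. Date, Time and ISOWeek are all computed from it on request, so they cannot disagree with each other.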

Yes, great, so add messages, errors and casereportID. Take out Date2 as time to have. Just don't physically take it out of the data set now :)

For time/date, I don't know how this technically works best. But using Unix, does it mean we will see the time in seconds only? Or is this in the backend? I am thinking the exact timestamp (h/m/s) should be visible and all other times based on this. ISO week is also not the same as the normal week number.
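A small example of the ISO week difference, using Python's standard library. The "simple" week number below is just one possible naive definition (days since 1 January divided by seven), picked here for illustration:

```python
from datetime import date

new_years_day = date(2016, 1, 1)  # a Friday

# ISO 8601: week 1 is the week containing the year's first Thursday,
# so this date still belongs to the last ISO week of 2015.
iso_year, iso_week, iso_weekday = new_years_day.isocalendar()

# Naive alternative: count blocks of seven days from 1 January.
simple_week = (new_years_day.timetuple().tm_yday - 1) // 7 + 1
```

Here `iso_year` is 2015 and `iso_week` is 53, while `simple_week` is 1. So a week column needs to say which definition it uses, and an ISO week is only unambiguous together with its ISO year.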