Add test data for Analytics based on Tine's new Excel file
See attached file.
The line data from this Excel file would be perfect for the Reporting BC too. The test data in Reporting right now is very limited. Should we create a separate issue for that, @cathinkaw, or can this issue include test data for both BCs?
Very good point, @esp0! I think that should be a separate issue, as they are separate bounded contexts. But it can of course link to this one, and if someone takes it after this one is done they can maybe be inspired by what has been done here? Can you create it?
@agnetedjupvik I just got a new version of the test data file from @tineml. I updated the one in the issue description.
Question: The data format has changed in the new test data (see the old format here). Should I change it so that the new format applies everywhere? There aren't that many changes, and the new format mostly expands on the old one.
Alternatively, we can keep them both for now.
This is a quick overview of the two:
Old format | New format |
---|---|
CaseReportId | |
DataCollectorId | DataCollector |
HealthRisk | HealthRisk |
Origin | |
Message | |
NumberOfMalesUnder5 | MalesUnder5 |
NumberOfMalesAged5AndOlder | MalesOver5 |
NumberOfFemalesUnder5 | Under5 |
Total | |
Male | |
Female | |
Longitude | LatLong |
Latitude | |
Timestamp | Time |
Date | |
Date2 | |
Time | |
ISOWeek | |
Status | |
Region | |
District | |
village |
Good point!!! There have probably been some versions of testdata and also of the platform.
The test data is based on an export from the Somaliland version of the platform, with additional columns we wanted. The export should have these columns but without "date2" (this column is there because we needed a formula to change the date format, so the date2 column has the right date format, and we should have only one column for date). It should also have two additional columns at the end, for "message" and "errors". I also see now that the "total" column comes before male and female in the test data. Total should be the last column. Sorry! I focused on getting content into the dataset and didn't look properly over the titles. They have been updated by Espen after feedback from Mozambique. The columns below have the new titles, which are the correct ones:
Males under 5 | Males 5 or older | Females under 5 | Females 5 or older | Total under 5 | Total 5 or older | Total females | Total males
Long/lat is there because the export now puts it in one column. When registering volunteers, this info goes into different fields, so I don't know why it ends up in one now. If we want to use this for manual analysis in Excel, it is better to have it in two columns.
Time can be time
For CaseReportID, we don't have this in the exports now, but it would be good if it helps us spot duplicates?
Also realising, as I am exporting from Senegal testing now, that date and time are in the same column in Senegal. This has been an issue previously and was solved for the Somaliland version. Senegal also lacks week number (closed as issue 1108 as I thought issue 1084 had solved it. Espen might have done it?), and it says "total people" instead of just "total" (I see you are working on the last issue). There have been issues for this before.
Does this make sense? In terms of the updates needed for Senegal (time, week, total, long/lat), should I write new issues for this?
Okay, I see date/time is also in 1076, and has not been fixed before in Somaliland :) That was date as text.
OK, so I'll take out date2, and include "message" and "errors".
It makes sense to only store one gender and the total, e.g. "Males under 5" and "Total under 5", as "Females under 5" is Total-Males. The backend can handle that logic for a request for "Females under 5" and so on.
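A minimal sketch of that backend logic, assuming one stored record per age group with a male count and a total. The class name `AgeGroupCounts` and its fields are hypothetical and only illustrate the idea; they are not taken from the existing code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeGroupCounts:
    """Hypothetical stored shape: one gender plus the total for an age group."""

    males: int
    total: int

    def __post_init__(self) -> None:
        # Reject rows where the stored values cannot both be right.
        if not 0 <= self.males <= self.total:
            raise ValueError("males must be between 0 and total")

    @property
    def females(self) -> int:
        # Derived on request instead of being stored: Females = Total - Males.
        return self.total - self.males


under_5 = AgeGroupCounts(males=3, total=7)
print(under_5.females)  # 4
```

Storing only two of the three numbers means they cannot disagree with each other, at the cost of computing the third on every request.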
I think having a CaseReportID is a very good idea, and it would be nice to actually receive as part of the data rather than generating them when storing in the database, because of duplicate discovery as you say, Cathinka.
For the date/time-issue: We are currently storing many kinds of timestamps, such as Date, Time, ISOWeek and so on. Perhaps it would be better to store one timestamp only (e.g. in Unix time) to avoid risk of discrepancy between the different time indicators? And use the backend to generate the relevant data for different time-related queries from this single timestamp?
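As a rough illustration of the single-timestamp idea, the sketch below derives the other time fields from one Unix timestamp. The function name `time_fields` and the returned keys are assumptions made for this example, and it assumes the timestamp is stored in UTC seconds.

```python
from datetime import datetime, timezone


def time_fields(unix_seconds: int) -> dict:
    """Derive display fields from a single stored Unix timestamp (UTC assumed)."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    iso_year, iso_week, _weekday = moment.isocalendar()
    return {
        "Timestamp": unix_seconds,
        "Date": moment.date().isoformat(),   # e.g. "2019-01-01"
        "Time": moment.time().isoformat(),   # hours, minutes and seconds
        "ISOYear": iso_year,
        "ISOWeek": iso_week,
    }


print(time_fields(1546300800))
```

Because every field comes from the same value, Date, Time and ISOWeek cannot drift apart the way separately stored columns can.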
Yes, great, so add messages, errors and CaseReportID. Take out Date2 as time to have. Just don't physically take it out of the data set now :)
For time/date, I don't know how this works best technically. But using Unix time, does it mean we will see the time in seconds only? Or is that only in the backend? I am thinking the exact timestamp (h/m/s) should be visible, with all other times based on this. ISO week is also not the same as the normal week number.
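To make the week-number point concrete, here is a small sketch comparing the ISO week with one common non-ISO convention, Python's `%U`, which counts weeks from the first Sunday of the year. Which convention the exports should use is still an open question in this thread; the date below is picked only because the two disagree on it.

```python
from datetime import date

day = date(2021, 1, 1)  # a Friday

# ISO 8601: the week belongs to the year that contains its Thursday.
iso_year, iso_week, _weekday = day.isocalendar()

# %U: weeks start on Sunday, and days before the first Sunday are week 00.
sunday_based_week = int(day.strftime("%U"))

print(iso_year, iso_week)   # 2020 53
print(sunday_based_week)    # 0
```

So the same report would land in week 53 of 2020 under ISO rules and in week 0 of 2021 under the Sunday-based count, which is why the ISO year is worth keeping next to the ISO week.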