datamade/just-spaces

Pick a method for handling PLDP required fields

Closed this issue · 6 comments

According to the PLDP, each survey needs a Total value attached to each row. time_start, time_stop, representation, and method are also required.

I think we want to keep total required, but do we want to keep the other ones or make them optional? Should they have default values?

For observational surveys, it should not be possible to submit a survey without a value for Total. For intercept, that value should default to 1. (Think this through, and maybe ask UCD: is there any reason it wouldn't always be 1?)
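A minimal sketch of that rule in plain Python, to make the two cases concrete. The survey type labels and the `resolve_total` helper are hypothetical names for illustration, not anything that exists in just-spaces or django-pldp:

```python
from typing import Optional

# Hypothetical survey type labels; the real project may name these differently.
OBSERVATIONAL = "observational"
INTERCEPT = "intercept"


def resolve_total(survey_type: str, total: Optional[int]) -> int:
    """Return the Total to store for a survey row, or raise if it is missing."""
    if survey_type == INTERCEPT:
        # Intercept surveys fall back to 1 when no Total is supplied.
        return 1 if total is None else total
    if survey_type == OBSERVATIONAL:
        # Observational surveys must always carry an explicit Total.
        if total is None:
            raise ValueError("Observational surveys require a value for Total")
        return total
    raise ValueError(f"Unknown survey type: {survey_type!r}")
```

Whether an intercept survey should ever accept a Total other than 1 is still the open question for UCD; this sketch allows an override but that is an assumption.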

I've created a Total form element in #87 that is required once it's added, but it still has to be added manually and can be deleted once it's there. Change that!

I've looked a little more into this Total idea from the PLDP, and I don't think the understanding Eric, Regina, and I had been working under is right! We'd understood it to mean that surveys need an additional, separate total of all people under consideration. I looked back at the PLDP report (particularly page 7), and it looks like that row_total field is just the count for each component category. (That is, if you count 5 people age 0-6, row_total is the field where that number goes.) This is backed up by Gehl's sample data.

If my new understanding is right, then this has some implications for work that's already been done, but it's also simpler in the long run.

After combing through Gehl's sample data, I've come to the conclusion that they intend row total to apply to a specific survey question, not to the whole survey as we'd originally understood. However, it seems that some surveys (this one) do use a row_total field.

I propose we keep row_total as an observational metadata field but make it optional, and adjust the data handler and django-pldp model accordingly. @jeancochrane how does that sound to you?

The idea is that we'd have a Total Count survey question that would refer to the total number of people being counted in the survey. What PLDP refers to as row_total would just be the count for an individual bucket, e.g., if you count 5 people age 0-6, then what Gehl refers to as row_total would be 5.

I'm not feeling super confident about this. I think Gehl's PLDP descriptions are unclear! We could always implement it this way but then try to run it by them and UCD and see what they think.

I still don't quite understand -- if row_total just refers to the count in a bucket, then why do we need a separate field for it? It seems like you could derive row_total from the existing data.
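To illustrate what deriving it could look like, here is a rough sketch. It assumes each observation is stored as one record per person with a bucket label, which may not match how the data handler actually stores things; `bucket_counts` and the sample data are made up for the example:

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical observations: one entry per person counted, labeled by age bucket.
observations = ["0-6", "0-6", "7-10", "0-6", "0-6", "0-6"]


def bucket_counts(observed: Iterable[str]) -> Dict[str, int]:
    """Count how many observations fall into each bucket."""
    return dict(Counter(observed))


# Each value is what a per-bucket row_total would hold under this reading.
counts = bucket_counts(observations)
```

If the data is instead entered as pre-aggregated counts per bucket, then the entered number already is the row_total and there is nothing to derive.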

In general, I'm also fine with going for an implementation that makes the most sense to you now, and then revisiting this once we're ready to have a longer conversation about PLDP compliance. It seems like it's not high-priority to UCD at this point, anyway.

Hm, I see your point. I'm feeling uncertain enough that I think we should revisit it once we have an implementation we can talk through, and the simplest way to do that is to leave off row_total as a separate field until we do.