smnorris/bcfishobs

Increase number of observation data points by incorporating point_type_code = 'Summary'

Closed this issue · 8 comments

I have noticed that 01_clean-fishobs.sql filters on o.point_type_code = 'Observation'. This excludes records where o.point_type_code = 'Summary'.

I am not sure when exactly the submission requirements to fulfill permit conditions changed, but I do know that since 2008, a permit holder need only submit observations as a summary. For this reason, there will be many many thousands of observations that are not included in the current bcfishobs outputs.

The requirement to only summarize fish captured by species and life stage is very unfortunate, as so much good information is lost that way in my opinion. Classifying fish by life stage in the field is likely extremely error-ridden, and is very easily done afterwards when each individual fish is input. Nevertheless, the summary does contain an accurate account of how many fish of each species are captured and is often the only data submitted (example fish_observation_point_id = 471343).

I am obviously not expecting that you are going to jump on this or change anything at this point but it is something that I will explore in the next couple of months as we look towards densities of fish captured in electrofishing sites and fish observations vs. stream/watershed characteristics.

Btw, this package is a thing of beauty. ⭐

This is interesting - my knowledge of FISS data requirements and collection is sketchy at best.

Presuming the matching logic doesn't have to change I think adding the summary points would be simple. I've removed them to eliminate duplication... but duplication doesn't really matter in the fish passage model. If observation data is getting missed this is worth exploring for sure.

This is the current count:

# select point_type_code, count(*)  from whse_fish.fiss_fish_obsrvtn_pnt_sp group by point_type_code;
 point_type_code | count
-----------------+--------
 Observation     | 352766
 Summary         |  67507

Looks up to date:

library(DBI)
library(dplyr)

# Earliest and latest observation date for each point type (conn is an existing DBI connection)
dbGetQuery(conn, "SELECT o.observation_date, o.point_type_code FROM whse_fish.fiss_fish_obsrvtn_pnt_sp o;") %>%
  filter(observation_date > '1900-01-01' & observation_date < '2021-02-01') %>%
  group_by(point_type_code) %>%
  summarise(min = min(observation_date, na.rm = TRUE), max = max(observation_date, na.rm = TRUE))

*edit - inserted the wrong output last time. The dataset looks up to date to a couple months ago.

  point_type_code min        max
1 Observation     1900-01-02 2020-11-13
2 Summary         1901-01-01 2020-11-13

Roping in @CaptainMarmot for any Provincial data wisdom

I wish I understood the FISS stuff more but it is not my area of expertise either - and now that Gord Oliphant is retired, we have a bit of a corporate knowledge gap on this front at the moment.

That said, I just chatted with Robin M. about this issue. I think part of the problem is the use of the word 'Summary'. It is used in both the data context and the spatial context. The issue that Al raises is Summary Observations in the data table sense - i.e. this is a summary of all the types of fish caught in this location on this day. The UTM associated with that OBS point is where the sampling actually took place. It accurately represents species observed at that spot and as such is adequate for the presence / absence work we have been doing up until now. Al is correct in saying that the way in which the data is submitted and rolled up may not be adequate for looking at the densities of fish captured in electrofishing sites and making the subsequent link to fish observations vs. stream/watershed characteristics.

The o.point_type_code = 'Summary' is something different - it is the use of the word 'Summary' in the spatial sense and refers to the points located at the mouth of a stream which summarize all of the different species that have been observed in that watershed. Basically, a rollup of all the different species Observation Points in that drainage. The UTM for these points is always at the mouth of a stream or river. These points are duplicates of the information held in the o.point_type_code = 'Observation' records, which is why we have not included them in the model. That said, there may be some very old records from pre-GPS/GIS days which did not have an accurate UTM associated with them and which were just shown as summary points at the mouth of the watershed. These are few and far between, and the likelihood of one of those points being the only record of a species in a watershed is slim, so I don't think we need to worry about them being left out. But based on Simon's comment about duplicates not being a big deal in this application, I guess we could add them in and see if it changes anything.
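
One way to test the "few and far between" assumption would be to list the watershed/species pairs that appear only at Summary points. A minimal sketch of that check on made-up records follows. The `watershed` and `species_code` field names and the species codes are assumptions for illustration only, not the actual FISS column names or values; only `point_type_code` comes from the thread.

```python
from collections import defaultdict

# Toy records. Field names other than point_type_code, and all values,
# are hypothetical and only illustrate the shape of the check.
records = [
    {"watershed": "A", "species_code": "CO", "point_type_code": "Observation"},
    {"watershed": "A", "species_code": "CO", "point_type_code": "Summary"},
    {"watershed": "A", "species_code": "RB", "point_type_code": "Summary"},
    {"watershed": "B", "species_code": "CT", "point_type_code": "Observation"},
]


def summary_only_pairs(rows):
    """Return sorted (watershed, species) pairs seen only at Summary points."""
    seen = defaultdict(set)
    for row in rows:
        pair = (row["watershed"], row["species_code"])
        seen[row["point_type_code"]].add(pair)
    # Pairs recorded at a Summary point with no matching Observation point.
    return sorted(seen["Summary"] - seen["Observation"])


print(summary_only_pairs(records))  # [('A', 'RB')]
```

An empty result on the real data would mean the Summary points add no species to any watershed. Anything it does return would be worth a manual look.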

I hope that makes sense.

Ok. That is great that the summaries of species submitted are there in the obs under o.point_type_code = 'Observation'. Really, the o.point_type_code = 'Summary' doesn't seem all that useful off the top of my head, and it makes sense that Simon has dropped it in bcfishobs when tying the obs to a specific place on the stream.

I think (and hope) that the data summary info actually works ok for densities once the electrofishing info is manually pulled out of the database - Robin helped us do a pull a couple months ago (i.e. length/width of site and ef seconds). The summarizing by life stage that happens on the input form just loses some of the resolution of the data, but the total numbers are there, so we might want to lump all or some of the life stages together for each species to get the densities, which isn't really that bad.
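
As a rough illustration of lumping life stages together, here is a minimal sketch that sums the counts per species and divides by the site area (length × width). The function name, the tuple layout, the species codes and the fish per 100 m² unit are all assumptions for the example. It does not use ef seconds.

```python
from collections import defaultdict


def densities_per_100m2(catches, site_length_m, site_width_m):
    """Lump life stages per species and return fish per 100 m^2 of site area.

    catches is an iterable of (species, life_stage, count) tuples; this
    layout is hypothetical and not the FISS table structure.
    """
    area_m2 = site_length_m * site_width_m
    if area_m2 <= 0:
        raise ValueError("site length and width must be positive")
    totals = defaultdict(int)
    for species, _life_stage, count in catches:
        totals[species] += count  # life stage is ignored on purpose
    return {species: 100.0 * n / area_m2 for species, n in totals.items()}


# Made-up catch summary for a 50 m x 4 m site (200 m^2).
catches = [("CO", "fry", 12), ("CO", "juvenile", 8), ("RB", "adult", 5)]
result = densities_per_100m2(catches, site_length_m=50.0, site_width_m=4.0)
print(result)  # {'CO': 10.0, 'RB': 2.5}
```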

Thanks for looking into this!

Ok, I'll close - looking at example summary point fish_observation_point_id = 471343, there is indeed a matching upstream point_type_code = 'Observation' (276899).