pfmc-assessments/nwfscSurvey

many lengths from AFSC surveys are not yet available

Closed this issue · 7 comments

Beth reports that the AFSC surveys sampled length frequencies separately from the length/weight/age sampling of individual specimens, and that those length frequencies are stored in a different table which is not yet accessible in the data warehouse. Thus, for species like skates that didn't have otolith collections, there appear to be 0 length samples from the Triennial survey. For species which did have otolith sampling, the lengths available from the data warehouse are likely only those associated with ages, not the full set of length samples.

This is something that needs to be sorted out by the data team, but I'm posting the issue here to help @cwetzel and users of this package understand why they might not be seeing all the lengths that they expect.

Todd Hay writes:

So the Triennial data is actually stored in a separate table in our data warehouse. You can retrieve it using a URL such as the following for 2001 data with 2 species:

https://www.nwfsc.noaa.gov/data/api/v1/source/trawl.triennial_length_fact/selection.csv?year=2001&scientific_name=Sebastes pinniger,Microstomus pacificus

You can add in multiple years via a comma-separated list of years, or get all of the years by dropping the year= filter entirely.
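
A minimal sketch in R of how such a URL could be assembled for several species and years. The helper name `build_triennial_length_url` and its arguments are hypothetical and not part of nwfscSurvey; only the base URL and the `year` and `scientific_name` filters are taken from Todd's example.

```r
# Hypothetical helper (not part of nwfscSurvey): builds a URL for the
# trawl.triennial_length_fact table using the filters shown in the email.
build_triennial_length_url <- function(scientific_names, years = NULL) {
  base <- paste0(
    "https://www.nwfsc.noaa.gov/data/api/v1/source/",
    "trawl.triennial_length_fact/selection.csv"
  )
  # Encode each name separately so spaces become %20 while the commas
  # separating the names are left as-is.
  species <- paste(
    vapply(scientific_names, utils::URLencode, character(1), USE.NAMES = FALSE),
    collapse = ","
  )
  filters <- character(0)
  # Omitting `years` drops the year= filter, which requests all years.
  if (!is.null(years)) {
    filters <- c(filters, paste0("year=", paste(years, collapse = ",")))
  }
  filters <- c(filters, paste0("scientific_name=", species))
  paste0(base, "?", paste(filters, collapse = "&"))
}

url_2001 <- build_triennial_length_url(
  c("Sebastes pinniger", "Microstomus pacificus"), years = 2001
)
url_all_years <- build_triennial_length_url("Raja rhina")
```

The resulting string could then be passed to `read.csv()`, which accepts URLs; that step needs network access and is left out here.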

Holler with questions.

Thanks for the info on those previously identified issues. This is a new issue that came up yesterday when PullBio.fn failed to provide any triennial data for longnose skate or big skate. PullCatch.fn worked fine for those species. Initially Beth thought the skate lengths were not currently accessible in the data warehouse and that this would have to wait until Brandon returned from leave. However, Todd later wrote the email pasted into my comment above, showing that it was possible to get the data with a differently formatted URL. The URL https://www.nwfsc.noaa.gov/data/api/v1/source/trawl.triennial_length_fact/selection.csv?year=2001&scientific_name=Raja%20rhina, based on the link from Todd in the previous comment, got me 839 Longnose Skate lengths from 2001 that PullBio.fn did not get.

Beth said one reason these two sets of lengths have been kept separate is that she has not been able to get confirmation on whether there is duplication between the specimen file which is currently accessible via PullBio.fn and the other set of length data which is not currently accessible. I just did an experiment with POP, which are present in both tables. I compared the result of the command

PullBio.fn(SciName = "Sebastes alutus", SurveyName = "Triennial")

to the CSV I got using the URL: https://www.nwfsc.noaa.gov/data/api/v1/source/trawl.triennial_length_fact/selection.csv?year=2001&scientific_name=Sebastes%20alutus

I then subset to look at fish caught on an arbitrary date, 2001-Aug-24, and found that the two sets of samples look like they are drawn from a similar distribution but are definitely not identical: each contains samples not found in the other (12 cm fish only in the first set, and 51 cm fish only in the second):

[Image: pop_hist]

This leads me to believe that the lengths from the Triennial survey available in the part of the data warehouse accessed by PullBio.fn are probably incomplete for all species.
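
A comparison of this kind can be sketched in R as below. The helper `compare_length_sets` is hypothetical, and the length values are made up for illustration; they are not the POP data from either table.

```r
# Hypothetical helper: summarise how two vectors of lengths (cm) differ.
compare_length_sets <- function(first, second) {
  bins <- sort(union(first, second))
  # Count fish per length bin in each set, using a shared set of bins.
  counts <- rbind(
    first  = as.vector(table(factor(first, levels = bins))),
    second = as.vector(table(factor(second, levels = bins)))
  )
  colnames(counts) <- bins
  list(
    only_in_first  = setdiff(first, second),
    only_in_second = setdiff(second, first),
    counts = counts
  )
}

# Made-up values for illustration only; these are NOT the POP data.
specimen_lengths <- c(12, 30, 31, 31, 35)
length_table_lengths <- c(30, 31, 35, 35, 51)

cmp <- compare_length_sets(specimen_lengths, length_table_lengths)
cmp$only_in_first   # 12
cmp$only_in_second  # 51
cmp$counts          # per-bin counts for both sets, side by side
```

Lengths that appear in only one set show that neither set contains the other, but matching length values alone cannot show whether the same fish was recorded in both tables.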

Further conversation with the data team is needed to understand whether both sets of data could be combined in the future, perhaps with an additional column noting which source they come from. I would wait for the conclusion of that conversation before considering modifying PullBio.fn to access this alternative set of lengths.

The missing lengths in the data warehouse have been sorted out by Beth, but the PullBio.fn function needs to be revised to access the separate Triennial length table in addition to the table with ages (and lengths of the aged fish).

It looks like changing "trawl.individual_fact" to "trawl.triennial_length_fact" in UrlText will point the code to the lengths table, but it sounds like the year range needs to be specified differently (as noted above at #9 (comment)). The columns that get returned also seem to differ from the specimen table so additional code will be needed to deal with that.

Lastly, if lengths and ages are in separate tables, it probably makes sense to return them both as elements of a list with names like $Ages and $Lengths, rather than making the user choose which one they want, which could lead to folks overlooking the separate length table and just using the lengths of the aged fish.
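
A rough sketch of that return structure, assuming one puller per table. None of this is the actual PullBio.fn implementation: `pull_triennial_bio`, the two puller arguments, and the column names in the stand-in data are all hypothetical.

```r
# Hypothetical wrapper, not the actual PullBio.fn implementation.
# `pull_specimens` and `pull_lengths` are placeholder functions supplied
# by the caller, each returning a data frame for one table.
pull_triennial_bio <- function(sci_name, pull_specimens, pull_lengths) {
  list(
    Ages    = pull_specimens(sci_name),
    Lengths = pull_lengths(sci_name)
  )
}

# Stand-in pullers with made-up rows and column names, so the sketch
# runs without touching the data warehouse.
fake_specimens <- function(sci_name) {
  data.frame(Scientific_name = sci_name, Length_cm = c(30, 35), Age = c(8, 12))
}
fake_lengths <- function(sci_name) {
  data.frame(Scientific_name = sci_name, Length_cm = c(12, 30, 35, 51))
}

bio <- pull_triennial_bio("Sebastes alutus", fake_specimens, fake_lengths)
names(bio)          # "Ages" "Lengths"
nrow(bio$Lengths)   # 4
```

Returning both elements every time means a user who only looks at `bio$Ages` still sees that a separate `bio$Lengths` element exists.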

I'm happy to help with this if you want, @cwetzel.

The triennial biological data are now available in the data warehouse. The AFSC slope survey data are not yet available, but they will be in the data warehouse by the end of 2018.