paleobiodb/data_service

discrepancy between returned early and late intervals if calling up occurrences vs taxon names

Opened this issue · 6 comments

meljh commented

I have been using the PaleoDB to compile stratigraphic ranges for taxa and have discovered that it is possible for the early and late intervals returned for a taxon (when downloading taxonomic names) to comprise a different/longer amount of time than implied by the ages of the collections that taxon is listed in (when downloading occurrences). Here is an example:

http://www.paleobiodb.org/data1.2/occs/list.txt?base_name=Acidaspidina%20plana&show=class,time&idqual=certain

will return 4 records, all occurrences assigned to the Maduan, currently with max_ma of 501 and min_ma of 498.5 in the database.

In comparison:
http://www.paleobiodb.org/data1.2/taxa/list.txt?base_name=Acidaspidina%20plana&show=class,parent,app&rel=current

will return a record for the taxon with the expected max_ma (501) and min_ma (498.5), but with Drumian and Guzhangian as the early and late intervals, respectively. This is presumably because the Drumian spans 504.5 to 500.5 Ma and the Guzhangian spans 500.5 to 497.0 Ma, so together they bracket the max and min ages from the occurrences.
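To make the presumed mapping concrete, here is a minimal Python sketch of how an age range could be translated into bracketing intervals. It is only an illustration of the behaviour described above, not the data service's actual code: the function name `covering_intervals` and the boundary convention are assumptions, and the interval bounds are the ones quoted in this issue.

```python
# Interval bounds in Ma, as quoted in this issue (name, older bound, younger bound).
# Hypothetical lookup table; the real service uses its own timescale tables.
INTERVALS = [
    ("Drumian", 504.5, 500.5),
    ("Guzhangian", 500.5, 497.0),
]


def covering_intervals(max_ma, min_ma, intervals=INTERVALS):
    """Return the (early, late) interval names that contain max_ma and min_ma.

    Assumed convention: an age lying exactly on a boundary is assigned to the
    interval that keeps the bracket as narrow as possible.
    """
    early = next(
        (name for name, old, young in intervals if old >= max_ma > young), None
    )
    late = next(
        (name for name, old, young in intervals if old > min_ma >= young), None
    )
    return early, late


early, late = covering_intervals(501, 498.5)
print(early, late)  # Drumian Guzhangian

# The bracketing intervals span 504.5 - 497.0 = 7.5 Myr,
# while the occurrence ages span only 501 - 498.5 = 2.5 Myr.
```

Re-dating the two returned interval names therefore starts from a 7.5 Myr bracket rather than the 2.5 Myr range implied by the occurrences.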

But if I wanted to apply an updated/different age model to the returned early and late intervals, this would result in a longer (essentially less precise) stratigraphic range for this taxon than is known from the occurrences. In this case, the range would also be inaccurate: the Maduan currently falls within the Paibian, so this taxon is actually younger than the Guzhangian. (The age assignments in the PBDB for this regional stage are out of date, which is no surprise for the Cambrian, but this compounds the problem, and someone downloading ranges via taxonomic names would have no way to correct it.)

Yes, you have highlighted an important point, which perhaps I should better explain in the documentation. The first and last occurrences reported by the taxa/list operation are expressed according to the international chronostratigraphic timescale, rather than the time intervals that were originally entered.

This was a deliberate choice, so that this information would be presented for all taxa using a single consistent timescale. In general, if you need exact information about occurrences in the PaleoDB, it is always better to query for them directly as you have done and then analyze the resulting dataset yourself.
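As a rough sketch of the "query the occurrences and analyze them yourself" approach, the Python below derives a range from an occurrence download. The sample text is a stand-in that mirrors the four Maduan records described in this issue, not real output; the `early_interval` column name is an assumption about the download format, and `occurrence_range` is a hypothetical helper.

```python
import csv
import io

# Stand-in for the text returned by the occs/list query above
# (illustrative rows only; the early_interval column name is assumed).
SAMPLE = """early_interval,max_ma,min_ma
Maduan,501,498.5
Maduan,501,498.5
Maduan,501,498.5
Maduan,501,498.5
"""


def occurrence_range(csv_text):
    """Return (oldest max_ma, youngest min_ma, sorted original interval names)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    oldest = max(float(row["max_ma"]) for row in rows)
    youngest = min(float(row["min_ma"]) for row in rows)
    names = sorted({row["early_interval"] for row in rows})
    return oldest, youngest, names


print(occurrence_range(SAMPLE))  # (501.0, 498.5, ['Maduan'])
```

Because the originally entered interval names are kept, a different age model can be applied to them directly instead of to the coarser international intervals.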

meljh commented

I wonder if it would be worth starting a FAQ for things like this that might demand more documentation than the explanatory text currently online for different input/output parameters? This way examples could be included as well. The one above would be something like "When downloading lists of taxon names, why is there sometimes a discrepancy between the early and late intervals returned for a taxon and the absolute ages returned for the same taxon?"

Why are first/last appearance times for taxa treated differently from those for occurrences?

@mmcclenn:

The reason for this is that the taxon record records whatever is entered as the first/last appearance by the person who entered it, presumably according to the current literature. This is independent of the recorded occurrences of the taxon in the database.

🤔

but wait, you said

The first and last occurrences reported by the taxa/list operation are expressed according to the international chronostratigraphic timescale, rather than the time intervals that were originally entered

So, let me see if I understand:
a) Occurrences are dated according to the interval listed on the occurrence, as entered by an enterer, possibly later revised, etc. These original intervals are reported by the API. The dates for those intervals are those interval dates according to the current time-scale used by the PBDB.

b) When we call a taxon, it uses the first and last intervals as listed on that taxon, usually those entered by the person who entered that taxon. (Without reference to updated collections/occurrence data??) Or are the dates themselves taken from what the original enterers have added in?

c) Furthermore, those ages are then assigned to... other intervals, as in @meljh's example? And that's so all the intervals returned for first/last interval by taxa/list are on the international scale (presumably the Maduan isn't part of the international scale - I don't know that, I don't work in the Cambrian...). So the PBDB tries to return intervals that best comprise the dates listed for the intervals originally listed for that taxon.

Am I missing something? Or did you mean to say the age/interval information for collections/occurrences is as most recently entered, and so the data reported for occurrences/collections is closer to the data-as-is?
