ArctosDB/arctos

New Identifier Type: Arctos record GUID

Closed this issue · 96 comments

Current Status

The core of this is running in test; feedback is welcome.

  • It should not be possible to use an Arctos record GUID for an inappropriate identifier type
  • It should not be possible to use a not-actual-GUID for this type
  • It should be possible to input only a triplet and end with a fully correct entry (corrections noted in remarks)
  • All entries should magic to an appropriate issuedby agent regardless of input (corrections noted in remarks)
  • Relationship searches are now using this type
  • Bulkloader check is producing targeted errors
  • Bulkloader 'pull' is greatly simplified and using this type
  • Enter and Edit identifier display is sized to accommodate Arctos GUIDs (and adjustments to post-entry edit forms will come after #6687)

Definition

Arctos record identifiers or GUIDs when used as identifiers, primarily for the purposes of forming relationships. Only Arctos record identifiers may be used here; Arctos record identifiers may not be used in other identifier types, except Arctos:Entity when used as Organism ID. Automation will correct issued by agent, and will attempt to guess (and leave remarks) if "Triplet" is provided. Value should be added to prefix when available.

  • added "Value should be added to prefix when available." per conversation with @mkoo @Jegelewicz
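A minimal Python sketch of the normalization the definition describes. It accepts either a full GUID URL or a bare triplet, returns the canonical `https://arctos.database.museum/guid/` form plus the GUID prefix (the key for looking up the issued-by agent), and produces a remark whenever the input had to be changed. The triplet pattern, the tolerance for `http`/trailing slashes, and the remark wording are assumptions for illustration, not Arctos's actual rules.

```python
import re
from typing import NamedTuple, Optional

# Canonical base URL, as given in this issue.
GUID_BASE = "https://arctos.database.museum/guid/"

# Assumed triplet shape: INSTITUTION:COLLECTION:CATALOG_NUMBER, e.g. "APSU:Fish:1079".
TRIPLET_RE = re.compile(r"^([A-Za-z0-9]+:[A-Za-z0-9]+):([^\s/:]+)$")

# Tolerate http vs https, any letter case in host/path, and one trailing slash.
URL_RE = re.compile(
    r"^https?://arctos\.database\.museum/guid/([^\s?#]+?)/?$", re.IGNORECASE
)


class Normalized(NamedTuple):
    value: str             # canonical GUID URL
    guid_prefix: str       # e.g. "APSU:Fish"; key for the issued-by lookup
    remark: Optional[str]  # None when the input was already canonical


def normalize_arctos_guid(raw: str) -> Normalized:
    text = raw.strip()
    url = URL_RE.match(text)
    # Either strip a URL down to its triplet, or treat the input as a triplet.
    triplet = url.group(1) if url else text
    parts = TRIPLET_RE.match(triplet)
    if parts is None:
        raise ValueError(f"not an Arctos record GUID or triplet: {raw!r}")
    value = GUID_BASE + triplet
    # Record what was typed whenever the stored value differs from it.
    remark = None if value == raw else f"entered as {raw}"
    return Normalized(value, parts.group(1), remark)
```

For example, entering `APSU:Fish:1079` would yield the value `https://arctos.database.museum/guid/APSU:Fish:1079`, the prefix `APSU:Fish`, and the remark `entered as APSU:Fish:1079`, while an already-canonical GUID passes through with no remark.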

In Limbo

Can we eliminate a huge trap, #7808 (comment) #7836?


Original Issue

Problem: Need to distinguish and standardize Arctos GUIDs/Urls as distinct "identifier" type

Describe what you're trying to accomplish
Make it easier to identify and link to arctos urls in a standardized and internally controlled way

Describe the solution you'd like
New ID type: "Arctos record identifier"

Arctos record GUID - The full url of the related Arctos catalog record. Must begin with https://arctos.database.museum/guid/ followed by an Arctos record identifier (the triplet).

The special type would facilitate the correctness of internal links by

  1. enforcing values that begin with https://arctos.database.museum/guid/ in bulkloaded data (both the main bulkloader and the identifier bulkload tool)
  2. enforcing the use of the correct issuer based upon the triplet prefix part of the url in the value in bulkloaded data (both the main bulkloader and the identifier bulkload tool)
  3. pre-entering https://arctos.database.museum/guid/ in data entry forms when the type is selected (both data entry and in-record additions)
  4. adding the appropriate issuer based upon the triplet prefix part of the url in the value (both data entry and in-record additions)
  5. disallowing values that approximate https://arctos.database.museum/guid/ in other types.
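The bulkload checks in items 1, 2 and 5 can be sketched as one validation function. This is a hypothetical illustration: the `issuer_for_prefix` mapping (GUID prefix to issuing collection agent) stands in for whatever lookup Arctos actually uses, the "approximates a GUID" test for item 5 is a simple pattern match, and the Organism ID exception for `Arctos:Entity` is taken from the definition at the top of this issue. Items 3 and 4 would apply the same prefix-to-issuer lookup at entry time instead of rejecting the row.

```python
import re
from typing import Dict, List

GUID_BASE = "https://arctos.database.museum/guid/"
GUID_TYPE = "Arctos record GUID"

# Hypothetical "approximates an Arctos GUID" test for rule 5:
# any scheme, any letter case, extra slashes before "guid".
APPROX_RE = re.compile(r"arctos\.database\.museum/+guid", re.IGNORECASE)


def check_identifier(id_type: str, value: str, issued_by: str,
                     issuer_for_prefix: Dict[str, str]) -> List[str]:
    """Return bulkload-style error messages; an empty list means the row passes."""
    errors: List[str] = []
    if id_type == GUID_TYPE:
        # Rule 1: the value must start with the canonical base URL.
        if not value.startswith(GUID_BASE):
            return [f"value must begin with {GUID_BASE}"]
        # Rule 2: the issuer must match the GUID prefix ("INST:Coll" of the triplet).
        prefix = value[len(GUID_BASE):].rsplit(":", 1)[0]
        expected = issuer_for_prefix.get(prefix)
        if expected is None:
            errors.append(f"unknown GUID prefix {prefix!r}")
        elif issued_by != expected:
            errors.append(f"issued by should be {expected!r}, not {issued_by!r}")
    elif APPROX_RE.search(value):
        # Rule 5: GUID-like values are not allowed in other types,
        # except Arctos:Entity GUIDs used as Organism ID.
        allowed = (id_type == "Organism ID"
                   and value.startswith(GUID_BASE + "Arctos:Entity:"))
        if not allowed:
            errors.append(f"Arctos record GUIDs must use type {GUID_TYPE!r}")
    return errors
```

Returning a list rather than raising on the first problem matches the "targeted errors" goal: one row can report both an unknown prefix and, later, other checks without a second bulkload pass.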

Describe alternatives you've considered
increasing chaos


Priority
Wildfire

This is acceptable as long as

  1. I can control the value (Arctos GUID) and issuer, and
  2. I can disallow those things in identifiers of other types

I am of course happy to help clean up any existing problems which would prevent implementation.

See also #5310

This is acceptable as long as

  1. I can control the value (Arctos GUID) and issuer
    YES, agree
  2. I can disallow those things in identifiers of other types
    YES, agree

From @mkoo in #6738:


From the AWG discussion:
A new identifier would be created called Arctos record identifier which would expressly be the full URL of the Arctos record.

The data entry form needs to reflect that users can enter the catalog record or DwC triplet and have the domain etc. (https://arctos.database.museum/guid/) appended, although the builder could already do that.

Other suggested UI tweaks-- the Edit form on the record page:

[Screenshot: the Edit form on the record page, 2024-05-23]

  • Change text in red circle to "Prefix or String (see Type definitions)"
  • Increase the size of the Integer box (I can never see the full value! At least let us see more than 4 digits)

There is also agreement that we would remove the type= "institutional catalog number" and replace with simply "identifier" and the appropriate Issued by for consistent and discoverable other ids.

appended

Yes, I can potentially "I think you mean...." and manipulate the identifier, BUT there's also just about a 100% chance I'll occasionally mess that up. (So perhaps I should throw the 'input' into remarks or something if we get there.) Very strongly suggest we NOT do that, instead embrace #5310 (which leaves no room for confusion, doesn't require me to guess what a user might have intended, and doesn't become a liability at the borders of Arctos).

Prefix

Not a good discussion until #6687 is resolved (prefix may not survive).

remove the type= "institutional catalog number"

For the record: I'm very hesitant about adding more types at all, and my anxiety over introducing yet another type is greatly amplified by the lack of movement on the many existing identifier issues (much of https://github.com/ArctosDB/arctos/issues?q=is%3Aissue+is%3Aopen+identifier+prefix+label%3A%22Priority+-+Wildfire+Potential%22 , but there are still no issues for a bunch of other nonsensical types; e.g., there are still types for the media/object/device which carries identifiers!!). Clearly much of the confusion leading up to this proposal involves becoming lost in those arbitrary and unnecessary types. Removing what is perhaps the most confusing (and least consistently used) type is a great start, but is there any possible way we can commit to fully normalizing the ecosystem and getting ourselves out of this mudbog as we're adding this?

remove the type= "institutional catalog number"

Can we just stick to this one (very nice) thing and address that elsewhere? I'd hate to see this mired in arguments about other things. Also, I like the idea of type being functional, this could help us as we work through the remaining types.

An addition is the opposite of the simplification this is looking for. I definitely don't want any arguments, but I also think that nearly all of them involve getting lost in the complexity, much of which is brought about by the multitude of unnecessary types. Removing the thing that's clearly confusing users seems in line with the stated goals.

functional

If you mean having rules attached to types and agents, that has always been on offer. (But I think nobody's quite sure what to ask for because of the clutter of so many types, probably complicated by the surprising "what's a GUID?" conversation.) I'd be happy to work up a proposal if anyone's interested; please open an issue.

Just a note: most of the usage (but not all) of institutional catalog number is happening because we lack the clear alternative requested here. Once we have a clear and functional alternative, we can then move towards replacing and fixing the institutional catalog number ids. I absolutely agree with @Jegelewicz that we should not conflate these two issues.

most of the usage (but not all) of institutional catalog number is happening because we lack the clear alternative requested here

See #7808 (comment): this cannot exist as long as those things exist, and I can't create this without also moving them.

I will not support adding more muck in which to get lost. This can and should be a simple matter of sorting identifiers in two ways (here for the resolvable, not-here for the rest). There should be no ambiguity in the data, I don't think I need anything but an OK. (But if this again starts looking realistic I can provide data here for review.)

This affects active data entry protocols across multiple collections in my institution. The only way to accomplish this in a short amount of time is to add the new identifier first so that the correct identifiers can be added and shown to be functional, and then communicate the need to change workflows. This can happen quickly if we do it right now; we have a couple of weeks before the summer cataloging push starts up. Collections need to know that existing data will not be lost from older records. This is the "social" part here, which must be included for this to work. We don't want a repeat of last year. As soon as the new "Arctos record ID" format is up and running, @dustymc can convert all existing Arctos guid "identifiers" without problem. The remaining "institutional catalog numbers" can then be prioritized for conversion once we are certain that all existing Arctos relationships have been appropriately captured and converted.

So if I understand @dustymc correctly, we can proceed right away with the resolvable identifiers in Arctos - I agree completely.

#7808 (comment) is technically incompatible with what was discussed. The concerns that a new dedicated type might somehow cause data loss are - well, I guess I don't have a word, but it's whatever you'd use to describe something that just can't happen. The training and adaptation should be straightforward: use the thing that doesn't produce an error (which will hopefully be self-explanatory once the thing that's obviously causing arbitrary data is gone).

Now #7808 (comment) is making me think I've misunderstood something again.

I need the OK to

We are in agreement on all above, except the last step, which requires a temporal delay of a week or two as collections need to be notified to change workflows, otherwise we have a lot of extremely upset people trying to do things that suddenly cease to work with no notification.
This includes dealing with records currently in the bulkloader and in bulkload prep.

Regarding what to call this - see #5310

I support calling the Arctos GUID the full URL. This is also what we are defining the GUID as in the Arctos paper per the AWG discussion 5-24-2024, as the url created based on the Arctos "record identifier". @ccicero

Revised wording: "Each cataloged record has an Arctos Globally Unique Identifier (GUID) that is constructed from the record identifier (e.g., https://arctos.database.museum/guid/APSU:Fish:1079)."

last step ... suddenly cease to work with no notification.

That is precisely my point, but the implementation will not/can not work as I believe you're expecting it to.

  • If we do what I'm suggesting, a familiar (but evil) thing will be gone and unavailable for getting lost behind, a friendly new thing having appeared in its place.
  • If we do what you're suggesting, a familiar thing will throw new errors when someone attempts to use it - not so familiar after all, eh? - and any data which does get added to it (possible only after having been transformed into a non-useful format) will magically be elsewhere in "a week or two", dragging this seemingly unending process out yet more. Don't sound fun for nobody.

Implementing this in the only way it can be done will be a change in workflow, whether we drag some ancillary bits out or not. That is what was agreed to in the meeting and in #7808 (comment). Surely the folks entering data aren't THAT difficult to talk to, and we do have a communications team who I'm sure would be willing to help.

Can I request a csv of the existing data in Arctos that use "institutional catalog number"? I don't want to hold this up, but I don't want to be responsible for data loss, and I don't want to presume the rest of the community agrees to conversion of existing data and new workflows without notice.

See #5310 (comment) re Arctos GUID vs record identifier.

The special type would facilitate the correctness of internal links by

  1. enforcing values that begin with https://arctos.database.museum/guid/ in bulkloaded data (both the main bulkloader and the identifier bulkload tool)
  2. enforcing the use of the correct issuer based upon the GUID prefix part of the url in the value in bulkloaded data (both the main bulkloader and the identifier bulkload tool)
  3. pre-entering https://arctos.database.museum/guid/ in data entry forms when the type is selected (both data entry and in-record additions)
  4. adding the appropriate issuer based upon the GUID prefix part of the url in the value (both data entry and in-record additions)

All possible?

See first of #7808 (comment) re: (3); I'm hesitantly willing to try, but I do suck at reading minds through malformed identifiers and will occasionally (at best!) mangle that. Defensible procedures would involve not making me guess, even if that is implemented. Everything else: Yup, no problem, that's what I said in #7808 (comment).

Missing is (5), which is critical to this: Disallowing values that approximate https://arctos.database.museum/guid/ in other types.

Yes, I agree with 5 as well

mkoo commented

Those 5 conditions are essential!

EDIT

new data: https://docs.google.com/spreadsheets/d/1bCG8gFuTO5QC7JunnOBhay4ZCDkcaw81Qd-OwgihGGQ/edit#gid=1169145992


Original

If this is to proceed, the first decision will be what we do with the ~15K current identifiers that look like, but are not, valid Arctos GUIDs.

temp_rec_id_not_valid.csv.zip

Excluding 'self' relationships from this would exclude most of these, but that seems like a potential trap of some sort.

There might be reasons to allow non-current GUIDs, but then I would lose any ability to exclude random things that people type, and that seems critical to this (especially having now seen the data!).

Much of this is ALMNH changing GUID Prefix (ACK!!), perhaps those could be stripped to triplets without any real loss of persistence.

I'm not sure what to do from here, but I am sure that this type cannot be just another trashcan.

This feels like it's probably going to need some sort of ad-hoc committee, @campmlc perhaps you'd organize something?

Looking over the file, about 10K are ALMNH, another 4K+ are CHAS, and the remaining 1K are miscellaneous collections.
I would like to request that we create the new ID type with all the needed constraints so that we can use this for incoming accessions that are already coming in for the summer, and then work to deal with these oddities.
Non-ALMNH, non-CHAS:

 GUID_PREFIX | Count
-------------+------
 BYU:Herp    |     1
 DGR:Mamm    |     1
 DMNS:Inv    |     3
 KNWR:Env    |     4
 MMNH:Edu    |     2
 MSB:Bird    |    10
 MSB:Fish    |    70
 MSB:Herp    |     2
 MSB:Host    |    26
 MSB:Mamm    |   185
 MSB:Para    |   217
 MVZ:Bird    |     3
 MVZ:Egg     |    11
 MVZ:Herp    |     4
 MVZ:Mamm    |    83
 MVZObs:Herp |     1
 NHSM:Arc    |     2
 NMMNH:Paleo |     2
 NMU:Mamm    |    14
 OWU:Fish    |     4
 OWU:Inv     |     1
 UAM:Art     |     4
 UAM:Bird    |    38
 UAM:EH      |    15
 UAM:Ento    |   141
 UAM:Herb    |     2
 UAM:Inv     |     2
 UAM:Mamm    |   133
 UCM:Bird    |     2
 UCM:Herp    |     1
 UCM:Mamm    |    20
 UMZM:Bird   |     2
 UTEP:ES     |     1
 UTEP:Herb   |     1
 UTEP:Herp   |     2
 UWBM:Herp   |     2
 UWYMV:Egg   |     4
 UWYMV:Mamm  |     2
-------------+------
 Grand Total |  1018
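This pivot can be regenerated from the attached CSV with a few lines of Python. The sketch assumes the unzipped file has a `GUID_PREFIX` column (the name used in the pivot header) and that ALMNH and CHAS are excluded by institution code; both the column name and the exclusion rule are assumptions about how the pivot was built.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_prefix(lines: Iterable[str],
                    exclude: Iterable[str] = ("ALMNH", "CHAS")) -> Counter:
    """Count CSV rows per GUID_PREFIX, skipping excluded institutions.

    `lines` can be an open text file or any iterable of CSV lines.
    """
    excluded = set(exclude)
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        prefix = row["GUID_PREFIX"]                # assumed column name
        institution = prefix.split(":", 1)[0]      # "MSB:Mamm" -> "MSB"
        if institution not in excluded:
            counts[prefix] += 1
    return counts
```

Passing it the opened CSV (opened with `newline=""`, as the `csv` module recommends) and summing `counts.values()` should reproduce the Grand Total row if the assumptions about the export hold.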

The first decision will be what we do with the ~15K current identifiers that look like, but are not, valid Arctos GUIDs.

I suggest that these all have http://arctos.database.museum/guid/ removed from the value and a remark added "previously recorded as x" where x is the current value.
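The proposed cleanup is a simple per-row transform. A sketch, assuming the value and remarks are plain strings, that both `http` and `https` forms should be stripped, and that an existing remark is kept with the new note appended after `; ` (the separator is an assumption). Rows that don't start with the GUID URL are left untouched.

```python
import re
from typing import Optional, Tuple

# Matches the GUID URL prefix in either scheme, any letter case.
PREFIX_RE = re.compile(r"^https?://arctos\.database\.museum/guid/", re.IGNORECASE)


def demote_invalid_guid(value: str,
                        remarks: Optional[str]) -> Tuple[str, Optional[str]]:
    """Strip the GUID URL prefix and record the original value in remarks."""
    new_value, n = PREFIX_RE.subn("", value, count=1)
    if n == 0:
        return value, remarks  # nothing to strip; leave the row alone
    note = f"previously recorded as {value}"
    return new_value, (f"{remarks}; {note}" if remarks else note)
```

Because the full original value is copied into the remark, the transform is reversible, which fits the "nothing would be lost" expectation.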

Those ALMNH ones should already have redirects, so nothing is lost there?

Others are linking to valid records that are not yet cataloged in Arctos - e.g. http://arctos.database.museum/guid/HWML:Para:74826 which is a parasite of the linked MSB Mamm record, cited in a publication: Elisa Pucu, Marcela Lareschi, Scott L. Gardner. 2014. Bolivian Ectoparasites: A Survey of the Fleas of Ctenomys (Rodentia: Ctenomyidae). Comparative Parasitology 81(1):114-118. It is assigned a catalog number at HWML, but not yet cataloged there in Arctos.
This is more problematic, because one collection should not have to hold back on capturing relationships just because the related collection is slower on cataloging.

For identifiers that don't resolve for the above reason but otherwise meet all the criteria for an Arctos guid, can we just leave them as is and get periodic reports for "unresolved Arctos relationships" to sort out what is wrong? That would find http://arctos.database.museum/guid/HWML:Para:74826 and also similar situations where someone entered the related catalog number incorrectly, e.g. 74426.
It would also automatically resolve once the record was entered, with no action needed by the originating collection.
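One way such a periodic "unresolved Arctos relationships" report could work, sketched under assumptions: the set of existing GUIDs would come from something like the `flat.guid` column queried later in this thread, and the redirect mapping from the `redirect` table, simplified here to triplet-to-triplet pairs (the real table stores paths). Values that redirect are reported separately so they can be updated rather than treated as broken.

```python
from typing import Dict, Iterable, List, Set, Tuple

GUID_BASE = "https://arctos.database.museum/guid/"


def unresolved_report(values: Iterable[str], existing: Set[str],
                      redirects: Dict[str, str]) -> List[Tuple[str, str]]:
    """List (value, reason) for identifier values that don't resolve directly."""
    report: List[Tuple[str, str]] = []
    for value in values:
        # Compare on the triplet, e.g. "HWML:Para:74826".
        triplet = value[len(GUID_BASE):] if value.startswith(GUID_BASE) else value
        if triplet in existing:
            continue  # resolves; nothing to report
        target = redirects.get(triplet)
        if target is not None and target in existing:
            report.append((value, f"redirects to {target}"))
        else:
            # Not cataloged yet, or a mistyped catalog number.
            report.append((value, "no such record"))
    return report
```

A not-yet-cataloged record such as `HWML:Para:74826` would stay on the report until it is entered, then drop off with no action from the originating collection; a mistyped number like `74426` would stay until someone fixes it.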

create the new ID type with all the needed constraints ... and then

That is not technically viable. I don't know how to communicate that more clearly than in #7808 (comment).

these all have http://arctos.database.museum/guid/ removed from the value and a remark added "previously recorded as x"

That seems reasonable to me, agree nothing would be lost (and of course I'll leave CSV at every step if this becomes actionable).

not yet cataloged

There are about a million ways for that to go wrong, and about a million easy ways to avoid the situation. This type cannot become yet another garbage can. I propose we don't preemptively kill it on the easily-avoidable fringe use cases.

If we can implement even one easy way of the supposed million to deal with timing issues related to different collections cataloging related objects in Arctos at different times, that would be great. I do want to hold someone to that promise.
Otherwise, if this is the only way we can move forward, let's do it.

I don't suppose there is any way this could be implemented first in test?

I just fixed all but one of the UTEP problems - the last one appears to be "the related thing isn't cataloged". I can see how this means that relationships end up never being made (I know about it on my end, but the other collection isn't finished cataloging, so I can't record the relationship, and they don't because they don't know, then it never gets recorded). So I see the value in checking if the related item exists, but I also see the value in recording one side of a relationship, with the other side coming later. If we can't do that, why have the bot?

Checking whether the format of the identifier is correct is great! Ensuring that the link works at the time of entry, maybe not so much, because it means MSB mammals cannot record their parasite links until MSB Para has cataloged their records. Requiring both records to exist before a relationship can be recorded feels like a trap that means the relationships NEVER get recorded.

implemented first in test?

Yes.

why have the bot?

To make valid reciprocal relationships! If we allow that thing that'll totally happen tomorrow then we also allow http://arctos.database.museum/guid/UAM:Bird:unknown to continue to exist and this is just another garbage can.

Make a relationship using whatever identifiers are available (generate a UUID if there's not something 'native' handy, that's why the type exists - and of course file an issue if that's at all complicated, it should not be), then file an issue for help in upgrading it once things are cataloged. "The bot" is but one of a potential swarm, this still looks like an easily avoidable (and mostly theoretical) situation, albeit one that's definitely capable of killing this idea.

Another issue is that some of these that have re-directs should not be messed up, or the redirect will not function. See for example:
http://arctos.database.museum/guid/MSB:Mamm:270088 which redirects to https://arctos.database.museum/guid/MSB:Mamm:274455

re-directs should not be messed up,

I do not know what that means. https://arctos.database.museum/guid/MSB:Mamm:270088 just 404s because it doesn't exist

arctosprod@arctos>> select count(*) from flat where guid='MSB:Mamm:270088';
 count 
-------
     0

and there's nothing in redirects to suggest it should do anything else.

arctosprod@arctos>> select * from redirect where old_path ilike '%MSB:Mamm:270088%' or new_path  ilike '%MSB:Mamm:270088%';
 redirect_id | old_path | new_path 
-------------+----------+----------
(0 rows)


Perhaps a topic better addressed in another issue?

consensus on 5-24-24 call: new ID type to be called "Arctos record GUID"

Plan to demo on June 6

Adjust things in the bulkloader to match this as much as possible.

Working definition:

Arctos record identifiers or GUIDs when used as identifiers, primarily for the purposes of forming relationships. Only Arctos record identifiers may be used here; Arctos record identifiers may not be used in other identifier types, except Arctos:Entity when used as Organism ID. Automation will correct issued by agent, and will attempt to guess (and leave remarks) if "Triplet" is provided.

and proposed update to https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#identifier:

This type is proper for a wide range of identifiers that can be disambiguated by the agent that issued them. This identifier type is not indicative of low-quality data but allows for ease of identifier searches across specific uses for specific purposes. NOTE: Use "Arctos record GUID" for local record identifiers/relationships.

Moved to #7836

Csv please?

You can download CSV from the sheet linked in the comment directly above yours.

-moved to top-

Any way to catch this and disallow it?

[screenshot]

Should we? It does leave the door open for people to select the wrong type and make bad (or no) links, but it also seems like it would be a lot of computing to check for it.

Any way to catch this and disallow it?

No, that's kinda the point of #5310. People who work with Arctos are probably going to make some assumptions about that, we've been reinforcing those assumptions by pretending the universe stops at the data we can control, but there's also no way to tell if that's a perfectly valid (local) identifier that didn't originate in Arctos. (That might be a bad example, but there are in fact at least three "UAM:Mamm"s out there.)

It's not a CPU thing, it's a "how good identifiers work" thing.

This type won't obviate the need to read documentation. It will make it easy to get the details right (and hard to get them wrong) once you've gotten close, but it will absolutely not prevent someone from making huge messes.

(Moving on #7808 (comment) would prevent a very common flavor of those messes so - please, anyone?)

Data entry let me do this

[screenshot]

which I'm guessing will fail when I load the record?

guessing

[screenshot, 2024-05-30]

EDIT: See #7837 for this conversation, I can provide data there after this issue is implemented.


Going the other way, the collection agents are issuing all sorts of nonsense.


   c   |                                              issuedby                                              |                      collectionid                       | guid_prefix |              other_id_type               
-------+----------------------------------------------------------------------------------------------------+---------------------------------------------------------+-------------+------------------------------------------
     6 | Alabama Museum of Natural History Bird Collection                                                  | https://arctos.database.museum/collection/ALMNH:Bird    | ALMNH:Bird  | identifier
    22 | Alabama Museum of Natural History Geology Collection                                               | https://arctos.database.museum/collection/ALMNH:Geo     | ALMNH:Geo   | institutional catalog number
     1 | Alabama Museum of Natural History Mammal Collection                                                | https://arctos.database.museum/collection/ALMNH:Mamm    | ALMNH:Mamm  | identifier
     1 | Brigham Young University Life Science Museum Amphibian and Reptile Collection                      | https://arctos.database.museum/collection/BYU:Herp      | BYU:Herp    | identifier
   575 | Chicago Academy of Sciences Bird Collection                                                        | https://arctos.database.museum/collection/CHAS:Bird     | CHAS:Bird   | identifier
    61 | Chicago Academy of Sciences Bird Eggs Collection                                                   | https://arctos.database.museum/collection/CHAS:Egg      | CHAS:Egg    | identifier
     1 | Chicago Academy of Sciences Ethnology and History Artifacts Collection                             | https://arctos.database.museum/collection/CHAS:EH       | CHAS:EH     | identifier
     5 | Chicago Academy of Sciences Mollusc Collection                                                     | https://arctos.database.museum/collection/CHAS:Inv      | CHAS:EH     | identifier
    29 | Chicago Academy of Sciences Insect Collection                                                      | https://arctos.database.museum/collection/CHAS:Ento     | CHAS:Ento   | identifier
  1908 | Chicago Academy of Sciences Mollusc Collection                                                     | https://arctos.database.museum/collection/CHAS:Inv      | CHAS:Ento   | identifier
    10 | Chicago Academy of Sciences Teaching Collection                                                    | https://arctos.database.museum/collection/CHAS:Teach    | CHAS:Ento   | identifier
     2 | California Desert Studies Center Herbarium                                                         | https://arctos.database.museum/collection/CDSC:Herb     | CHAS:Herb   | identifier
   778 | Chicago Academy of Sciences Herbarium                                                              | https://arctos.database.museum/collection/CHAS:Herb     | CHAS:Herb   | identifier
    29 | Chicago Academy of Sciences Herbarium                                                              | https://arctos.database.museum/collection/CHAS:Herb     | CHAS:Herb   | institutional catalog number
    10 | Chicago Academy of Sciences Amphibian and Reptile Collection                                       | https://arctos.database.museum/collection/CHAS:Herp     | CHAS:Herp   | identifier
    82 | Chicago Academy of Sciences Mollusc Collection                                                     | https://arctos.database.museum/collection/CHAS:Inv      | CHAS:Inv    | identifier
     2 | Chicago Academy of Sciences Mammal Collection                                                      | https://arctos.database.museum/collection/CHAS:Mamm     | CHAS:Mamm   | identifier
    12 | Chicago Academy of Sciences Herbarium                                                              | https://arctos.database.museum/collection/CHAS:Herb     | CHAS:Teach  | identifier
     2 | Chicago Academy of Sciences Mollusc Collection                                                     | https://arctos.database.museum/collection/CHAS:Inv      | CHAS:Teach  | identifier
     1 | Museum of Southwestern Biology, Division of Mammals                                                | https://arctos.database.museum/collection/MSB:Mamm      | DGR:Mamm    | identifier
     2 | Denver Museum of Nature and Science Parasite Collection                                            | https://arctos.database.museum/collection/DMNS:Para     | DMNS:Bird   | institutional catalog number
    10 | Denver Museum of Nature and Science Marine Invertebrate Collection                                 | https://arctos.database.museum/collection/DMNS:Inv      | DMNS:Inv    | identifier
    37 | Denver Museum of Nature and Science Mammal Collection                                              | https://arctos.database.museum/collection/DMNS:Mamm     | DMNS:Mamm   | DZTM: Denver Zoology Tissue Mammal
     1 | Denver Museum of Nature and Science Mammal Collection                                              | https://arctos.database.museum/collection/DMNS:Mamm     | DMNS:Mamm   | institutional catalog number
     1 | Denver Museum of Nature and Science Parasite Collection                                            | https://arctos.database.museum/collection/DMNS:Para     | DMNS:Mamm   | institutional catalog number
     1 | Museum of Southwestern Biology, Division of Mammals                                                | https://arctos.database.museum/collection/MSB:Mamm      | DMNS:Mamm   | institutional catalog number
     8 | Denver Museum of Nature and Science Mammal Collection                                              | https://arctos.database.museum/collection/DMNS:Mamm     | DMNS:Para   | identifier
     4 | Denver Museum of Nature and Science Parasite Collection                                            | https://arctos.database.museum/collection/DMNS:Para     | DMNS:Para   | identifier
     4 | Kenai National Wildlife Refuge, Alaska Insect Collection                                           | https://arctos.database.museum/collection/KNWR:Ento     | KNWR:Env    | identifier
   635 | Kansas State University Biorepository Mammal Collection                                            | https://arctos.database.museum/collection/KSB:Mamm      | KSB:Mamm    | identifier
     1 | Museum of Southwestern Biology, Division of Mammals                                                | https://arctos.database.museum/collection/MSB:Mamm      | KSB:Mamm    | identifier
  1184 | Kansas State University Biorepository Teaching Collection                                          | https://arctos.database.museum/collection/KSB:Teach     | KSB:Teach   | identifier
   110 | Bell Museum of Natural History Bird Collection                                                     | https://arctos.database.museum/collection/MMNH:Bird     | MMNH:Bird   | preparator number
   469 | Bell Museum of Natural History Education Collection                                                | https://arctos.database.museum/collection/MMNH:Edu      | MMNH:Mamm   | institutional catalog number
     1 | Bell Museum of Natural History Mammal Collection                                                   | https://arctos.database.museum/collection/MMNH:Mamm     | MMNH:Mamm   | institutional catalog number
     6 | Museum of Southwestern Biology, Divison of Birds                                                   | https://arctos.database.museum/collection/MSB:Bird      | MSB:Bird    | identifier
     3 | Museum of Southwestern Biology, Division of Genomic Resources                                      | https://arctos.database.museum/collection/MSB:DGR       | MSB:Bird    | identifier
     4 | Museum of Southwestern Biology, Division of Genomic Resources                                      | https://arctos.database.museum/collection/MSB:DGR       | MSB:Bird    | NK
     1 | University of Alaska Museum Bird Collection                                                        | https://arctos.database.museum/collection/UAM:Bird      | MSB:Bird    | identifier
    68 | Museum of Southwestern Biology, Division of Genomic Resources                                      | https://arctos.database.museum/collection/MSB:DGR       | MSB:Fish    | identifier
     2 | Museum of Southwestern Biology, Division of Fishes                                                 | https://arctos.database.museum/collection/MSB:Fish      | MSB:Fish    | identifier
     2 | Museum of Southwestern Biology, Division of Amphibians and Reptiles                                | https://arctos.database.museum/collection/MSB:Herp      | MSB:Herp    | identifier
     3 | Museum of Southwestern Biology Host (of parasite) Collection                                       | https://arctos.database.museum/collection/MSB:Host      | MSB:Host    | identifier
     4 | Museum of Southwestern Biology, Division of Mammals                                                | https://arctos.database.museum/collection/MSB:Mamm      | MSB:Host    | identifier
    19 | Museum of Southwestern Biology, Division of Parasites                                              | https://arctos.database.museum/collection/MSB:Para      | MSB:Host    | identifier
    37 | Museum of Southwestern Biology, Division of Parasites                                              | https://arctos.database.museum/collection/MSB:Para      | MSB:Host    | institutional catalog number
     1 | U. S. National Parasite Collection                                                                 | https://arctos.database.museum/collection/USNPC:Para    | MSB:Host    | identifier
    24 | Arctos Entity Collection                                                                           | https://arctos.database.museum/collection/Arctos:Entity | MSB:Mamm    | Organism ID
    69 | Harold W. Manter Laboratory of Parasitology                                                        | https://arctos.database.museum/collection/HWML:Para     | MSB:Mamm    | identifier
     2 | Harold W. Manter Laboratory of Parasitology                                                        | https://arctos.database.museum/collection/HWML:Para     | MSB:Mamm    | institutional catalog number
     1 | Museum of Southwestern Biology, Division of Genomic Resources                                      | https://arctos.database.museum/collection/MSB:DGR       | MSB:Mamm    | DGR: Division of Genomic Resources (MSB)
    60 | Museum of Southwestern Biology, Division of Genomic Resources                                      | https://arctos.database.museum/collection/MSB:DGR       | MSB:Mamm    | identifier
    48 | Museum of Southwestern Biology, Division of Mammals                                                | https://arctos.database.museum/collection/MSB:Mamm      | MSB:Mamm    | identifier
     2 | Museum of Southwestern Biology, Division of Mammals                                                | https://arctos.database.museum/collection/MSB:Mamm      | MSB:Mamm    | institutional catalog number
     1 | Museum of Southwestern Biology Mammal Observations Collection                                      | https://arctos.database.museum/collection/MSBObs:Mamm   | MSB:Mamm    | identifier
     7 | Museum of Southwestern Biology, Division of Parasites                                              | https://arctos.database.museum/collection/MSB:Para      | MSB:Mamm    | identifier
     6 | Museum of Southwestern Biology, Divison of Birds                                                   | https://arctos.database.museum/collection/MSB:Bird      | MSB:Para    | institutional catalog number
     2 | Museum of Southwestern Biology, Division of Genomic Resources                                      | https://arctos.database.museum/collection/MSB:DGR       | MSB:Para    | identifier
     3 | Museum of Southwestern Biology, Division of Fishes                                                 | https://arctos.database.museum/collection/MSB:Fish      | MSB:Para    | identifier
    33 | Museum of Southwestern Biology, Division of Amphibians and Reptiles                                | https://arctos.database.museum/collection/MSB:Herp      | MSB:Para    | institutional catalog number
    24 | Museum of Southwestern Biology Host (of parasite) Collection                                       | https://arctos.database.museum/collection/MSB:Host      | MSB:Para    | identifier
   134 | Museum of Southwestern Biology Host (of parasite) Collection                                       | https://arctos.database.museum/collection/MSB:Host      | MSB:Para    | institutional catalog number
  2163 | Museum of Southwestern Biology, Division of Mammals                                                | https://arctos.database.museum/collection/MSB:Mamm      | MSB:Para    | institutional catalog number
    31 | Museum of Southwestern Biology, Division of Parasites                                              | https://arctos.database.museum/collection/MSB:Para      | MSB:Para    | identifier
    37 | Museum of Southwestern Biology, Division of Parasites                                              | https://arctos.database.museum/collection/MSB:Para      | MSB:Para    | institutional catalog number
   154 | University of Alaska Museum Insect Collection                                                      | https://arctos.database.museum/collection/UAM:Ento      | MSB:Para    | identifier
   850 | University of Alaska Museum Mammal Collection                                                      | https://arctos.database.museum/collection/UAM:Mamm      | MSB:Para    | institutional catalog number
     4 | U. S. National Parasite Collection                                                                 | https://arctos.database.museum/collection/USNPC:Para    | MSB:Para    | identifier
     6 | U. S. National Parasite Collection                                                                 | https://arctos.database.museum/collection/USNPC:Para    | MSB:Para    | institutional catalog number
     1 | Arctos Entity Collection                                                                           | https://arctos.database.museum/collection/Arctos:Entity | MVZ:Bird    | Organism ID
     4 | Harold W. Manter Laboratory of Parasitology                                                        | https://arctos.database.museum/collection/HWML:Para     | MVZ:Herp    | identifier
    80 | Harold W. Manter Laboratory of Parasitology                                                        | https://arctos.database.museum/collection/HWML:Para     | MVZ:Mamm    | identifier
     3 | MVZ Hildebrand Collection                                                                          | https://arctos.database.museum/collection/MVZ:Hild      | MVZ:Mamm    | identifier
     2 | Natural History Society of Maryland Archaeology Collection                                         | https://arctos.database.museum/collection/NHSM:Arc      | NHSM:Arc    | identifier
     2 | New Mexico Museum of Natural History and Science Paleontology Collection                           | https://arctos.database.museum/collection/NMMNH:Paleo   | NMMNH:Paleo | identifier
     6 | Northern Michigan University Mammal Collection                                                     | https://arctos.database.museum/collection/NMU:Mamm      | NMU:Mamm    | identifier
     8 | Northern Michigan University Parasite Collection                                                   | https://arctos.database.museum/collection/NMU:Para      | NMU:Mamm    | identifier
 35734 | Ocean Genome Legacy Genomics Collection                                                            | https://arctos.database.museum/collection/OGL:Genomic   | OGL:Genomic | identifier
 33809 | Ocean Genome Legacy Genomics Collection                                                            | https://arctos.database.museum/collection/OGL:Genomic   | OGL:Genomic | lot number
  1032 | Ocean Genome Legacy Genomics Collection                                                            | https://arctos.database.museum/collection/OGL:Genomic   | OGL:Genomic | processing number
     9 | Museum of Southwestern Biology, Divison of Birds                                                   | https://arctos.database.museum/collection/MSB:Bird      | OWU:Bird    | institutional catalog number
     4 | Ohio Wesleyan University Fish Collection                                                           | https://arctos.database.museum/collection/OWU:Fish      | OWU:Fish    | identifier
     1 | Ohio Wesleyan University Invertebrate Collection                                                   | https://arctos.database.museum/collection/OWU:Inv       | OWU:Inv     | identifier
     1 | Trinity College Dublin Geological Museum Paleontology Collection                                   | https://arctos.database.museum/collection/TCDGM:Paleo   | TCDGM:Paleo | identifier
    38 | University of Alaska Museum Bird Collection                                                        | https://arctos.database.museum/collection/UAM:Bird      | UAM:Bird    | identifier
    14 | University of Alaska Museum Ethnology and History Department                                       | https://arctos.database.museum/collection/UAM:EH        | UAM:EH      | identifier
     1 | University of Alaska Museum Mammal Collection                                                      | https://arctos.database.museum/collection/UAM:Mamm      | UAM:EH      | identifier
     4 | Kenai National Wildlife Refuge, Alaska Insect Collection                                           | https://arctos.database.museum/collection/KNWR:Ento     | UAM:Ento    | identifier
     1 | Kenai National Wildlife Refuge, Alaska Environmental Samples Collection                            | https://arctos.database.museum/collection/KNWR:Env      | UAM:Ento    | identifier
    55 | Kenelm W. Philip Lepidoptera Collection                                                            | https://arctos.database.museum/collection/KWP:Ento      | UAM:Ento    | identifier
     1 | University of Alaska Museum Bird Collection                                                        | https://arctos.database.museum/collection/UAM:Bird      | UAM:Ento    | identifier
    75 | University of Alaska Museum Insect Collection                                                      | https://arctos.database.museum/collection/UAM:Ento      | UAM:Ento    | identifier
     3 | University of Alaska Museum Insect Observations Collection                                         | https://arctos.database.museum/collection/UAMObs:Ento   | UAM:Ento    | identifier
     2 | University of Alaska Museum Mammal Observations Collection                                         | https://arctos.database.museum/collection/UAMObs:Mamm   | UAM:Ento    | identifier
     2 | University of Alaska Museum Herbarium                                                              | https://arctos.database.museum/collection/UAM:Herb      | UAM:Herb    | identifier
     2 | University of Alaska Museum Insect Collection                                                      | https://arctos.database.museum/collection/UAM:Ento      | UAM:Inv     | identifier
   132 | University of Alaska Museum Insect Collection                                                      | https://arctos.database.museum/collection/UAM:Ento      | UAM:Mamm    | identifier
     1 | University of Alaska Museum Mammal Observations Collection                                         | https://arctos.database.museum/collection/UAMObs:Mamm   | UAM:Mamm    | identifier
     2 | University of Colorado Museum of Natural History Bird Collection                                   | https://arctos.database.museum/collection/UCM:Bird      | UCM:Bird    | identifier
     1 | University of Colorado Museum of Natural History Amphibian and Reptile Collection                  | https://arctos.database.museum/collection/UCM:Herp      | UCM:Herp    | identifier
    19 | University of Colorado Museum of Natural History Mammal Collection                                 | https://arctos.database.museum/collection/UCM:Mamm      | UCM:Mamm    | identifier
     2 | Harold W. Manter Laboratory of Parasitology                                                        | https://arctos.database.museum/collection/HWML:Para     | UMZM:Bird   | identifier
    19 | University of Montana Philip L. Wright Zoological Museum Bird Collection                           | https://arctos.database.museum/collection/UMZM:Bird     | UMZM:Bird   | preparator number
     2 | University of Montana Philip L. Wright Zoological Museum Mammal Collection                         | https://arctos.database.museum/collection/UMZM:Mamm     | UMZM:Bird   | preparator number
     1 | University of Montana Philip L. Wright Zoological Museum Bird Collection                           | https://arctos.database.museum/collection/UMZM:Bird     | UMZM:Egg    | identifier
    26 | University of Montana Philip L. Wright Zoological Museum Bird Collection                           | https://arctos.database.museum/collection/UMZM:Bird     | UMZM:Egg    | institutional catalog number
     1 | University of Montana Philip L. Wright Zoological Museum Mammal Collection                         | https://arctos.database.museum/collection/UMZM:Mamm     | UMZM:Egg    | institutional catalog number
     2 | University of Montana Philip L. Wright Zoological Museum Bird Collection                           | https://arctos.database.museum/collection/UMZM:Bird     | UMZM:Mamm   | preparator number
     1 | University of Montana Philip L. Wright Zoological Museum Mammal Collection                         | https://arctos.database.museum/collection/UMZM:Mamm     | UMZM:Mamm   | preparator number
  3397 | University of New Mexico Geology Collection                                                        | https://arctos.database.museum/collection/UNM:Geol      | UNM:Geol    | identifier
    94 | University of Texas at El Paso Biodiversity Collections Earth Science Collection                   | https://arctos.database.museum/collection/UTEP:ES       | UTEP:ES     | identifier
     1 | University of Texas at El Paso Biodiversity Collections Herbarium                                  | https://arctos.database.museum/collection/UTEP:Herb     | UTEP:Herb   | identifier
     1 | University of Texas at El Paso Biodiversity Collections Amphibian and Reptile Collection           | https://arctos.database.museum/collection/UTEP:Herp     | UTEP:Herp   | identifier
     1 | University of Texas at El Paso Biodiversity Collections Amphibian and Reptile Osteology Collection | https://arctos.database.museum/collection/UTEP:HerpOS   | UTEP:Herp   | identifier
     1 | Burke Museum Amphibian and Reptile Collection                                                      | https://arctos.database.museum/collection/UWBM:Herp     | UWBM:Herp   | identifier
     1 | Burke Museum Mammal Collection                                                                     | https://arctos.database.museum/collection/UWBM:Mamm     | UWBM:Herp   | identifier
    48 | Burke Museum Invertebrate Paleontology Collection                                                  | https://arctos.database.museum/collection/UWBM:IP       | UWBM:VP     | identifier
     1 | Burke Museum Mammal Collection                                                                     | https://arctos.database.museum/collection/UWBM:Mamm     | UWBM:VP     | identifier
     4 | University of Wyoming Museum of Vertebrates Bird Collection                                        | https://arctos.database.museum/collection/UWYMV:Bird    | UWYMV:Egg   | identifier
     2 | University of Wyoming Museum of Vertebrates Amphibian and Reptile Collection                       | https://arctos.database.museum/collection/UWYMV:Herp    | UWYMV:Herp  | institutional catalog number
     2 | University of Wyoming Museum of Vertebrates Mammal Collection                                      | https://arctos.database.museum/collection/UWYMV:Mamm    | UWYMV:Mamm  | identifier
   1424 | Museo de Zoología de la Universidad San Francisco de Quito Amphibian and Reptile Collection        | https://arctos.database.museum/collection/ZSFQ:Herp     | ZSFQ:Herp   | identifier

That should be cleaned up (I don't know how, probably won't be fun) and prevented (either as part of doing something more-formal with the agent/collection link, or I could do it as part of creating collections) if anyone is willing to deal with good and precise data; I'm having doubts at the moment.

That should be cleaned up

I don't understand how to find those or what the actual problems are. Can you give me more information, such as which records these problems are associated with?

We must understand that these collection agents have issued things that are not Arctos record GUIDs and they should be able to be recorded as such. This includes things like old catalog numbers. All of these relationships should be self and the type should NOT be Arctos record GUID, so no problem?

collection agents have issued things that are not Arctos record GUIDs

If that's the case then the data are as good as anyone cares to make them and nothing else is required.

FWIW I don't think that's OK; https://arctos-test.tacc.utexas.edu/agent/21346749 is not capable of preparing (https://arctos-test.tacc.utexas.edu/guid/MMNH:Bird:51088), https://arctos-test.tacc.utexas.edu/agent/21347747 exists to issue https://arctos-test.tacc.utexas.edu/info/ctDocumentation.cfm?table=ctcoll_other_id_type#dztm__denver_zoology_tissue_mammal and https://arctos-test.tacc.utexas.edu/guid/DMNS:Mamm:14999 is a loss of information (caused by us being inevitably stuck in this weird limbo), etc. I can't see any reason collections should issue anything other than catalog numbers, but it's definitely not a hill I'm willing to die on if nobody cares. (Does make me wonder why we're here though....)

I can't see any reason collections should issue anything other than catalog numbers

Some of these are catalog numbers that have been replaced with new Arctos record GUIDs.

I'm not sure what "catalog numbers that have been replaced" means but I don't think there's any situation that targeted agents can't handle with precision.

The same agent has issued two catalog numbers. A collection had a numbering system that wasn't integer. Upon migrating to Arctos, they chose to renumber and use integers. They need to track the old numbers for citation purposes.

The same agent

So don't go there if doing so is inconvenient; agents are super-flexible, deciding to not drag along the weird legacy system is as good a reason as any to fire up a new related agent; this just doesn't have to be a problem: https://arctos-test.tacc.utexas.edu/agent/21352458 and done....

Rather than convincing so many different collections and collections staff to retroactively deal with all these legacy issues, why not do the opposite: make new agents for all Arctos collections that can specifically and only be used to issue Arctos record guids? Or make the Arctos record guid the actual issued by agent? That is something you can do as DBA, vs dealing with the thorny details of these legacy issues that clearly have many valid use cases.

agents for all Arctos collections that can specifically and only be used to issue Arctos record guids

That's all I need to add some predictability. Every collection already has an agent with a collectionID, I don't think that does anything other than what I need for this new type. I can make as many more "sub-collection" agents as are needed, I can move stuff around, I can set up rules, etc., but there are also some things that I can't do alone so I'm here asking for help.

  • Biggest-picture, I don't know if anyone cares. It's not exactly breaking anything at the moment, maybe what looks like low-quality 'pick a random thing to make the blank be not-blank' from here makes sense from there, or that was a mistake that you'd like to prevent, or truly nobody cares about this, or ?????????? - IDK, help!

I guess that's it for now. Tell me this matters and what the intention might have been/goals are/whatever and I can help do whatever it is that ya'll want to do.

legacy issues

These were all done recently by "us" - this isn't a legacy issue, this is people actively and currently making choices that I don't understand.

clearly have many valid use cases

Help me understand those.

agents are super-flexible, deciding to not drag along the weird legacy system is as good a reason as any to fire up a new related agent

This flies directly in the face of providing attribution. Two agents that are the same organization is a bad idea in my opinion. In some cases there IS NOT a new agent. Both numbers were issued by the same collection.

FWIW, I am not too perturbed by an agent that is a collection issuing identifiers other than Arctos record GUIDs. Lots of people issue both collector and preparator numbers and we don't care about that. Maybe I'm in the minority though, so I'll let the rest of anyone who cares chime in.

I agree with @Jegelewicz. This happens frequently, and there is a need. If we disallow it, then data will be lost, or stashed in even more obscure places as people invent obscure workarounds. Data need to be discoverable, but the database needs to accommodate the realities of current practices as well as legacy data. The alternative is something like what happened at USNM when they recataloged the entire USDA parasite type collection under new catalog numbers without making the original published type-specimen catalog numbers readily visible and discoverable. That is a far worse crime IMO.
Happy to hear from others.

The alternative

At least one of us is completely lost! The alternative to what I'm proposing is avoiding #7025, where inconsistent entry has made a mess, by requiring consistency.

There are still no legacy data involved in this.

I suppose I'll take your word that 4 (https://arctos.database.museum/search.cfm?id_issuedby=MSB&oidtype=NK) out of 243,868 NK need to be issued by a particular agent, but I don't understand it...

This does not and can not have anything to do with recataloging?!

Would it be possible to have a screen shot of the new system from test please? I'm afraid I am not sure what was wrong with @Jegelewicz's example of a problem:
image

I'm afraid I am not sure what was wrong with @Jegelewicz's example of a problem:

UTEP:Herp:10015 is an Arctos record identifier. Because I selected the ID Type identifier, it is pretty useless. If I had selected ID Type Arctos record GUID, it would have been transformed magically by Arctos to https://arctos.database.museum/guid/UTEP:Herp:10015, the issued by would have been magically entered as University of Texas at El Paso Amphibian and Reptile Collection, and there would be an active link from the record I entered this in to https://arctos.database.museum/guid/UTEP:Herp:10015. Because I selected the wrong type, all that magic is missing!

Because I selected the wrong type

Or, from my/Arctos' perspective: you (presumably!) selected precisely what you wanted. That's not an Arctos identifier at all; anyone can mint such strings for any purpose. If it were what we're all assuming, you'd have used the 'magic, please' option we're adding here. This ties into #5310: our 'globe' is now truly global (it wasn't when this confusing discussion began; proto-Arctos was on a big purple box in the basement of UAF!), and not being explicit about that provides opportunities to mangle data. No matter what this new type will do, feeding it the full https://arctos.database.museum/guid/UTEP:Herp:10015 when that is the record you mean will always be the proper course of action; anything else will leave me (and everyone who comes after) guessing.

screen shot

You posted one; this is just a new type and some rules (#7808 (comment))

You posted one;

I reposted @Jegelewicz's screenshot. I was hoping for a screenshot of what a correct ID looks like so I could compare, but Teresa walked me through it.

So is this going to be the difference between "I want to use the Arctos relationship magic" and "I don't want to use it" for other Arctos collections? Because I agree with @dustymc that UTEP:Herp:10015 is an identifier, and I don't think it is wrong to call it an identifier. It just wasn't using the new, magic Arctos record GUID identifier type.

difference between I want to use the Arctos relationship magic,

Not precisely.

If you want to (vaguely, in a way that can't actually connect) refer to something out there, do what @Jegelewicz did. If UTEP:Herp:10015 is the catalog number of a piece of artwork which isn't available in some GUID-ish way online, then this (plus an issued by agent) is the proper entry.

If you're aiming for https://arctos.database.museum/guid/UTEP:Herp:10015, then...

Screenshot 2024-05-31 at 08 59 28

is correct.

If you insist on using triplets for some crazy reason,

Screenshot 2024-05-31 at 09 01 57

will get magicked into https://arctos.database.museum/guid/UTEP:Herp:10015 (and a proper issued by agent added) when the record is created.

This:

Screenshot 2024-05-31 at 09 02 12

will in all cases throw an error.
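For anyone trying to picture the three cases above, here is a rough Python sketch of that kind of normalization. It is not the Arctos implementation: the lookup dict, the triplet regex, and the remark wording are all assumptions for illustration. A full https://arctos.database.museum/guid/ URL passes through, a bare triplet gets the prefix added (with a remark noting the correction), and anything else is rejected. The real check presumably also confirms that the record exists, which this sketch does not model.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Prefix required by the type definition in this issue.
GUID_BASE = "https://arctos.database.museum/guid/"

# Hypothetical guid_prefix -> issuing collection agent lookup; in Arctos this
# would come from the collection/agent link, not a hard-coded dict.
COLLECTION_AGENTS = {
    "UTEP:Herp": "University of Texas at El Paso Biodiversity Collections Amphibian and Reptile Collection",
    "CHAS:Herb": "Chicago Academy of Sciences Herbarium",
}

# Assumed triplet shape: institution:collection:catalog number,
# e.g. UTEP:Herp:10015 or CHAS:Herb:1878.17.2747
TRIPLET = re.compile(r"^([A-Za-z0-9]+):([A-Za-z0-9]+):([^\s:/]+)$")


@dataclass
class NormalizedGuid:
    value: str
    issued_by: str
    remark: Optional[str] = None


def normalize_arctos_guid(raw: str) -> NormalizedGuid:
    """Return the full GUID URL plus issuing agent, or raise ValueError."""
    text = raw.strip()
    if text.startswith(GUID_BASE):
        triplet, remark = text[len(GUID_BASE):], None
    else:
        # Bare triplet: expand it and leave a remark about the correction.
        triplet, remark = text, f"expanded from triplet {text}"
    match = TRIPLET.match(triplet)
    if match is None:
        raise ValueError(f"not an Arctos record GUID: {raw!r}")
    guid_prefix = f"{match.group(1)}:{match.group(2)}"
    issued_by = COLLECTION_AGENTS.get(guid_prefix)
    if issued_by is None:
        raise ValueError(f"no collection agent known for {guid_prefix!r}")
    # The issuing agent is always derived from the prefix, whatever was typed.
    return NormalizedGuid(GUID_BASE + triplet, issued_by, remark)
```

With this sketch, normalize_arctos_guid("UTEP:Herp:10015") comes back as the full UTEP:Herp:10015 URL with the UTEP collection agent attached, while something like "UTEP Herp 10015" raises an error.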

Thanks for the explanations @dustymc and @Jegelewicz. I understand a bit more now.

@Jegelewicz please confirm that this is them:

temp_ALMNHBirdicn.csv.zip

@dustymc All ALMNH:Geo records with institutional catalog number that starts with PI

Change to
type = identifier
issued by = https://arctos.database.museum/agent/21347728
remark = accession number
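A minimal Python sketch of the change requested above, applied to rows shaped like the export in this thread. The row keys (guid_prefix, id_type, value, issued_by, remarks) are assumed names, and the prefixes parameter is a hypothetical convenience so other leading letters could be added later. Rows are copied rather than edited in place, so a pre-update copy stays available.

```python
from typing import Dict, Iterable, List, Tuple

# Agent given in the request; the row layout itself is assumed.
NEW_ISSUER = "https://arctos.database.museum/agent/21347728"


def retype_almnh_geo(
    rows: Iterable[Dict[str, str]], prefixes: Tuple[str, ...] = ("PI",)
) -> List[Dict[str, str]]:
    """Apply the requested change to matching rows; other rows pass through unchanged."""
    out = []
    for row in rows:
        if (
            row["guid_prefix"] == "ALMNH:Geo"
            and row["id_type"] == "institutional catalog number"
            and row["value"].startswith(prefixes)
        ):
            # Build a new dict so the caller's pre-update rows stay intact.
            row = {
                **row,
                "id_type": "identifier",
                "issued_by": NEW_ISSUER,
                "remarks": "accession number",
            }
        out.append(row)
    return out
```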

please confirm that this is them:

it is, let's make the remark just accession number

@Jegelewicz re #7808 (comment) I find only one

 ALMNH:Geo   | ALMNH:Geo:1 | https://arctos.database.museum/guid/ALMNH:Geo:1 | institutional catalog number | PI1985.0027.0048 | self          | unknown        | 2022-04-27    | Alabama Museum of Natural History Geology Collection |         |                 16083085

I find only one

That's because the rest are like this

image

So those starting with G too

UPDATE 983

(And I'm going to hide the done comments lest I get lost in them.)

G too

Confirm please:

temp_ALMNHGeoicnpig.csv.zip

@dustymc for all ALMNH:Geo and ALMNH:Paleo like this

image

I've made an agent we could use if it would work better - https://arctos.database.museum/agent/21352846

like this

That will get auto-cleaned as part of this new type.

if it would work better

I can use whatever. I suspect keeping collection agents (and maybe some others, like genbank) "clean" is worth doing, but maybe that's not something we're interested in, or not globally, or ???????????

@dustymc for MSB:Host like this

image

Remove the duplicate MSB:Para please.

@dustymc for OWU:Fish like this

image

remove the extra : please

@dustymc I have done everything I can for the data here - #7808 (comment) so maybe redo that and put it in a Google sheet?

DLM: done

extra : please

Link please? I'm juggling too many things to transcribe from an image....

EDIT: never mind, I just nuked all 4 %::%

@dustymc for

ACUNHC:Bird
ACUNHC:Ento
ACUNHC:Fish
ACUNHC:Herp
ACUNHC:Inv
ACUNHC:Paleo

Identifiers of type institutional catalog number that start with ACUNHC and are issued by Abilene Christian University, Natural History Collection can be changed to type identifier.
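The request above combines four conditions, and all of them have to hold for a row to change. A small Python sketch of that selection, assuming hypothetical row keys and assuming "issued by" means an exact match on the agent name:

```python
from typing import Dict, List

# The six collections listed in the request.
ACUNHC_PREFIXES = {
    "ACUNHC:Bird", "ACUNHC:Ento", "ACUNHC:Fish",
    "ACUNHC:Herp", "ACUNHC:Inv", "ACUNHC:Paleo",
}
ACU_AGENT = "Abilene Christian University, Natural History Collection"


def qualifies(row: Dict[str, str]) -> bool:
    """All four conditions from the request must hold (exact agent match assumed)."""
    return (
        row["guid_prefix"] in ACUNHC_PREFIXES
        and row["id_type"] == "institutional catalog number"
        and row["value"].startswith("ACUNHC")
        and row["issued_by"] == ACU_AGENT
    )


def retype_acunhc(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of qualifying rows with type 'identifier'; others unchanged."""
    return [{**row, "id_type": "identifier"} if qualifies(row) else row for row in rows]
```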

DLM

I'm editing your posts in further efforts to not confuse myself.

Data:
temp_ACUNHClottaguids.csv.zip

UPDATE 3128

@dustymc can you put the GUIDs involved in #7808 (comment) in a Google sheet?

GUIDs involved in #7808 (comment) in a Google sheet?

Not really - those are post-update and I can't easily get at those data until there's an update. That comment was mostly exploratory - does anyone care? If there's any interest, open a new issue for that and I'll get data when I can. (Or I can get test data, but that's probably just distracting - eg it'll contain all the stuff you just fixed.)

@dustymc I can't get search results because reasons, but the CHAS:Herb records in https://docs.google.com/spreadsheets/d/1rNlvoWnDrWeVrxD2nWsMAaFO8x3fmBDQ445sWc8DcGQ/edit#gid=1128266536 with relationship other than self that start with CHAS:Herb: should have identifier values prefixed with https://arctos.database.museum/guid/ and issued by Chicago Academy of Sciences Herbarium

An example

https://arctos.database.museum/guid/CHAS:Herb:1878.17.2747

There are also some that are missing the CHAS:Herb, but clearly have the same issue.

https://arctos.database.museum/guid/CHAS:Herb:1873.48.412

Can you find and fix these? I don't have access to the collection, so I am unable to clean up. Note that there are also issues like this in

CHAS:Bird (just two)
CHAS:Teach
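A minimal Python sketch of the CHAS:Herb fix requested above, assuming rows carry hypothetical relationship, value, and issued_by fields. It only touches non-self relationships whose value already starts with CHAS:Herb:, so values that are missing the CHAS:Herb part are deliberately left for a person to review.

```python
from typing import Dict

GUID_BASE = "https://arctos.database.museum/guid/"
CHAS_HERB_AGENT = "Chicago Academy of Sciences Herbarium"


def fix_chas_herb(row: Dict[str, str]) -> Dict[str, str]:
    """Prefix bare CHAS:Herb triplets in non-self relationships with the GUID base URL."""
    value = row["value"]
    if row["relationship"] != "self" and value.startswith("CHAS:Herb:"):
        return {**row, "value": GUID_BASE + value, "issued_by": CHAS_HERB_AGENT}
    # Self relationships, full URLs, and values missing the CHAS:Herb part are left alone.
    return row
```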

Blargh, that's clearly more of #7836; I don't think we're taking https://github.com/ArctosDB/arctos/issues?q=is%3Aissue+is%3Aopen+label%3A%22Priority+-+Wildfire+Potential%22 seriously enough.

CHAS:Herb records in .. with relationship other than self that start with CHAS:Herb: should have identifier values prefixed with https://arctos.database.museum/guid/ and issued by Chicago Academy of Sciences Herbarium

temp_chasherbsr.csv.zip

UPDATE 117

I clicked one and got...

Screenshot 2024-06-04 at 11 36 06

which this type will prevent, so that's nice.

It was the only one, updated manually.

clearly have the same issue.

I'm not brave enough to go looking for something that vague by myself, but happy to help if someone from the collection wants to clean up.

Is that the list you want me to confirm changes for?

The bottom three here should be the full CHAS:Herb url instead of just the triplet. If that is going to be magically fixed later somehow, great, otherwise, the list you posted above all need to have the same treatment.

not brave enough

I can send you lists if that will help.

image

confirm

I just did it, seemed unambiguous and there's pre-update CSV here, I can stop if anyone thinks I'm getting too brave...

send you lists

More information always makes me happy!

bottom three here

Aiya, #7836 again, I filtered for institutional catalog number and some of the clearly-same data are something else because we've locked ourselves into a confusing environment. I'll check back in after I find some food and air, but endlessly bashing our head on symptoms with no ability to address the disease is feeling awfully depressing at the moment.

I'm just trying to fix known problems as much as possible so that we can get to something manageable? I also need to eat though....

so that we can get to something manageable

OK, fair enough, and thanks.

IDK what to do with this - probably nothing until we've got the good stuff picked out - but https://docs.google.com/spreadsheets/d/10vZuf74wMiGqOE92HBISj6gssKwvAqWLZUHF3WDoO40/edit#gid=758636242 is everything that looks like it might be a guid but can't be made to resolve.

All of the ALMNH:ES DO resolve - try it yourself http://arctos.database.museum/guid/ALMNH:ES:1325

These are working exactly as intended, I just think we should change the issued by to https://arctos.database.museum/agent/21352846

Apologies for being AWOL - I can try to return to this now and help as I can.
Just took a quick look at the spreadsheet https://docs.google.com/spreadsheets/d/10vZuf74wMiGqOE92HBISj6gssKwvAqWLZUHF3WDoO40/edit#gid=758636242.
The DGR catalog numbers were largely encumbered or deleted as part of mergers with respective voucher collections, although some unmerged valid ones still exist (e.g. they couldn't be or weren't assigned other guids). Most of these DGR triplets are old catalog numbers that need to be preserved but should be recorded as institutional catalog numbers issued by the MSB Division of Genomic Resources as "DGR:Mamm", "DGR:Bird" etc.
But we now have a new MSB:DGR collection that is also issuing real guids/urls that will be issued by MSB Division of Genomic Resources. See for example https://arctos.database.museum/guid/MSB:DGR:100.
Based on the previous discussion, how are these to be handled?

DO resolve

Well, not directly, and I didn't get very fancy with this. The big groups of things (those, maybe a lot of DGR stuff) should be easy: drop 'em, not a problem for here. (And I think maybe you mapped something else that I did at test, but I haven't started on prod yet.) It'll get easier to drop after this issue is completed.

I'm a little hesitant to do much with agents until #7837 gets some resolution, and I think those are easy to find and update anytime.

I am in the process of fixing all of the ASUMZ Bivalve lots and the ALMNH Mamm M numbers, that will clean up a big chunk.

What is the plan for stuff that just hasn't been cataloged yet? See https://arctos.database.museum/guid/NMU:Mamm:2837

It appears that a couple of the related parasites just haven't been cataloged yet. I think it would be bad to remove them, but that also leaves an opening for people to make things up?

We absolutely need to leave the option open for related material to be cataloged at different times. There is no way to enforce that both related objects must be cataloged prior to relationships link being formed. This will result in huge data loss. Sometimes there are years between cataloging of related items, and these are frequently linked across institutions and outside the permissions of any single operator. This needs to allow for flexibility across institutional staffing resources and workflows. Please do not jeopardize ongoing efforts between multiple Arctos collections and institutions which are capturing these data as we speak.

@dustymc in https://docs.google.com/spreadsheets/d/10vZuf74wMiGqOE92HBISj6gssKwvAqWLZUHF3WDoO40/edit#gid=758636242

There are 635 rows like this one:

| GUID_PREFIX | TRIPLET | OTHER_ID_TYPE | DISPLAY_VALUE | ID_REFERENCES | ASSIGNED_AGENT | ASSIGNED_DATE | ISSUED_BY_AGENT | REMARKS |
|---|---|---|---|---|---|---|---|---|
| MSB:Mamm | MSB:Mamm:194205 | institutional catalog number | MSB:Mamm:143989 | same individual as | Jonathan L. Dunnum | 2023-05-16 |  | These specimens were double cataloged due to the erroneus understanding that the Panamanian NK 200000 series represented unique individuals from the NK 117000 series. |

Note the remark:

These specimens were double cataloged due to the erroneus understanding that the Panamanian NK 200000 series represented unique individuals from the NK 117000 series.
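Rows like this one could be found with a simple heuristic: the identifier value is a bare triplet, but it is filed under some type other than the new GUID type. This is a minimal Python sketch, assuming the spreadsheet's column names (`DISPLAY_VALUE`, `OTHER_ID_TYPE`, `TRIPLET`, `ID_REFERENCES`), an assumed triplet pattern, and an assumed type string "Arctos record GUID" taken from this issue's title. A match is only a candidate; each hit still needs a human to confirm it really points at an Arctos record.

```python
import re
from typing import Dict, Iterable, Iterator

# Assumed bare-triplet shape: INSTITUTION:COLLECTION:CATALOG_NUMBER, no URL.
TRIPLET_RE = re.compile(r"^[^:/\s]+:[^:/\s]+:[^:/\s]+$")

# Assumed code-table value for the new type, taken from the issue title.
GUID_TYPE = "Arctos record GUID"


def misfiled_triplets(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield rows whose DISPLAY_VALUE is a bare triplet filed under another identifier type."""
    for row in rows:
        value = row["DISPLAY_VALUE"].strip()
        if TRIPLET_RE.match(value) and row["OTHER_ID_TYPE"] != GUID_TYPE:
            yield row


rows = [
    # The example row from the spreadsheet above (subset of columns).
    {"TRIPLET": "MSB:Mamm:194205", "OTHER_ID_TYPE": "institutional catalog number",
     "DISPLAY_VALUE": "MSB:Mamm:143989", "ID_REFERENCES": "same individual as"},
    # A made-up row holding a plain number, which should not be flagged.
    {"TRIPLET": "MSB:Mamm:1", "OTHER_ID_TYPE": "collector number",
     "DISPLAY_VALUE": "12345", "ID_REFERENCES": "self"},
]
flagged = list(misfiled_triplets(rows))
```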

So all of these SHOULD be GUIDs. Can you add the https://arctos.database.museum/guid/ prefix to make them work? We can talk about what else needs to be done with them some other time. Here are the triplets for the records:

MSB:Mamm:194294,MSB:Mamm:194333,MSB:Mamm:194850,MSB:Mamm:194439,MSB:Mamm:194562,MSB:Mamm:194264,MSB:Mamm:194395,MSB:Mamm:194379,MSB:Mamm:194486,MSB:Mamm:194685,MSB:Mamm:194867,MSB:Mamm:194788,MSB:Mamm:194300,MSB:Mamm:194317,MSB:Mamm:194766,MSB:Mamm:194542,MSB:Mamm:194223,MSB:Mamm:194435,MSB:Mamm:194350,MSB:Mamm:194206,MSB:Mamm:194607,MSB:Mamm:194252,MSB:Mamm:194676,MSB:Mamm:194547,MSB:Mamm:194511,MSB:Mamm:194597,MSB:Mamm:194591,MSB:Mamm:194432,MSB:Mamm:194354,MSB:Mamm:194329,MSB:Mamm:194448,MSB:Mamm:194697,MSB:Mamm:194842,MSB:Mamm:194382,MSB:Mamm:194291,MSB:Mamm:194755,MSB:Mamm:194578,MSB:Mamm:194557,MSB:Mamm:194518,MSB:Mamm:194841,MSB:Mamm:194509,MSB:Mamm:194701,MSB:Mamm:194531,MSB:Mamm:194834,MSB:Mamm:194299,MSB:Mamm:194552,MSB:Mamm:194308,MSB:Mamm:194693,MSB:Mamm:194848,MSB:Mamm:194309,MSB:Mamm:194476,MSB:Mamm:194494,MSB:Mamm:194335,MSB:Mamm:194660,MSB:Mamm:194189,MSB:Mamm:194659,MSB:Mamm:194472,MSB:Mamm:194241,MSB:Mamm:221893,MSB:Mamm:194328,MSB:Mamm:194380,MSB:Mamm:194172,MSB:Mamm:194751,MSB:Mamm:194473,MSB:Mamm:194237,MSB:Mamm:194366,MSB:Mamm:194394,MSB:Mamm:194407,MSB:Mamm:194330,MSB:Mamm:194839,MSB:Mamm:194238,MSB:Mamm:194560,MSB:Mamm:194474,MSB:Mamm:194589,MSB:Mamm:194198,MSB:Mamm:194272,MSB:Mamm:194775,MSB:Mamm:194174,MSB:Mamm:194378,MSB:Mamm:194777,MSB:Mamm:194285,MSB:Mamm:194687,MSB:Mamm:194792,MSB:Mamm:194412,MSB:Mamm:194196,MSB:Mamm:194671,MSB:Mamm:194386,MSB:Mamm:194653,MSB:Mamm:194193,MSB:Mamm:194229,MSB:Mamm:221894,MSB:Mamm:194351,MSB:Mamm:194453,MSB:Mamm:194822,MSB:Mamm:194266,MSB:Mamm:194779,MSB:Mamm:194444,MSB:Mamm:194576,MSB:Mamm:194405,MSB:Mamm:194347,MSB:Mamm:194466,MSB:Mamm:194602,MSB:Mamm:194679,MSB:Mamm:194275,MSB:Mamm:194431,MSB:Mamm:194858,MSB:Mamm:194297,MSB:Mamm:194795,MSB:Mamm:194500,MSB:Mamm:194588,MSB:Mamm:194173,MSB:Mamm:194798,MSB:Mamm:194785,MSB:Mamm:194355,MSB:Mamm:194221,MSB:Mamm:194764,MSB:Mamm:194522,MSB:Mamm:194771,MSB:Mamm:194541,MSB:Mamm:194321,MSB:Mamm:194463,MSB:Mamm:194213,MSB:Mamm:194669,MSB:Mamm:194615,MSB:Mamm:194384,
MSB:Mamm:194204,MSB:Mamm:194346,MSB:Mamm:194416,MSB:Mamm:194406,MSB:Mamm:194756,MSB:Mamm:194381,MSB:Mamm:194498,MSB:Mamm:194176,MSB:Mamm:194455,MSB:Mamm:194225,MSB:Mamm:194688,MSB:Mamm:194610,MSB:Mamm:194612,MSB:Mamm:194402,MSB:Mamm:194689,MSB:Mamm:194750,MSB:Mamm:194310,MSB:Mamm:194287,MSB:Mamm:194504,MSB:Mamm:194185,MSB:Mamm:194833,MSB:Mamm:194271,MSB:Mamm:194224,MSB:Mamm:194800,MSB:Mamm:194608,MSB:Mamm:194457,MSB:Mamm:194278,MSB:Mamm:194565,MSB:Mamm:194377,MSB:Mamm:194787,MSB:Mamm:194325,MSB:Mamm:194646,MSB:Mamm:194734,MSB:Mamm:194220,MSB:Mamm:194175,MSB:Mamm:194251,MSB:Mamm:194440,MSB:Mamm:194829,MSB:Mamm:194661,MSB:Mamm:194528,MSB:Mamm:194760,MSB:Mamm:194210,MSB:Mamm:194695,MSB:Mamm:194651,MSB:Mamm:194830,MSB:Mamm:194451,MSB:Mamm:194211,MSB:Mamm:194824,MSB:Mamm:194804,MSB:Mamm:194202,MSB:Mamm:194529,MSB:Mamm:194857,MSB:Mamm:194835,MSB:Mamm:194423,MSB:Mamm:194332,MSB:Mamm:194843,MSB:Mamm:194863,MSB:Mamm:194491,MSB:Mamm:194780,MSB:Mamm:194544,MSB:Mamm:194401,MSB:Mamm:194614,MSB:Mamm:194561,MSB:Mamm:194464,MSB:Mamm:194255,MSB:Mamm:194450,MSB:Mamm:194592,MSB:Mamm:194762,MSB:Mamm:194205,MSB:Mamm:194820,MSB:Mamm:194686,MSB:Mamm:221892,MSB:Mamm:194731,MSB:Mamm:194813,MSB:Mamm:194334,MSB:Mamm:194595,MSB:Mamm:194302,MSB:Mamm:194227,MSB:Mamm:194601,MSB:Mamm:194258,MSB:Mamm:194768,MSB:Mamm:194885,MSB:Mamm:194415,MSB:Mamm:194546,MSB:Mamm:194704,MSB:Mamm:194280,MSB:Mamm:194501,MSB:Mamm:194318,MSB:Mamm:194778,MSB:Mamm:194539,MSB:Mamm:194460,MSB:Mamm:194855,MSB:Mamm:194650,MSB:Mamm:194678,MSB:Mamm:194819,MSB:Mamm:194846,MSB:Mamm:194319,MSB:Mamm:194840,MSB:Mamm:194298,MSB:Mamm:194643,MSB:Mamm:194454,MSB:Mamm:194283,MSB:Mamm:194844,MSB:Mamm:194851,MSB:Mamm:194545,MSB:Mamm:194215,MSB:Mamm:194806,MSB:Mamm:194357,MSB:Mamm:194536,MSB:Mamm:194826,MSB:Mamm:194417,MSB:Mamm:194290,MSB:Mamm:194362,MSB:Mamm:194433,MSB:Mamm:194477,MSB:Mamm:194746,MSB:Mamm:194187,MSB:Mamm:194513,MSB:Mamm:194324,MSB:Mamm:194372,MSB:Mamm:194752,MSB:Mamm:194279,MSB:Mamm:194741,MSB:Mamm:194663,MSB:Mamm:194609,
MSB:Mamm:194758,MSB:Mamm:194458,MSB:Mamm:194730,MSB:Mamm:194392,MSB:Mamm:194748,MSB:Mamm:194603,MSB:Mamm:194327,MSB:Mamm:194462,MSB:Mamm:194446,MSB:Mamm:194584,MSB:Mamm:194403,MSB:Mamm:194186,MSB:Mamm:194288,MSB:Mamm:194767,MSB:Mamm:194487,MSB:Mamm:194277,MSB:Mamm:194645,MSB:Mamm:194807,MSB:Mamm:194344,MSB:Mamm:194320,MSB:Mamm:194587,MSB:Mamm:194422,MSB:Mamm:194827,MSB:Mamm:194393,MSB:Mamm:194506,MSB:Mamm:194543,MSB:Mamm:194404,MSB:Mamm:194515,MSB:Mamm:194680,MSB:Mamm:194774,MSB:Mamm:194860,MSB:Mamm:194499,MSB:Mamm:194301,MSB:Mamm:194482,MSB:Mamm:194419,MSB:Mamm:194516,MSB:Mamm:194808,MSB:Mamm:194551,MSB:Mamm:194563,MSB:Mamm:194526,MSB:Mamm:194818,MSB:Mamm:194248,MSB:Mamm:194201,MSB:Mamm:194375,MSB:Mamm:194575,MSB:Mamm:194260,MSB:Mamm:194296,MSB:Mamm:194594,MSB:Mamm:194869,MSB:Mamm:194191,MSB:Mamm:194825,MSB:Mamm:194234,MSB:Mamm:194443,MSB:Mamm:194409,MSB:Mamm:194647,MSB:Mamm:194274,MSB:Mamm:263803,MSB:Mamm:221891,MSB:Mamm:194757,MSB:Mamm:194868,MSB:Mamm:194356,MSB:Mamm:194866,MSB:Mamm:194342,MSB:Mamm:194782,MSB:Mamm:194216,MSB:Mamm:194323,MSB:Mamm:194358,MSB:Mamm:194533,MSB:Mamm:194180,MSB:Mamm:194312,MSB:Mamm:194441,MSB:Mamm:194359,MSB:Mamm:194483,MSB:Mamm:194884,MSB:Mamm:194761,MSB:Mamm:194880,MSB:Mamm:194828,MSB:Mamm:194810,MSB:Mamm:194188,MSB:Mamm:194268,MSB:Mamm:194361,MSB:Mamm:194233,MSB:Mamm:194447,MSB:Mamm:194519,MSB:Mamm:194322,MSB:Mamm:194249,MSB:Mamm:194532,MSB:Mamm:194442,MSB:Mamm:194613,MSB:Mamm:194832,MSB:Mamm:194853,MSB:Mamm:194583,MSB:Mamm:194396,MSB:Mamm:194579,MSB:Mamm:194805,MSB:Mamm:194434,MSB:Mamm:194740,MSB:Mamm:194232,MSB:Mamm:194605,MSB:Mamm:194374,MSB:Mamm:194847,MSB:Mamm:194340,MSB:Mamm:194326,MSB:Mamm:194243,MSB:Mamm:194865,MSB:Mamm:194566,MSB:Mamm:194203,MSB:Mamm:194467,MSB:Mamm:194281,MSB:Mamm:194295,MSB:Mamm:194789,MSB:Mamm:194759,MSB:Mamm:194535,MSB:Mamm:194170,MSB:Mamm:194649,MSB:Mamm:194523,MSB:Mamm:194481,MSB:Mamm:194177,MSB:Mamm:194596,MSB:Mamm:194303,MSB:Mamm:194736,MSB:Mamm:194336,MSB:Mamm:194286,MSB:Mamm:194729,MSB:Mamm:194654,
MSB:Mamm:194530,MSB:Mamm:194410,MSB:Mamm:194496,MSB:Mamm:194772,MSB:Mamm:194510,MSB:Mamm:194881,MSB:Mamm:194698,MSB:Mamm:194456,MSB:Mamm:194503,MSB:Mamm:194305,MSB:Mamm:194389,MSB:Mamm:194799,MSB:Mamm:194250,MSB:Mamm:194616,MSB:Mamm:194436,MSB:Mamm:194556,MSB:Mamm:194801,MSB:Mamm:194549,MSB:Mamm:194667,MSB:Mamm:194817,MSB:Mamm:194534,MSB:Mamm:194184,MSB:Mamm:194538,MSB:Mamm:194373,MSB:Mamm:194493,MSB:Mamm:194199,MSB:Mamm:194424,MSB:Mamm:194781,MSB:Mamm:194728,MSB:Mamm:194306,MSB:Mamm:194212,MSB:Mamm:194387,MSB:Mamm:194593,MSB:Mamm:194270,MSB:Mamm:194265,MSB:Mamm:194598,MSB:Mamm:194585,MSB:Mamm:194691,MSB:Mamm:194388,MSB:Mamm:194821,MSB:Mamm:194360,MSB:Mamm:194553,MSB:Mamm:194427,MSB:Mamm:194475,MSB:Mamm:194339,MSB:Mamm:194408,MSB:Mamm:194823,MSB:Mamm:194861,MSB:Mamm:194214,MSB:Mamm:194426,MSB:Mamm:194369,MSB:Mamm:194311,MSB:Mamm:194747,MSB:Mamm:194307,MSB:Mamm:194183,MSB:Mamm:194664,MSB:Mamm:194452,MSB:Mamm:194257,MSB:Mamm:194397,MSB:Mamm:194505,MSB:Mamm:194276,MSB:Mamm:194418,MSB:Mamm:194479,MSB:Mamm:194383,MSB:Mamm:194371,MSB:Mamm:194773,MSB:Mamm:194236,MSB:Mamm:194353,MSB:Mamm:194507,MSB:Mamm:194267,MSB:Mamm:194882,MSB:Mamm:194838,MSB:Mamm:194478,MSB:Mamm:194600,MSB:Mamm:194345,MSB:Mamm:194197,MSB:Mamm:194814,MSB:Mamm:194178,MSB:Mamm:194699,MSB:Mamm:194331,MSB:Mamm:194420,MSB:Mamm:194540,MSB:Mamm:194683,MSB:Mamm:194662,MSB:Mamm:194738,MSB:Mamm:194527,MSB:Mamm:194245,MSB:Mamm:194577,MSB:Mamm:194437,MSB:Mamm:194786,MSB:Mamm:194438,MSB:Mamm:194580,MSB:Mamm:194784,MSB:Mamm:194733,MSB:Mamm:194590,MSB:Mamm:194391,MSB:Mamm:194228,MSB:Mamm:194304,MSB:Mamm:194231,MSB:Mamm:194739,MSB:Mamm:194702,MSB:Mamm:194815,MSB:Mamm:194263,MSB:Mamm:194604,MSB:Mamm:194222,MSB:Mamm:194489,MSB:Mamm:194208,MSB:Mamm:194854,MSB:Mamm:194425,MSB:Mamm:194599,MSB:Mamm:194677,MSB:Mamm:194182,MSB:Mamm:194192,MSB:Mamm:194521,MSB:Mamm:194744,MSB:Mamm:194235,MSB:Mamm:194508,MSB:Mamm:194239,MSB:Mamm:194732,MSB:Mamm:194262,MSB:Mamm:194364,MSB:Mamm:194879,MSB:Mamm:194492,MSB:Mamm:194684,MSB:Mamm:194666,
MSB:Mamm:194247,MSB:Mamm:194365,MSB:Mamm:194461,MSB:Mamm:194414,MSB:Mamm:194524,MSB:Mamm:194849,MSB:Mamm:194753,MSB:Mamm:194343,MSB:Mamm:194413,MSB:Mamm:194259,MSB:Mamm:194537,MSB:Mamm:194256,MSB:Mamm:194743,MSB:Mamm:194399,MSB:Mamm:194468,MSB:Mamm:194690,MSB:Mamm:194376,MSB:Mamm:194836,MSB:Mamm:194341,MSB:Mamm:194770,MSB:Mamm:194692,MSB:Mamm:194338,MSB:Mamm:194569,MSB:Mamm:194240,MSB:Mamm:194657,MSB:Mamm:194282,MSB:Mamm:194856,MSB:Mamm:194883,MSB:Mamm:194811,MSB:Mamm:194776,MSB:Mamm:194791,MSB:Mamm:194254,MSB:Mamm:194246,MSB:Mamm:194181,MSB:Mamm:194796,MSB:Mamm:194269,MSB:Mamm:194497,MSB:Mamm:194179,MSB:Mamm:194273,MSB:Mamm:194783,MSB:Mamm:194763,MSB:Mamm:194809,MSB:Mamm:194749,MSB:Mamm:194837,MSB:Mamm:194207,MSB:Mamm:194703,MSB:Mamm:194586,MSB:Mamm:194289,MSB:Mamm:194195,MSB:Mamm:194226,MSB:Mamm:194831,MSB:Mamm:194368,MSB:Mamm:194411,MSB:Mamm:194171,MSB:Mamm:194665,MSB:Mamm:194217,MSB:Mamm:194845,MSB:Mamm:194200,MSB:Mamm:194465,MSB:Mamm:194490,MSB:Mamm:194370,MSB:Mamm:194606,MSB:Mamm:194169,MSB:Mamm:194765,MSB:Mamm:194559,MSB:Mamm:194812,MSB:Mamm:194700,MSB:Mamm:194797,MSB:Mamm:194859,MSB:Mamm:194737,MSB:Mamm:194803,MSB:Mamm:194611,MSB:Mamm:194852,MSB:Mamm:194348,MSB:Mamm:194512,MSB:Mamm:194449,MSB:Mamm:194190,MSB:Mamm:194769,MSB:Mamm:194670,MSB:Mamm:194681,MSB:Mamm:194644,MSB:Mamm:194548,MSB:Mamm:194471,MSB:Mamm:194385,MSB:Mamm:194244,MSB:Mamm:194242,MSB:Mamm:194816,MSB:Mamm:194337,MSB:Mamm:194742,MSB:Mamm:194727,MSB:Mamm:194349,MSB:Mamm:194864,MSB:Mamm:194682,MSB:Mamm:194862,MSB:Mamm:194550,MSB:Mamm:194484,MSB:Mamm:194790,MSB:Mamm:194293,MSB:Mamm:194219,MSB:Mamm:194428,MSB:Mamm:194567,MSB:Mamm:194292,MSB:Mamm:194488,MSB:Mamm:194421,MSB:Mamm:194495,MSB:Mamm:194485,MSB:Mamm:194352,MSB:Mamm:194194,MSB:Mamm:194696,MSB:Mamm:194514,MSB:Mamm:194390,MSB:Mamm:194802,MSB:Mamm:194502,MSB:Mamm:194793,MSB:Mamm:194568,MSB:Mamm:194517,MSB:Mamm:194261,MSB:Mamm:194209,MSB:Mamm:194656,MSB:Mamm:194367,MSB:Mamm:194525,MSB:Mamm:194745,MSB:Mamm:194218,MSB:Mamm:194284,MSB:Mamm:194230,
MSB:Mamm:194558,MSB:Mamm:194655,MSB:Mamm:194652,MSB:Mamm:194694,MSB:Mamm:194564,MSB:Mamm:194398,MSB:Mamm:194480,MSB:Mamm:194754,MSB:Mamm:194400,MSB:Mamm:194253
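The requested fix can be sketched in Python: parse the comma-separated list of record triplets, and for each matching row prefix its `DISPLAY_VALUE` with the base URL. This is a minimal sketch, not the script actually run against Arctos; the function name is hypothetical, the column names follow the spreadsheet headers, and the triplet pattern is an assumption. It deliberately changes nothing else (type, remarks, agents), matching the request to defer the rest of the cleanup.

```python
import re
from typing import Dict, List

GUID_BASE = "https://arctos.database.museum/guid/"

# Assumed bare-triplet shape; already-prefixed values contain "/" and will not match.
TRIPLET_RE = re.compile(r"^[^:/\s]+:[^:/\s]+:[^:/\s]+$")


def prefix_display_values(
    rows: List[Dict[str, str]], record_triplets_csv: str
) -> List[Dict[str, str]]:
    """Return copies of rows, prefixing DISPLAY_VALUE with GUID_BASE for the listed records.

    record_triplets_csv is a comma-separated list like the one in this comment;
    blank entries left by trailing commas or line breaks are ignored.
    """
    targets = {t.strip() for t in record_triplets_csv.split(",") if t.strip()}
    result = []
    for row in rows:
        value = row["DISPLAY_VALUE"].strip()
        if row["TRIPLET"] in targets and TRIPLET_RE.match(value):
            result.append({**row, "DISPLAY_VALUE": GUID_BASE + value})
        else:
            result.append(dict(row))  # untouched copy
    return result
```

Because a prefixed value no longer looks like a bare triplet, running the function a second time leaves rows unchanged.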