
UCM Agent Bulkload Request

cf_temp_pre_bulk_agent_download_version ready.csv
Please bulkload the agents in the attached file.

Note: The file should be the results from the Agent Prebulkload Tool. If the file is too large for GitHub attachments, comment here and an email address or shared file space will be provided to you.

S-C Lee
C-C Chen
C-P Chen
J-T Chao

and others with a dash in preferred name. First names should not include punctuation other than a period. Are we sure these are people and should they be:

S. C. Lee
C. C. Chen
C. P. Chen
J. T. Chao
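A rough way to screen a list for this before submitting, offered only as a sketch: it assumes the first whitespace-separated token of the preferred name is the first name (an assumption, not how Arctos parses names) and flags any token containing punctuation other than a period. Hyphenated given names may be legitimate, so a hit means "review", not "wrong".

```python
import string

# Assumption: the first token of the preferred name stands in for the first name.
ALLOWED_PUNCTUATION = {"."}

def first_name_has_odd_punctuation(preferred_name: str) -> bool:
    """Return True if the first token contains punctuation other than a period."""
    tokens = preferred_name.split()
    if not tokens:
        return False
    return any(
        ch in string.punctuation and ch not in ALLOWED_PUNCTUATION
        for ch in tokens[0]
    )

names = ["S-C Lee", "S. C. Lee", "J-T Chao", "William W. Hay"]
flagged = [n for n in names if first_name_has_odd_punctuation(n)]
print(flagged)
```

The same check would also flag something like "R/V Soyo-Maru" because of the slash in the first token.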


R/V Soyo-Maru

Is not a person, but a research vessel? If so, this may be added as an organization.

A. C. Burrill
R. C. Burrill

This really feels like someone somewhere mis-transcribed an A for an R or the other way around?

W. F. Halliday
W. R. Halliday

Ditto for the F and R in these two

Mr. A. E. Collins
Mrs. A. E. Collins

And the D and K here

D. A. Han
K. A. Han

Add the "spouse of" relationship between these after they are added?

Will Eberle-Taylor
Nick Eberle-Taylor
Quinn Eberle-Taylor

I assume these people are related? Do we know how?

Can I be convinced that these are really not the same person?

William W. Hay
W. Hay

Or these two?

Norman E. A. Hinds
Norman E. C. Hinds

All of the "not the same as" relationships require a method and determiner.

I am not trying to be obstructionist, but it seems like there is still some cleanup that could be done before we add these agents? I stopped looking at the near matches, so there are probably others I would add to the categories above.
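For the single-letter cases (A vs R, F vs R, D vs K), a quick in-house check could list every pair of names that has the same length and differs in exactly one character. This is only a sketch of that idea, not anything the Arctos checker is known to do, and it will not catch pairs like William W. Hay / W. Hay where the lengths differ.

```python
from itertools import combinations

def differs_by_one_char(a: str, b: str) -> bool:
    """True when the two strings have equal length and exactly one differing position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

def one_char_pairs(names):
    """Return all pairs of distinct names that differ by a single character."""
    unique = sorted(set(names))
    return [(a, b) for a, b in combinations(unique, 2) if differs_by_one_char(a, b)]

names = [
    "A. C. Burrill", "R. C. Burrill",
    "W. F. Halliday", "W. R. Halliday",
    "D. A. Han", "K. A. Han",
    "William W. Hay", "W. Hay",
]
pairs = one_char_pairs(names)
for pair in pairs:
    print(pair)
```

Every pair it prints still needs a human decision: a one-letter difference can be a mis-transcription or two genuinely different people.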

No worries. Thanks for catching those. Updated agent list attached:
cf_temp_pre_bulk_agent_download_final version.csv

@dustymc Thanks for including me in the #7649 issue. Maybe we should pare our list down so that the only agents that get uploaded are ones that have full names (i.e., no initial) or have one (or more) attribute that distinguishes them (makes them unique) from other agents? So, for instance if we have a J. Smith the only way we can upload that person as an agent is if we had an attribute, say "child of", linked to that agent. Would that work?
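To make that rule concrete, here is a minimal sketch of the filter. The dictionary keys (`preferred_name`, `first_name`, `attributes`) and the attribute value are hypothetical stand-ins, not the actual bulkload column names.

```python
def is_initial(name_part: str) -> bool:
    """Treat an empty value, 'J', or 'J.' as an initial rather than a full name."""
    return len(name_part.strip().rstrip(".")) <= 1

def ok_to_load(agent: dict) -> bool:
    """Keep agents with a full first name or at least one distinguishing attribute."""
    has_full_first_name = not is_initial(agent.get("first_name", ""))
    has_attribute = bool(agent.get("attributes"))
    return has_full_first_name or has_attribute

# Hypothetical records for illustration only.
agents = [
    {"preferred_name": "J. Smith", "first_name": "J.", "attributes": {}},
    {"preferred_name": "J. Smith", "first_name": "J.",
     "attributes": {"child of": "(some other agent)"}},
    {"preferred_name": "Barbara Waleis", "first_name": "Barbara", "attributes": {}},
]
keep = [ok_to_load(a) for a in agents]
print(keep)
```

Under this rule the bare J. Smith is held back, while the J. Smith with a "child of" attribute and the fully named agent pass.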

That will help, but the ones I am struggling with include things like

Barbara Waleis which feels like it may be a mistranscription of Barbara T. Waters

Charles A. Nelson feels like a mistranscription of Charles D. Nelson (or perhaps it is the other way around, A and D can look very similar when written or maybe these ARE two different people, but I have no way to decide that)

Chin-Tsong Lewis and Chin-Tsong Lo - one of these must be a misspelling, an alternate name for the same person, or are they related people?

You may have no way to figure out if my "feelings" are justified, but if you do, it might be good to get things like this sorted before making agents.

As before, I did not peruse the entire list to look for these internal issues, but there are probably others! Do not take this as a summary of everything that I think needs review - just ideas for looking at the data you have in-house even before comparisons to Arctos agents.

I can confirm that Barbara Waleis and Barbara T. Waters are two different people. Waleis is a collector from the 1930s, while Waters is a collector from the 1980s.

The others are all agents for the invert zoo collection, which will need to be checked by @Krmartin3 when she gets back from vacation. I can say that the Chinese do use hyphenated first names. So, Arctos may need to figure that one out, but I'll let Kelly chime in when she is back.

In the meantime, I'm going to pull all of invert zoo's agents from the sheet, as I think most of the issues are coming from that side (sorry Kelly). I'll reupload a new sheet of agents here in a bit.

@javanveldhuizen the dates in that CSV have been mangled (probably by Excel?).

@dustymc Interesting, the dates look fine on my end.
Screenshot 2024-07-02 075148

Should I use a different program to edit the CSV instead?

@dustymc Ok. I edited the CSV using Notepad and changed all the dates into the desired format: yyyy-mm-dd. Let me know if that doesn't work.
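In case it helps to double-check the file before attaching it, here is a small sketch that reads the CSV as text and reports any date cell that is not a zero-padded, real yyyy-mm-dd date. The column name `attribute_date` and the sample rows are made up for the example; substitute whichever date columns the file actually has.

```python
import csv
import io
from datetime import datetime

def is_iso_date(value: str) -> bool:
    """True only for zero-padded yyyy-mm-dd strings that are real calendar dates."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime tolerates '2024-7-1', so round-trip to insist on zero padding.
    return parsed.strftime("%Y-%m-%d") == value

def rows_with_bad_dates(csv_text: str, column: str):
    """Return (line_number, value) for each non-empty cell in `column` that fails the check."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value and not is_iso_date(value):
            bad.append((reader.line_num, value))
    return bad

# Hypothetical two-row file; the header names are assumptions.
sample = (
    "preferred_agent_name,attribute_date\n"
    "Bill Simpson,2024-07-01\n"
    "Sarah E. Rieboldt,7/1/2024\n"
)
bad_rows = rows_with_bad_dates(sample, "attribute_date")
print(bad_rows)
```

Because this reads the raw text rather than what a spreadsheet program displays, it checks what is actually saved in the file.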

cf_temp_pre_bulk_agent_vert paleo agents only.csv

> look fine

Yeah, but they don't SAVE fine (e.g., unambiguously), which is why we require CSV. (I wrote the 'eat your data' bits but not the niceties at the top!)

Thanks, I've got those in the pre-loader.

The first thing in my view is "Humboldt Museum" - surely that's or

It's kind of actually neither of those things. The specimens I have tied to the Humboldt Museum were donated to us from a researcher at the Humboldt-Universität zu Berlin. What's unclear is whether these were actually part of the museum at that university, which later became the Museum fuer Naturkunde der Humboldt-Universitaet Berlin, or if they were part of a researcher's lab collection. I kept it as Humboldt Museum until I could fully untangle it. Feel free to delete it from the list if you feel that it is not an appropriate true agent.

@dustymc Here is the agent sheet again with the Humboldt Museum removed
cf_temp_pre_bulk_agent_vert paleo agents only.csv

> you feel

Ugh, that should not be the path, @ArctosDB/agents-committee HELP!

Lacking further guidance, that seems a somewhat defensible position to me (and a remark would be useful, if that's not already there).

I loaded data to

Again an "interesting" situation on the first line!

Screenshot 2024-07-02 at 08 58 39

First your agent will load, then Arctos will run....

arctosprod@arctos>> select getAgentID('David Taylor');

except two results will be returned - this one and the one just created - which will result in an error. Maybe that's somehow my problem, but I'm not quite sure how to address it. will always be unambiguous, but isn't great for humans to work with in a spreadsheet.
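A pre-check for this situation could be as simple as counting names across the existing agents and the incoming file. The sketch below is plain Python over two lists of names and makes no claim about how `getAgentID` works internally; it only shows which incoming names would no longer be unique once loaded. The `existing` list is a stand-in for whatever export of current agent names is available.

```python
from collections import Counter

def ambiguous_names(existing, incoming):
    """Return incoming names that would appear more than once after the load."""
    counts = Counter(existing) + Counter(incoming)
    return sorted(name for name in set(incoming) if counts[name] > 1)

# Stand-in data: names already present and names in the file to be loaded.
existing = ["David Taylor", "William Simpson"]
incoming = ["David Taylor", "Bill Simpson"]
collisions = ambiguous_names(existing, incoming)
print(collisions)
```

Any name it reports would need either an ID-based reference or manual creation instead of a name string.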

Beyond that, I don't know how to proceed. (I'd use verbatim agents as a first pass so we don't have to guess from strings, but I seem to have lost that argument!)

person Sarah E. Rieboldt attribute match: first+last variants Sarah Rieboldt person first name Sarah               middle name E.               last name Rieboldt               not the same as       Sarah Reiboldt 2024-07-01 Jacob Van Veldhuizen                                                                                                                 dlm      

person Bill Simpson attribute match: first+last variants William Simpson person first name Bill               last name Simpson               not the same as       William Simpson 2024-07-01 Jacob Van Veldhuizen                                                                                                                                   dlm      

organization Brigham Young University Museum of Paleontology attribute match: aka Brigham Young University Life Science Museum organization aka BYU               Wikidata               not the same as       Brigham Young University Life Science Museum 2024-07-01 Jacob Van Veldhuizen                                                                                                                                   dlm      

look pretty suspicious (and maybe that's OK, I don't know, this should still not be my call @ArctosDB/agents-committee !!)

I didn't scroll very far, just enough to grab a couple examples.

I don't see any super-obvious duplicates or mistyped agents or such in the file. I REALLY don't want this to be my call (see above, I'd do something entirely different!), and the ~30 flagged by the checker could definitely use careful review, but loading this doesn't seem unreasonable.

@Jegelewicz @mkoo thoughts??


I have deleted David Taylor from my list and will make him a verbatim agent for now until that issue is fixed. I can confirm that the David Taylor already in Arctos is not the same David Taylor in my data.

For some reason Sarah Reiboldt keeps reappearing in this list even though I keep deleting it. Anyway, I've deleted it once again and I can confirm that the Sarah Reiboldt already in Arctos is the same Sarah Reiboldt in my data.

The Bill Simpson I have in my data is an amateur collector in the Denver area and not the William Simpson already in Arctos. These are two separate people, as indicated by the "not the same as" attribute.

The BYU Museum of Paleontology and the BYU Life Science Museum are two different organizations. Here are their websites so you can confirm:

New list here:
cf_temp_pre_bulk_agent_vert paleo agents only.csv

> David Taylor

You can also just create the agent manually (where everything involves IDs instead of strings).

> as indicated by the "not the same as" attribute

Sorry, I didn't look very carefully (was aiming for general considerations, not specifics!), thanks!

> New list


I suppose I should just load that??? @mkoo

@javanveldhuizen I found a problem on my end and am rolling a partial load back, but during that I noticed

Ward Scientific
Wards National Science

in these data. Surely those are both duplicates of

@dustymc I deleted those agents. They need some verification. New list here:
cf_temp_pre_bulk_agent_vert paleo agents only.csv

Done and blamed on you @javanveldhuizen

There's one full-duplicate low-data copy of another low-data agent that maybe ought to have something done with it.

 agent_id | agent_type | preferred_agent_name |       creator        |        created_date        
 21354938 | person     | Scott Parker         | Jacob Van Veldhuizen | 2024-07-03 14:40:17.101114
 21257771 | person     | Scott Parker         | unknown              | 2013-12-16 21:49:31
(2 rows)

and one that errored out
