HTR-United/htr-united

Contact Kim Pham - JCRS 2020

PonteIneptique opened this issue · 10 comments

OK, found on GitHub as well, but still no email address: @kimpham54

Dear Kim, in case you get a notification and read this: we are just trying to get in touch to ask whether you'd be willing to document the wonderful dataset that is on Zenodo through https://htr-united.github.io/document-your-data.html :)

hi @PonteIneptique, sorry I just saw this message!

Yes, that should be fine, but is it alright if you fill out a Terms of Use form?

also feel free to get in touch with my github username at gmail

Hi @kimpham54 !
We are not specifically looking to use your dataset (at least I am not, or not right now), but we are trying to catalog as much open data as possible for HTR United ( https://htr-united.github.io/catalog.html ), and that's why we are trying to contact the people behind datasets :)
I could be interested in having access to the dataset but mostly to compute metrics (number of chars, lines, regions, files) to make cataloging "better". I could definitely sign a Terms of Use for this purpose :)

If you have time, would you be willing to document this dataset? There is a very simple form here: https://htr-united.github.io/document-your-data.html

htr-united.zip
Attached are the metadata files plus terms of use form that you can include in the data files. Feel free to also sign the terms of use. Thank you!

Thank you !
We'll most likely only publish the open dataset ( https://doi.org/10.5281/zenodo.4242885 ), as it is the only one containing only ground truth. Would it be OK with you, however, if I changed:

description: Training and validation set. Transcribed records available upon request.

to

description: Training and validation set. Transcribed records ( https://doi.org/10.5281/zenodo.4150880 ) available upon request: access to the transcribed dataset requires filling out a terms of use form ( https://specialcollections.du.edu/cad/form/termsOfUse ). Contact the author for more details.

We could also add part of the form you sent, specifically:

The transcribed corpus of records from the Jewish Consumptive Relief Society contains data that include individually identifiable health information, among other sensitive information regarding persons and people. All individuals for whom records are provided have been deceased for at least 70 years, but were they still living today, these records would be recognized as being protected health information under the US Health Insurance Portability and Accountability Act of 1996 (HIPAA).

While HIPAA and other privacy laws no longer apply to these individuals, in providing these data the University of Denver wishes to foster research practices that express the utmost respect for the human beings whose lives are represented, at least in some part, in these collections. In addition, we ask researchers respect the lives of these individuals’ ancestors and their communities.

To foster practices that honor patients, staff, nurses and physicians connected with the JCRS Sanitorium, as well as their families, ancestors and communities, we ask that researchers disclose their intended use of the collection for review by our Advisory Board (see reverse). This Board is comprised of ethicists, historians, librarians, attorneys, physicians, and members of the Jewish community.

In addition, we ask researchers agree to conduct their work under the following set of principles:

- I affirm the role of JCRS patients and staff as data creators and will avoid exploiting and/or dehumanizing them by treating them simply as data.
- My research will, when possible and appropriate, account for the contexts surrounding the JCRS subjects as data arise. My work will recognize that all data and datasets are shaped by decisions about how histories are recorded, remembered, and valued.
- If the nature of my work is such that I am sharing the life stories and/or narratives of individuals in these data, and I can do so with no potential harm to their reputation or that of their ancestors, I will honor them by naming them. If the nature of my work is such that I am exploring large-scale patterns in the dataset, and naming individuals serves no specific research purpose, I will anonymize and/or redact names within the data.
- If I am publishing the results of research conducted with these data, I will, if possible and appropriate, include a note of recognition and/or gratitude in my publication. We suggest a version of: “This work was made possible in part by the patients, staff, nurses, physicians, and community of the Jewish Consumptive Relief Society (JCRS). The people who lived, worked, and died at the JCRS sought to relieve human suffering. I am grateful to them.”

?

I computed the following metrics on the public dataset:

volume:
    - {count: 36027, metric: "lines"}
    - {count: 2660, metric: "files"}
    - {count: 4254, metric: "regions"}
    - {count: 3494619, metric: "characters"}
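
For reference, counts like these could be produced by a small script along the following lines. This is only a sketch, not the script actually used here: it assumes the ground truth is stored as PAGE XML (the thread does not say which format the dataset uses), and `count_page_metrics` and `to_volume` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def count_page_metrics(xml_text):
    """Count regions, lines and characters in one PAGE XML document.

    Assumption: the file follows PAGE XML, where transcriptions live in
    TextLine/TextEquiv/Unicode. Only line-level text is counted, so
    region-level or word-level copies of the same text are ignored.
    """
    counts = Counter(files=1)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "TextRegion":
            counts["regions"] += 1
        elif name == "TextLine":
            counts["lines"] += 1
            for equiv in elem:
                if local_name(equiv.tag) != "TextEquiv":
                    continue
                for uni in equiv:
                    if local_name(uni.tag) == "Unicode":
                        counts["characters"] += len(uni.text or "")
    return counts


def to_volume(per_file_counts):
    """Sum per-file counters into a list shaped like the `volume` entry above."""
    total = Counter()
    for counts in per_file_counts:
        total.update(counts)
    order = ("lines", "files", "regions", "characters")
    return [{"count": total[metric], "metric": metric} for metric in order]


# Hypothetical usage over a local copy of the dataset:
#   from pathlib import Path
#   files = Path("dataset").rglob("*.xml")
#   volume = to_volume(count_page_metrics(p.read_bytes()) for p in files)
```

Whether whitespace counts as a character, and whether empty lines count as lines, are choices this sketch makes implicitly (both are counted), so the totals may differ slightly from the figures above.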

Sure, the changes sound good. Thanks

Thank you very much @kimpham54 ! :)