ewg118/numishare

Beginner questions

Closed this issue · 20 comments

I set up Numishare in a VM and tried it out. I have some basic questions that I couldn't figure out from the docs or the source code. My apologies if there is documentation somewhere and I missed it.

When creating a collection there is a mandatory Maintainer field, with no explanation.
I suspect this is a http://nomisma.org/nuds#agent , "The agent responsible for maintaining the data." Is that supposed to be a human name like "John Smith", an email address, or a URI?

I notice the Data License doesn't let me specify a closed data license. Can Numishare be used for non-open data licenses?

Once I create a collection I can't edit it ... (I wanted to change the "Public Site" URL.)

The icon for http://localhost:9080/orbeon/numishare/collection1/ is broken. (This may have to do with Themes, I didn't read that part of the install instructions.)

What is the format of the Google Spreadsheet import?

I am trying to enter my first coin, a coin similar to http://numismatics.org/collection/1941.153.286?lang=en
That coin has a date range of 525 BC - 450 BC.
Numishare wants both a machine-readable and a human-readable date range.
I guessed the machine-readable range should be -525 and -450, but that was not accepted. What should it look like? Is the start a http://nomisma.org/nuds#fromDate ?

Obverse and Reverse wanted me to select a language. Is that the language of the description I typed in?

Tried to save a first coin. Got a "Save Error. Are all required inputs filled?"
I don't see any exclamation points on the form for incomplete data. I eventually gave up, started over entering the coin, and it worked. A hint as to the missing input would be valuable.

Does "logout" log out? I did that and still see a logout button, and can still create coins.

Regarding the date format.
What I do in such cases: I check the XML of an OCRE or ANS item. For your example it looks like:

http://numismatics.org/collection/1941.153.286.xml

In the output I do see:

<fromDate standardDate="-0525">525 BC</fromDate>
<toDate standardDate="-0450">450 BC</toDate>

So, try -0525 and -0450.
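A minimal sketch of that padding rule, assuming (from the ANS example above only) that the machine-readable value is a four-digit, zero-padded year with a leading minus for BC. `nuds_year` is a hypothetical helper name, not part of Numishare, and how 1 BC or year 0 should be encoded is not covered in this thread.

```shell
# Format a signed year as a zero-padded four-digit value, as in the
# standardDate attributes shown above. Hypothetical helper for illustration.
nuds_year() {
  year="$1"
  sign=""
  case "$year" in
    -*) sign="-"; year="${year#-}" ;;   # remember the sign, keep the digits
  esac
  # Strip leading zeros so printf does not read the value as octal.
  while [ "${#year}" -gt 1 ] && [ "${year#0}" != "$year" ]; do
    year="${year#0}"
  done
  printf '%s%04d\n' "$sign" "$year"
}
```

For example, `nuds_year -525` prints `-0525` and `nuds_year -450` prints `-0450`.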

Regarding the missing icons. Try for your environment:
Admin UI > edit collection > Modify Settings > Theme and Layout > Theme URL: "http://localhost:9080/orbeon/themes/" > Save

Thanks for your help, @Msch0150 . At this point I can add a coin manually, and back up the eXist-DB data. I still have a few basic questions.

  • I don't want to interactively enter coins. I see that Numishare lets me export an individual coin as NUDS/XML and several other formats. I can import a collection from a "Google Spreadsheet". Is there an example of such a spreadsheet? I'd love to kick the tires with some real data. What is the recommended way to bring many coins online?
  • I see several kinds of search results (HTML, RSS, Atom) at http://localhost:9080/orbeon/numishare/collection1/apis#apis . There is a "Solr/XML" row but there is no REST endpoint for it. Is this the SPARQL endpoint? One of the reasons I did this exercise is that I read about the SPARQL endpoint seven years ago and have been wanting to try it. Does the current version of Numishare offer a SPARQL endpoint, and how do I query it?

I have never used a "Google Spreadsheet". It is probably easier than the way I am currently feeding my Numishare.
I have an Access DB with coins and created a PowerShell script that generates an XML file (NUDS format) for each coin. In other cases I maintain Excel sheets and use another PowerShell script that generates the XML files (NUDS format).
I upload the XML files via curl into eXist-db, for example:

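    # Target collection in eXist-db; the URL ends in "/" so curl appends the file name.
    $targetUrl = "http://localhost:10204/exist/rest/mycollection/objects/"
    # "admin:" means user "admin" with an empty password.
    $existDbUsername = "admin:"
    # $fullPath is assumed to be set earlier in the script to the NUDS file to upload.
    $argumentList = "-v -u $existDbUsername $targetUrl --upload-file $fullPath"
    Start-Process -Wait "C:\curl\bin\curl.exe" -ArgumentList $argumentList -NoNewWindow -RedirectStandardOutput ".\NUL"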

But I think a spreadsheet might be easier.
I found this page: https://numishare.blogspot.com/2019/08/recommendations-for-numismatic.html but the first link does not point to a document. I assume that @ewg118 will write some info on this.

The Google Spreadsheet mechanism won't function at present because Google deprecated the Sheets v3 API in September or October, and I haven't rewritten the import for the new API yet. Most projects that use Numishare start as spreadsheets which are transformed into NUDS by a script or using OpenRefine export templates.

@ewg118 : thanks for the info.
The NUDS format is well formatted, but it is not easy to handle. It is always necessary to write a program to transform data into the format. It might be useful to have ONE standard simplified CSV (or TSV, ...) format as an exchange format. This could be filled in even without IT knowledge. Once this simplified format is defined, transformation programs can be developed and reused by others.
I assume that something similar is already defined by the "Google Spreadsheet" import. Wouldn't it be an idea to make it official? Or to develop an official one that is accepted by the major institutions (like the ANS)?

It makes sense to offer a tool to convert a pre-NUDS .csv file to NUDS. I will take a stab at that, time-permitting.

The spreadsheet import mechanism would enable you to map column headings to NUDS elements and perform necessary validation of data (including the looking up of Nomisma URIs) in order to construct NUDS XML records and import them into the eXist XML database and index them into Solr. There is some documentation for best practices of spreadsheet structure here: https://numishare.blogspot.com/2019/08/recommendations-for-numismatic.html

It has been tested with creating typologies and supported some physical descriptions, though it isn't complete with respect to some other aspects of the schema for physical coins (provenance and others). I need to take some time to fix the code for Google Sheets v4 API, but creating an intermediary CSV -> NUDS script would be much easier than manually entering everything in the back-end.

@ewg118 Usually my first step when generating XML is obtaining language bindings for the schema.

I tried to generate bindings with xsdgen (Golang) and xjc (Java). Neither one was happy with http://nomisma.org/nuds.xsd .

http://nomisma.org/nuds.xsd refers to
schemaLocation="http://www.stoa.org/epidoc/schema/dev/tei-epidoc.xsd"
but there is no file there. Wikipedia suggests https://epidoc.stoa.org/schema/latest/tei-epidoc.rng which has the wrong MIME type, but looks valid. Neither generator worked for me. (https://en.wikipedia.org/wiki/EpiDoc#cite_note-4 )

I could try more generators but unless you have had a good experience with generation I might pivot to ad-hoc XML output logic.

@ewg118 I attempted to create a simple proof-of-concept to translate an ad-hoc CSV into NUDS XML. Unfortunately, my generated XML does not work! Do I need to wrap the <NUDS> in something to get eXist-db to accept it?

58627.xml.txt

This yields "400 Unknown XML root element: nuds"

EXIST_HOST=localhost:8888
COLLECTION=collection1
EXIST_USER=admin
EXIST_PASSWORD=
curl -v -X POST --user "$EXIST_USER":"$EXIST_PASSWORD" http://"$EXIST_HOST"/exist/rest/"$COLLECTION"/objects/ --upload-file 58627.xml.txt

To troubleshoot, I tried to send a Numishare-generated NUDS. Same error:

curl http://numismatics.org/collection/1922.999.73.xml > data/1922.999.73.xml
curl -v -X POST --user  "$EXIST_USER":"$EXIST_PASSWORD" http://"$EXIST_HOST"/exist/rest/"$COLLECTION"/objects/ --upload-file data/1922.999.73.xml

Try with the /exist/rest/db/$COLLECTION path. The 'db' is missing.

EpiDoc deprecated their XSD schema and deleted their files on the server, which means I have to rewrite NUDS as RNG in order to point to the EpiDoc RNG.

I am getting the same 400 Unknown XML root element: nuds error with curl -v -X POST --user "$EXIST_USER":"$EXIST_PASSWORD" http://"$EXIST_HOST"/exist/rest/db/"$COLLECTION" --upload-file data/1922.999.73.xml.

I also failed to upload interactively. According to http://localhost:8888/exist/apps/doc/uploading-files "eXist-db's Dashboard's comes with a Collections pane." (I don't have a Collections pane.) That page also suggests using eXist-db's built-in Integrated Development Environment (IDE). File, Manage from the menu; click on the Upload button. This sequence of steps produces no error but also nothing new in Numishare. http://localhost:8983/solr/#/numishare/core-overview also shows I only have the single Numishare document (coin) that I manually entered.
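The Solr document count can also be checked from the command line instead of the admin UI. This is a sketch under assumptions: the core name `numishare` is taken from the admin URL above, `SOLR_HOST` and `SOLR_CORE` are hypothetical override variables, and the count covers the whole core rather than a single collection. It uses Solr's standard `select` handler with `rows=0` and reads the `numFound` field from the JSON response.

```shell
# Extract the numFound value from a Solr JSON response read on stdin.
solr_num_found() {
  sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Ask Solr how many documents are indexed in the core (no rows returned).
solr_doc_count() {
  curl -s "http://${SOLR_HOST:-localhost:8983}/solr/${SOLR_CORE:-numishare}/select?q=*:*&rows=0&wt=json" \
    | solr_num_found
}
```

Running `solr_doc_count` before and after an upload shows whether the new record reached the index. A record that is stored in eXist-db but not yet published would not be expected to change the count.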

I could upload the file into a collection using:

curl.exe -v -u admin: http://localhost:10204/exist/rest/cf/objects/ --upload-file c:\temp\58627.xml

My collection "cf" is a collection which was created using the numishare UI (+Add collection > ...)
After that the collection "cf" appears in the UI of eXist-db (for me: http://localhost:10204/ > eXide XQuery IDE > tab "Directory" > expand "db" > expand "cf" > expand "objects").
In the objects directory I do see the 58627 after the upload.

Info: the 58627.xml is probably missing some tags, because I cannot "publish" it in the numishare UI.

I think I just found the root cause:

The trailing "/" after "objects" is probably required. At least, I get the 400 error if I omit it.
Try something like:

curl -v -u "$EXIST_USER":"$EXIST_PASSWORD" http://"$EXIST_HOST"/exist/rest/db/"$COLLECTION"/objects/ --upload-file data/1922.999.73.xml

Info: I am using exist DB 6.0.1.

@Msch0150 Thanks, that works great! After uploading it, it appeared in my "List of Objects". I was able to publish it, and it now appears properly during a browse.

My early attempts used -X POST, and this was wrong; the upload needs -X PUT (or no -X at all, since --upload-file implies PUT).
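The fixes found in this thread (a target under /exist/rest/db/, the objects path, and PUT rather than POST) can be collected in one small helper. A sketch, assuming the variable names from the commands above with this thread's values as defaults; `exist_object_url` and `put_object` are hypothetical names. Spelling out the full resource URL, file name included, avoids depending on the trailing "/".

```shell
# Build the full eXist-db REST URL for one NUDS file, including its file name.
exist_object_url() {
  printf 'http://%s/exist/rest/db/%s/objects/%s\n' \
    "${EXIST_HOST:-localhost:8888}" \
    "${COLLECTION:-collection1}" \
    "$(basename "$1")"
}

# Upload one file. --upload-file makes curl send a PUT, so no -X is needed.
put_object() {
  curl -v --user "${EXIST_USER:-admin}:${EXIST_PASSWORD:-}" \
    --upload-file "$1" "$(exist_object_url "$1")"
}
```

Usage: `put_object data/1922.999.73.xml`.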

My own generated NUDS is accepted, and appears in my "List of Objects". However, the "publish" option is not present, presumably because my generated markup is invalid. I can Edit the coin.

Regarding the "Publish":
Probably something is missing like:

<control>
  <recordId>58627</recordId>
  <publicationStatus>approved</publicationStatus>
  <maintenanceStatus>revised</maintenanceStatus>
</control>
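A quick pre-upload check for those elements could look like the sketch below. `check_control` is a hypothetical helper. It is a plain text search rather than real XML parsing, so it would miss elements written with a namespace prefix or attributes. The thread only establishes that these elements were relevant for this one record, not that they are the complete set Numishare requires for publishing.

```shell
# Report which of the <control> elements shown above are absent from a file.
# Returns 0 when all three are present, 1 otherwise.
check_control() {
  file="$1"
  missing=0
  for element in recordId publicationStatus maintenanceStatus; do
    if ! grep -q "<$element>" "$file"; then
      echo "missing <$element> in $file"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_control 58627.xml && echo "control block looks complete"`.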

Thanks, @Msch0150 ! Adding <publicationStatus> and <maintenanceStatus> allowed me to publish.

I wrote a CSV to NUDS converter at https://github.com/esnible/csv-nuds . It only works for single-row CSVs. I'd like to be able to support CSVs of unlimited size. Is it possible to send more than one <nuds> to Numishare in a single PUT? If not I'll need to refactor so that I contact Numishare once per row.

No, just PUT each file individually. eXist is very fast at ingesting and indexing the files. Alternatively, you could use the eXist Java client and upload an entire directory of files.
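One PUT per file can be scripted as a simple loop. This is a sketch, not a tested ingest tool: it assumes the converter has already written one NUDS file per CSV row into a directory, and it reuses the variable names from earlier in this thread with this thread's values as defaults. `upload_nuds_dir` and the `CURL` override (handy for a dry run with `CURL=echo`) are hypothetical.

```shell
# Upload every .xml file in a directory to eXist-db, one PUT per file.
upload_nuds_dir() {
  dir="$1"
  # The trailing "/" lets curl append each local file name to the URL.
  target="http://${EXIST_HOST:-localhost:8888}/exist/rest/db/${COLLECTION:-collection1}/objects/"
  for file in "$dir"/*.xml; do
    [ -e "$file" ] || continue   # skip when the glob matches nothing
    # --upload-file implies PUT; --fail turns HTTP errors into a non-zero exit.
    ${CURL:-curl} --fail --silent --show-error \
      --user "${EXIST_USER:-admin}:${EXIST_PASSWORD:-}" \
      --upload-file "$file" "$target" || return 1
  done
}
```

Usage: `upload_nuds_dir ./out`. The loop stops at the first failed upload, so a rejected record is easy to spot.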

Thanks @ewg118 and @Msch0150 for all of the help. I have created a Vagrantfile to automate installing Numishare for Vagrant users. https://github.com/esnible/numishare-vagrant