pkiraly/qa-catalogue

Document marc-tags Solr keys


I'd like to know more details about the Solr keys used when working with PICA. Currently, README.md states:

marc-tags - the field names are MARC codes

For PICA I found indexes such as

  • 021Aa_ss: tag and subfield
  • 019x40a_ss: tag and subfield, but @ is replaced by x40
  • 029F_full_ss: ?

Are fields with occurrences, such as 045Q/01 and 045D/00-29, not indexed yet?

  • _ss is a general suffix. Apache Solr uses such suffixes for dynamic field names, so you do not need to create a schema that maps each of your fields to a Solr type. ss tells Solr to store the value as a string phrase (so you can use the whole phrase in facets, while you can search for the individual tokens).
  • x40: unfortunately Solr only accepts alphanumeric characters in field names, so we have to transform characters such as @ and / into alphanumeric ones, in a way that can be encoded and decoded in both Java and PHP. The encoding is an 'x' followed by the hexadecimal code of the character (the code of @ is 0x40, hence x40).
  • _full_ss: this is a special field used for classifications and authority names. In MARC21 the main term of these fields is usually in $a, but in many cases showing only that term in a facet list is misleading for saints, kings, popes etc.; for example, it displays "Charles" instead of "Charles, V., German-Roman emperor". So for these fields I created a special field with the suffix full, an "all-in-one" Solr field covering all important PICA subfields (excluding some subfields that contain identifiers, which are meaningless in this context).
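The naming scheme in the three points above can be sketched in Java roughly as follows. This is a minimal illustration, not QA catalogue's actual code: the class and method names (PicaSolrKeys, encode, decode, subfieldKey, fullKey) are hypothetical, and it assumes the hex code is written as exactly two lowercase digits and that a literal 'x' in a tag or subfield is never followed by two hex digits.

```java
public class PicaSolrKeys {

    // True for ASCII letters and digits only; everything else gets escaped.
    static boolean isAsciiAlnum(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // True for the hex digit characters 0-9, a-f and A-F.
    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Escapes each non-alphanumeric character as 'x' + two lowercase hex digits,
    // e.g. '@' (0x40) -> "x40", '/' (0x2F) -> "x2f".
    // Assumption: escaped characters have codes below 0x100, so two digits suffice.
    static String encode(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (isAsciiAlnum(c)) {
                sb.append(c);
            } else {
                sb.append('x').append(String.format("%02x", (int) c));
            }
        }
        return sb.toString();
    }

    // Reverses encode(): an 'x' followed by two hex digits becomes one character;
    // any other 'x' is kept as a literal (assumption stated above).
    static String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c == 'x' && i + 2 < encoded.length()
                    && isHex(encoded.charAt(i + 1)) && isHex(encoded.charAt(i + 2))) {
                sb.append((char) Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    // Hypothetical: tag + subfield code, escaped, plus the dynamic-field suffix "_ss".
    static String subfieldKey(String tag, String subfield) {
        return encode(tag + subfield) + "_ss";
    }

    // Hypothetical: the "all-in-one" key for a whole field, e.g. 029F -> 029F_full_ss.
    static String fullKey(String tag) {
        return encode(tag) + "_full_ss";
    }

    public static void main(String[] args) {
        System.out.println(subfieldKey("021A", "a")); // 021Aa_ss
        System.out.println(subfieldKey("019@", "a")); // 019x40a_ss
        System.out.println(fullKey("029F"));          // 029F_full_ss
        System.out.println(decode("019x40a"));        // 019@a
    }
}
```

Only the part before the suffix is escaped, so the underscore in _ss and _full_ss stays readable. Under the same rule a / would become x2f, but whether occurrence fields such as 045Q/01 are indexed at all is still the open question above.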

I have to investigate cases such as 045Q/01.