clarin-eric/ParlaMint

DK, IS: enabling "topic" in concordancer

Closed this issue · 7 comments

IS (currently) and DK (planned for 4.1) also record, at the level of speeches, the topic of discussion, and it would be nice to enable searching on this attribute for the two corpora in the concordancers. This issue discusses how to implement this extension for the two corpora.
This might be interesting for @starkadur and @BartJongejan, and maybe for @matyaskopp.

Both corpora give the topic as an (additional) pointer in the u/@ana attribute; however, they implement this differently:

  • DK has the pointer to the topic into a new taxonomy ParlaMint-DK-taxonomy-domains.xml, and the topics are called "domains"
  • IS has the pointer to their ParlaMint-IS-taxonomy-parla.topics.xml taxonomy, which gives the debate topics, (almost 3,000 of them, and only in Icelandic), and from that taxonomy pointers on category/@ana to their ParlaMint-IS-taxonomy-parla.categories.xml taxonomy, which gives the actual topics (here called "categories")
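To make the two-step IS lookup concrete, here is a minimal Python sketch of following a u/@ana pointer to a debate topic and from there, via category/@ana, to the actual categories. The taxonomy fragments, the ids (`topic.t1`, `cat.economy`, ...) and the helper names are invented for illustration, and I am assuming pointers of the form `#id`; the real ParlaMint-IS files may use prefixed pointers and a different layout. For DK only the first step would be needed, since u/@ana points straight into the domains taxonomy.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature taxonomies; ids and descriptions are invented.
TOPICS_XML = """
<taxonomy xmlns="http://www.tei-c.org/ns/1.0" xml:id="topics">
  <category xml:id="topic.t1" ana="#cat.economy"><catDesc>Debate topic 1</catDesc></category>
  <category xml:id="topic.t2" ana="#cat.health #cat.economy"><catDesc>Debate topic 2</catDesc></category>
</taxonomy>
"""
CATEGORIES_XML = """
<taxonomy xmlns="http://www.tei-c.org/ns/1.0" xml:id="categories">
  <category xml:id="cat.economy"><catDesc>Economy</catDesc></category>
  <category xml:id="cat.health"><catDesc>Health</catDesc></category>
</taxonomy>
"""


def index_categories(xml_text):
    """Map xml:id -> <category> element for one taxonomy."""
    root = ET.fromstring(xml_text.strip())
    return {c.get(XML_ID): c for c in root.iter(TEI + "category")}


def local_ids(ana):
    """Split a whitespace-separated @ana value into bare ids (assumes '#id' pointers)."""
    return [ref.rsplit("#", 1)[-1] for ref in ana.split()]


def speech_categories(u_ana, topics, categories):
    """Follow u/@ana -> topic -> category/@ana -> category label."""
    labels = []
    for ref in local_ids(u_ana):
        topic = topics.get(ref)
        if topic is None:
            continue  # pointer into some other taxonomy, not a topic
        for cat_ref in local_ids(topic.get("ana", "")):
            cat = categories.get(cat_ref)
            if cat is not None:
                label = cat.findtext(TEI + "catDesc")
                if label not in labels:
                    labels.append(label)
    return labels


topics = index_categories(TOPICS_XML)
categories = index_categories(CATEGORIES_XML)
print(speech_categories("#regular #topic.t2", topics, categories))
```

Pointers in u/@ana that do not resolve to a topic (here the made-up `#regular`) are simply skipped, so the same attribute can keep carrying its other annotations.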

Note also that I don't propose (unless there are shouts to the contrary!) to encode this additional info into the metadata TSVs for speeches, as currently these TSVs are the same for all corpora, and this addition would break this (alternatively, we would have a column that is always empty for 27 corpora).

DK has "multivalues" separated with | in NoSketch, but the multivalues are not sorted. See:
[screenshot]

Is it possible to visualize the proper multivalues in NoSketch search (One category per row in statistics)?

> DK has "multivalues" separated with | in NoSketch, but the multivalues are not sorted.

Well spotted! However, I'm not sure this is a bug, i.e. if DK on purpose puts the major topic first, then they should probably not be sorted. Maybe @BartJongejan knows the answer?

Also, note that these metadata values are primarily used for filtering searches, where the sort order doesn't matter.

> Is it possible to visualize the proper multivalues in NoSketch search (One category per row in statistics)?

I don't think so...

I've now implemented sorting of topics; maybe it is better that we have it.
If so inclined, you can check again
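For reference, the kind of normalisation meant by sorting could look like the sketch below. This is not the actual conversion code; the function name is made up, and dropping duplicates is my own addition. Note that sorting discards any "major topic first" ordering that the source may have intended.

```python
def normalize_multivalue(value, sep="|"):
    """Sort the parts of a |-separated multivalue and drop duplicates and empty parts."""
    parts = {part.strip() for part in value.split(sep) if part.strip()}
    return sep.join(sorted(parts))


# The same set of topics now always yields the same string.
print(normalize_multivalue("Health|Economy"))
print(normalize_multivalue("Economy|Health"))
```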

If there are no further comments, I think this issue is ready for closing.

I think it is ok now.
I have been thinking about multivalue, and it does not make sense to count some speeches multiple times, because they have multiple categories - so pie-charts have simple interpretations this way.
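A small Python sketch of the difference, with made-up speech values: counting each speech once under its full multivalue keeps the total equal to the number of speeches, whereas one row per category counts multi-category speeches several times.

```python
from collections import Counter

# Hypothetical topic values for four speeches.
speeches = ["Economy", "Economy|Health", "Health", "Economy|Health"]


def count_once(values):
    """One count per speech, keyed by the whole multivalue string."""
    return Counter(values)


def count_exploded(values, sep="|"):
    """One count per (speech, category) pair."""
    return Counter(topic for value in values for topic in value.split(sep))


once = count_once(speeches)
exploded = count_exploded(speeches)
print(sum(once.values()), sum(exploded.values()))  # 4 6
```

With the first scheme the slices of a pie chart add up to the 4 speeches; with the second they add up to 6, so the shares no longer describe speeches.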

I believe you can close the issue

> I have been thinking about multivalue, and it does not make sense to count some speeches multiple times, because they have multiple categories - so pie-charts have simple interpretations this way.

That is very true, even obvious, in retrospect.