problems with tagging <m> within strings

Question

Opened this issue 5 years ago · 11 comments

In issue #88 we concluded that, to make the content more searchable and usable, we would not keep the <c>'s from the transcriptions: all <c>'s would be removed except those on a morpho-semantically significant tone, and those would be changed to <m>. This leaves the following structure:

           <u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
               <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
                  <w xml:id="d1e114" synch="#T14">sketa</w>
                  <w xml:id="d1e116" synch="#T19">ntikii</w>
               </seg>
               <seg xml:lang="mix" xml:id="d1e118" notation="ipa" type="S" sameAs="#d1e113">
                  <w xml:id="d1e119" synch="#T14" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
                  <w xml:id="d1e132" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
               </seg>
            </u>

However, while this is an improvement, it is still problematic: when searching for phonological content, it is not possible to search for full phonetic strings wherever there is an <m> (which also means that the tone encoded there is particularly significant).

So there are three possible solutions I can envision:

1. Live with it.
2. Copy the string into an attribute like @orig and search for phonetics in the attribute values (though that contradicts the usage in this project, in which I'm using these attributes to keep track of where I've normalized).
3. Make another copy of the IPA contents that doesn't include the <m>'s.
   However, this raises two questions:
   - The copies would have to be linked to either the orthographic or the original IPA contents. Which would be best to point to? Could we instead also have the orth <seg> point to the copy?
   - They would have to be typed, which is a problem given that @type is already used to classify the type of segment (thus @subtype wouldn't be consistent) and @notation is still "ipa".

Below is an example in which I use @function="full" on the <seg> and which also points to the orthographic <seg>:

           <u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
              <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
                 <w xml:id="d1e114" synch="#T14">sketa</w>
                 <w xml:id="d1e116" synch="#T19">ntikii</w>
              </seg>
              <seg xml:lang="mix" xml:id="d1e118" notation="ipa" type="S" sameAs="#d1e113">
                 <w xml:id="d1e119" synch="#T14" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
                 <w xml:id="d1e132" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
              </seg>
              <seg xml:lang="mix" xml:id="d1e128" notation="ipa" type="S" sameAs="#d1e113" function="full">
                 <w xml:id="d1e129" synch="#T14" sameAs="#d1e114">skɛ˥t̪a↘</w>
                 <w xml:id="d1e142" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
              </seg>            
           </u>

Using this, a search for all phonetic strings would have to match both @notation="ipa" and @function="full". To get the full phonetic string for a given word (to copy into a dictionary, for example), it would have to match the same attributes and also follow the pointer to the @xml:id of a <w> that is a child of <seg notation="orth">.
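The lookup described above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is only an illustration under stated assumptions: the fragment is parsed without the TEI namespace, `full_ipa_pairs` is a hypothetical helper name, and @function="full" is the proposal from this issue, not an established convention.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed copy of the example above (assumption: no TEI namespace declared).
SAMPLE = """
<u who="#TS" xml:id="d1e112" n="2" start="1.48" end="2.98" xml:lang="mix">
  <seg xml:lang="mix" xml:id="d1e113" notation="orth" type="S">
    <w xml:id="d1e114" synch="#T14">sketa</w>
    <w xml:id="d1e116" synch="#T19">ntikii</w>
  </seg>
  <seg xml:lang="mix" xml:id="d1e128" notation="ipa" type="S" sameAs="#d1e113" function="full">
    <w xml:id="d1e129" synch="#T14" sameAs="#d1e114">skɛ˥t̪a↘</w>
    <w xml:id="d1e142" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
  </seg>
</u>
"""


def full_ipa_pairs(root):
    """Return (orthographic, full IPA) pairs for each <w> in a full IPA <seg>."""
    # Index every element by xml:id so that sameAs pointers can be resolved.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    pairs = []
    # Match both conditions: notation="ipa" and function="full".
    for w in root.findall(".//seg[@notation='ipa'][@function='full']/w"):
        target = by_id.get(w.get("sameAs", "").lstrip("#"))
        if target is not None:
            pairs.append((target.text, w.text))
    return pairs


root = ET.fromstring(SAMPLE)
print(full_ipa_pairs(root))
```

The cost of this design is visible in the sketch: every consumer has to know about both attributes and resolve one pointer per word.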

What do you think @laurent?

Answer 1 · 2019-10-30T09:31:29.000Z

Now that I think about it, hadn't we managed to implement an XSLT search that flattens strings?

Answer 2 · 2019-10-30T09:54:23.000Z

I have already done it myself! But the problem isn't how to do it; it's how to encode and annotate it in a way that allows for easy access but also maximally accurate annotation.

Answer 3 · 2019-10-30T10:44:16.000Z

Actually, I remember what you were talking about: it was something to retrieve the content, but it was based on searching for the translations. The goal, and the basis of this issue, is to figure out a way to search the Mixtec, specifically the phonetic and/or orthographic strings.

Answer 4 · 2019-10-30T10:56:13.000Z

That's what I mean: if we can manage to search in decent conditions, I would not delete too much of the fine-grained markup...

Answer 5 · 2019-10-30T11:05:18.000Z

It should be feasible to adapt the search function to flatten the content. I can see several techniques. Can you show me how you do it currently?

(In reply by email to Jack Bowers's comment of 30 Oct 2019, 11:44.)

Answer 6 · 2019-10-30T11:12:24.000Z

Sorry, I misunderstood your first comment originally. What I said I did was just to make a flat copy to convert the phonetics with the <c>'s for every character.

So the only thing I do to search the strings is basic XQuery (I generally use XQuery to search and only use XSLT to convert into another format). I search as follows, e.g.: //seg[@notation='ipa']/w[contains(.,'skɛ˥t̪a↘')] (which isn't possible unless I make that flattened copy).
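As a point of comparison, here is a minimal Python sketch of the same search with the flattening done at query time instead of in a stored copy. The helper name `find_words` is hypothetical, and the fragment is again assumed to carry no TEI namespace. `itertext()` yields every text node under an element in document order, which is also how XPath defines the string value of an element.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The IPA <seg> from the example above, with <m> left in place.
SAMPLE = """
<seg xml:lang="mix" xml:id="d1e118" notation="ipa" type="S" sameAs="#d1e113">
  <w xml:id="d1e119" synch="#T14" sameAs="#d1e114">skɛ<m xml:id="d1e225">˥</m>t̪a<m xml:id="d1e120">↘</m></w>
  <w xml:id="d1e132" synch="#T19" sameAs="#d1e116">nd̪i↘kiː↘↗ꜛ</w>
</seg>
"""


def find_words(root, query):
    """Return the xml:id of every <w> under root whose flattened text contains query."""
    hits = []
    for w in root.iter("w"):
        # Concatenate all descendant text nodes, ignoring the <m> boundaries.
        flat = "".join(w.itertext())
        if query in flat:
            hits.append(w.get(XML_ID))
    return hits


root = ET.fromstring(SAMPLE)
print(find_words(root, "skɛ˥t̪a↘"))
```

The query crosses both <m> boundaries in the first <w> and still matches it, so under these assumptions no second copy of the IPA content is needed for this kind of search.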

Answer 7 · 2019-10-30T11:21:41.000Z

So there is a possibility: replace the "." with a function that flattens the content of <w>. This is where I see a technical solution. Do you know how to write a function? It would call <xsl:value-of select="xxx" separator=""/> (the empty string is significant since, by default, the separator is a single space).
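To illustrate why the empty separator matters, here is a small Python analogue (not the XSLT itself). The `flatten` helper is hypothetical, and its `separator` parameter only mimics the role of @separator on xsl:value-of.

```python
import xml.etree.ElementTree as ET

# One <w> from the example above, reduced to the parts that matter here.
W = ET.fromstring("<w>skɛ<m>˥</m>t̪a<m>↘</m></w>")


def flatten(element, separator=""):
    """Join all text nodes under element, in document order."""
    return separator.join(element.itertext())


# With an empty separator the pieces form one searchable string.
print(flatten(W))
# With a space, each <m> boundary becomes a gap and a full-string search fails.
print(flatten(W, " "))
```

The <w> here contains four text nodes, so a space separator introduces three gaps into what should be a single phonetic string.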

Answer 8 · 2019-10-30T11:38:01.000Z

I wouldn't know how to do that. I assume this is with XSLT, not XQuery? I like making things XQuery-friendly because in Oxygen you can do 'search whole project' and it gathers from files in different folders, but in XSLT you have to specify a single directory (unless I'm mistaken).

Answer 9 · 2019-10-30T13:37:18.000Z

I'm thinking it may also be possible to search using "string-join" in XQuery but I'm not sure yet...

Answer 10 · 2019-10-30T13:53:15.000Z

That would be XPath, which is both XQuery and XSLT friendly.

Answer 11 · 2019-10-30T13:53:45.000Z

I have not mastered XQuery, but I could easily check.
