tshatrov/ichiran

Include root word information for conjugated words in JSON

Heliozoa opened this issue.

Hi,
Thanks for your work on ichiran.

Would it be possible to include more information about the root word of conjugated words in the JSON output of ichiran-cli -f? For my use case, ideally the text, kana and seq fields of the root word would be included. For example:

見て => 見る, みる, 1259290
観て => 観る, みる, 1259290
みて => みる, みる, 1259290

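I'm looking to generate Anki cards from sentences, using ichiran to detect the individual words in each sentence. Currently it is tricky to determine programmatically that みてみる is actually the same word twice: the conjugated みて gets its own seq (10591144) rather than the root's (1259290), and the root only appears inside the human-readable conj "reading" string ("見る 【みる】"), not as separate text, kana and seq fields. For example: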

[
  [
    [
      [
        [
          "mite",
          {
            "reading": "みて",
            "text": "みて",
            "kana": "みて",
            "score": 40,
            "seq": 10591144,
            "conj": [
              {
                "prop": [
                  {
                    "pos": "v1",
                    "type": "Conjunctive (~te)"
                  }
                ],
                "reading": "見る 【みる】",
                "gloss": [
                  {
                    "pos": "[v1,vt]",
                    "gloss": "to see; to look; to watch; to view; to observe"
                  },
                  {
                    "pos": "[v1,vt]",
                    "gloss": "to examine; to look over; to assess; to check; to judge"
                  },
                  {
                    "pos": "[v1,vt]",
                    "gloss": "to look after; to attend to; to take care of; to keep an eye on"
                  },
                  {
                    "pos": "[v1,vt]",
                    "gloss": "to experience; to meet with (misfortune, success, etc.)"
                  },
                  {
                    "pos": "[aux-v,v1]",
                    "gloss": "to try ...; to have a go at ...; to give ... a try",
                    "info": "after the -te form of a verb"
                  },
                  {
                    "pos": "[aux-v,v1]",
                    "gloss": "to see (that) ...; to find (that) ...",
                    "info": "as 〜てみると, 〜てみたら, 〜てみれば, etc."
                  }
                ],
                "readok": true
              }
            ]
          },
          []
        ],
        [
          "miru",
          {
            "reading": "みる",
            "text": "みる",
            "kana": "みる",
            "score": 40,
            "seq": 1259290,
            "gloss": [
              {
                "pos": "[v1,vt]",
                "gloss": "to see; to look; to watch; to view; to observe"
              },
              {
                "pos": "[v1,vt]",
                "gloss": "to examine; to look over; to assess; to check; to judge"
              },
              {
                "pos": "[v1,vt]",
                "gloss": "to look after; to attend to; to take care of; to keep an eye on"
              },
              {
                "pos": "[v1,vt]",
                "gloss": "to experience; to meet with (misfortune, success, etc.)"
              },
              {
                "pos": "[aux-v,v1]",
                "gloss": "to try ...; to have a go at ...; to give ... a try",
                "info": "after the -te form of a verb"
              },
              {
                "pos": "[aux-v,v1]",
                "gloss": "to see (that) ...; to find (that) ...",
                "info": "as 〜てみると, 〜てみたら, 〜てみれば, etc."
              }
            ],
            "conj": []
          },
          []
        ]
      ],
      80
    ]
  ]
]
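Until something like this is available, one possible client-side workaround is to parse the root out of the conj "reading" string. Below is a minimal Python sketch, not ichiran's own API. It assumes the nesting shown above (segments → scored alternatives → [romaji, info, extra] word triples), that non-Japanese text appears as a plain string segment, that a kana-only root's reading is just the kana without 【】, and that ambiguous words carry an "alternative" list; none of these are confirmed from ichiran's documentation. Because the conj block shown has no seq, it can only compare roots by kana, which will conflate homophones; that is exactly the gap the requested seq field would close.

```python
import json
import re

# Matches a conj reading such as "見る 【みる】": written form, then kana in 【】.
# ASSUMPTION: kana-only roots appear without the bracketed part, e.g. "みる".
READING_RE = re.compile(r"^(?P<text>\S+)\s*【(?P<kana>[^】]+)】$")


def root_of(info):
    """Return (text, kana) of the dictionary form for one word's info dict."""
    if "alternative" in info:  # ASSUMPTION: ambiguous words list candidate infos here
        info = info["alternative"][0]
    conj = info.get("conj") or []
    if conj and "reading" in conj[0]:
        reading = conj[0]["reading"].strip()
        match = READING_RE.match(reading)
        if match:
            return match.group("text"), match.group("kana")
        return reading, reading  # kana-only root (assumed format)
    # Unconjugated word: it is already its own root.
    return info["text"], info["kana"]


def words(data):
    """Yield (romaji, info) for each word in the best-scoring alternative of each segment."""
    for segment in data:
        if isinstance(segment, str):  # ASSUMPTION: non-Japanese text is a plain string
            continue
        best_words, _score = max(segment, key=lambda alt: alt[1])
        for romaji, info, _extra in best_words:
            yield romaji, info


# Trimmed copy of the example output above (glosses removed).
SAMPLE = json.loads("""
[[[[["mite", {"text": "みて", "kana": "みて", "seq": 10591144,
              "conj": [{"prop": [{"pos": "v1", "type": "Conjunctive (~te)"}],
                        "reading": "見る 【みる】", "readok": true}]}, []],
    ["miru", {"text": "みる", "kana": "みる", "seq": 1259290, "conj": []}, []]],
   80]]]
""")

if __name__ == "__main__":
    roots = [root_of(info) for _, info in words(SAMPLE)]
    print(roots)  # [('見る', 'みる'), ('みる', 'みる')]
    # Same root kana for both words, so みてみる is treated as one word twice.
    print(len({kana for _, kana in roots}) == 1)  # True
```

Matching on kana alone is a heuristic: it correctly merges 見て/観て/みて with みる here, but it would also merge unrelated homophones, so a root seq in the JSON would still be the more reliable key.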