/Nihotip

Nihotip is a web app that lets users explore Japanese text through interactive tokenization and detailed insights. Built with React and Python, it offers a dynamic way to analyze words and symbols with tooltips for deeper understanding.

Primary LanguageJavaScript

Nihotip

Nihotip is a web application designed to help users explore the intricacies of the Japanese language through a dynamic and interactive interface. With a React frontend and a Python backend, Nihotip provides a convenient way to tokenize Japanese text and delve into detailed information about words, symbols, and their respective properties via tooltips. Nihotip offers a robust solution for analyzing Japanese text at multiple levels of granularity.

Demo

URL of the published version: https://nihotip.netlify.app

✨ Features

  • Japanese Text Tokenization: Input Japanese text and have it automatically tokenized into words and symbols.

  • Detailed Word and Symbol Insights: Hover over words or symbols to access detailed tooltips that explain the structure, readings, and associated properties of each token.

  • Level-based Token Breakdown: Nihotip organizes tokenized text into multiple hierarchical levels for easy navigation (features of different levels of tokens are listed inside brackets):

    • text
      • not a japanese word
        • punctuation
        • space
        • line break
        • string of not japanese characters
      • japanese word (part of speech)
        • part by reading
          • one or multiple kanji (kana reading -> part by reading)
          • digraph
            • big kana without tenten
            • big kana with tenten
            • small kana (respective big kana)
          • kana without tenten (romaji, association)
          • kana with tenten (respective kana without tenten)
  • part by reading:

    Parts are gotten by cutting the reading of the word. They allow to determine the kana reading for each kanji. A part consists of multiple characters if the reading of a kanji along with the characters surrounding it can't be cut. For example, the part "大人" of the word "大人買い" uses a special reading "おとな" that can't be cut. That's why the "おとな" reading applies to the whole part.

  • syllable:

    • single kana
    • digraph
    • kana with "っ", "ッ" or "ー"
    • single kanji
  • Tooltip insights: Show how readings map to individual characters and provide additional details like romaji and kana associations.

🛠️ Getting Started

To run the application locally, follow these steps:

To run the application locally, follow these steps:

  1. Clone the repository and navigate into the project directory.

  2. Set up Environmental Variables:

    Create .env files in the respective directories with the following content:

    • client/.env

      Create a file named client/.env and add:

      REACT_APP_BACKEND_URL=http://localhost:3001
      
    • server/.env

      Create a file named server/.env and add:

      PORT=3001
      HOST=localhost
      FRONTEND_URL=http://localhost:3000
      
  3. Open two terminal windows and run the following commands in separate terminals:

    # Start the frontend (React)
    cd client
    npm install
    npm start
    # Start the backend (Python)
    cd server
    pip install -r requirements.txt
    python main.py
  4. Open your browser and visit http://localhost:3000 to start interacting with Nihotip.

🚀 Upcoming Features

  • Multilingual Tooltips: Add the option to choose the language for tooltips to enhance accessibility for non-Japanese speakers.

  • Word Normalization: Implement word normalization for more accurate tokenization results.

  • Notes for Ambiguous Words: Provide detailed notes for words that belong to multiple parts of speech or have different interpretations based on context.

🤝 Contributing

We welcome contributions! If you'd like to contribute to Nihotip, feel free to submit issues or pull requests on the GitHub repository.