
The Vision and goals of the Open Natural Language Processing in Hebrew Project

MIT License

The Open Natural Language Processing in Hebrew Project

Vision

Our vision is to bring Natural Language Processing capabilities in Hebrew to a level on par with international industry standards, to keep up with state-of-the-art techniques by providing open source implementations of new algorithms and tools, and to make these capabilities freely available for both public and commercial use.

Goals

  1. Creating, maintaining, adapting and spreading resources that enable high-quality, production-ready, open-licensed Natural Language Processing in Hebrew.
  2. Enabling, fostering and catalyzing cooperation between stakeholders in academia, the private sector and the public sector, in order to promote better Open Source Hebrew NLP solutions and to share existing knowledge and tools.

Who's taking part?

What's our current focus?

  • Forming a group of volunteers to start work on the core components, both during developer meetings of the Public Knowledge Workshop and in other frameworks, including events such as hackathons and educational and research projects.

  • Adapting and integrating existing Hebrew NLP Python tools with existing popular frameworks:

  • Creating those tools when they are missing, focusing on:

    • Tokenization, and specifically stemming and lemmatization
    • A word embeddings model for Hebrew
    • A part-of-speech tagger
  • Encouraging the open licensing of high-quality tagged and labelled datasets from various domains (social media, articles, research papers, etc.) and for various tasks (part-of-speech tagging, text classification, sentiment analysis, named entity recognition, etc.), and helping to author such datasets where they are missing.

How can I help?

  • Join our newsletter for updates and for opportunities to contribute!
  • Need something more specific? Email us at NLPH.Project@gmail.com.
  • Join the discussion in our Facebook group.
  • If you are associated with an organization that already has good, working solutions for some of the problems we are interested in, and you'd like to consider sharing those solutions (or a subset thereof) under a suitable open license, we'd love to hear from you!
