computational-linguistics

There are 418 repositories under the computational-linguistics topic.

  • boudinfl/pke

    Python Keyphrase Extraction module

    Language: Python
  • arguman/arguman.org

    Argument mapping and analysis platform

    Language: Python
  • arbox/nlp-with-ruby

    Curated List: Practical Natural Language Processing done in Ruby

    Language: Ruby
  • eselkin/awesome-computational-neuroscience

    A list of schools and researchers in computational neuroscience

  • proycon/pynlpl

    PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

    Language: Python
  • yogurt-cultures/kefir

    🥛 Turkic morphology project

    Language: Python
  • roomylee/nlp-papers-with-arxiv

    Statistics and accepted paper list of NLP conferences with arXiv link

    Language: Jupyter Notebook
  • IlyaGusev/rulm

    Language modeling and instruction tuning for Russian

    Language: Jupyter Notebook
  • adbar/German-NLP

    Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

  • acl-org/acl-anthology

    Data and software for building the ACL Anthology.

    Language: Python
  • dkulagin/kartaslov

    Open linguistic datasets: the КартаСловСент (KartaSlovSent) sentiment lexicon of Russian, a semantics dataset, an associative graph, and a dataset of spelling errors and typos.

  • jacksonllee/pycantonese

    Cantonese Linguistics and NLP

    Language: Python
  • CUNY-CL/wikipron

    Massively multilingual pronunciation mining

    Language: Python
  • BLLIP/bllip-parser

    BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.

    Language: GAP
  • oroszgy/awesome-hungarian-nlp

    A curated list of NLP resources for Hungarian

  • UFAL-DSG/tgen

    Statistical NLG for spoken dialogue systems

    Language: Python
  • cbaziotis/datastories-semeval2017-task4

    Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".

    Language: Python
  • mannefedov/compling_nlp_hse_course

    Materials for the computational linguistics course at the School of Linguistics, HSE University (НИУ ВШЭ)

    Language: Jupyter Notebook
  • CoEDL/elpis

    🙊 Software for creating speech recognition models.

    Language: Python
  • own-pt/openWordnet-PT

    OpenWordnet-PT: an open access wordnet for Portuguese

    Language: Shell
  • proycon/colibri-core

    Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

    Language: C++
  • jonathandunn/text_analytics

    Basic text analytics and natural language processing in Python

    Language: Python
  • TiesdeKok/Python_NLP_Tutorial

    This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)

    Language: Jupyter Notebook
  • nschneid/amr-tutorial

    Abstract Meaning Representation (AMR) tutorial slides

    Language: TeX
  • simongray/datalinguist

    Stanford CoreNLP in idiomatic Clojure.

    Language: Clojure
  • dcavar/python-tutorial-notebooks

    Python tutorials as Jupyter Notebooks for NLP, ML, AI

    Language: Jupyter Notebook
  • proycon/flat

    FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.

    Language: JavaScript
  • SergeyShk/ruTS

    A library for extracting statistics from Russian-language texts.

    Language: Python
  • JonathanReeve/course-computational-literary-analysis

    Course materials for Introduction to Computational Literary Analysis, taught at UC Berkeley in Summer 2018, 2019, and 2020, at Columbia University in Fall 2020, and again at UC Berkeley in Summer 2021 and 2022.

    Language: Jupyter Notebook
  • DmitryRyumin/EMNLP-2023-Papers

    EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ Support NLP!

    Language: Python
  • LanguageMachines/frog

    Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

    Language: C++
  • sismetanin/word2vec-tsne

    Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.

    Language: Jupyter Notebook
  • proycon/LaMachine

    LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script

    Language: Shell
  • LanguageMachines/ucto

    Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto

    Language: C++
  • proycon/folia

    FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library as well as the full documentation, validation schemas, and set definitions.

    Language: Python
  • sismetanin/sentiment-analysis-of-tweets-in-russian

    Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.

    Language: Jupyter Notebook