computational-linguistics

There are 438 repositories under the computational-linguistics topic.

boudinfl/pke
Python Keyphrase Extraction module
Language: Python 1.6k 31 147291
arguman/arguman.org
Argument mapping and analysis platform
Language: Python 1.4k 64 258149
arbox/nlp-with-ruby
Curated List: Practical Natural Language Processing done in Ruby
Language: Ruby 1k 58 1170
PyThaiNLP/pythainlp
Thai natural language processing in Python
Language: Python 996 46 372273
eselkin/awesome-computational-neuroscience
A list of schools and researchers in computational neuroscience
767 33 978
proycon/pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Language: Python 480 31 2568
acl-org/acl-anthology
Data and software for building the ACL Anthology.
Language: Python 463 20 2.6k306
IlyaGusev/rulm
Language modeling and instruction tuning for Russian
Language: Jupyter Notebook 457 17 2250
yogurt-cultures/kefir
🥛 Turkic morphology project
Language: Python 457 24 429
adbar/German-NLP
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
456 45 567
roomylee/nlp-papers-with-arxiv
Statistics and accepted paper list of NLP conferences with arXiv link
Language: Jupyter Notebook 430 10 055
jacksonllee/pycantonese
Cantonese Linguistics and NLP
Language: Python 364 20 4438
dkulagin/kartaslov
Open linguistic datasets: the KartaSlovSent sentiment lexicon of Russian, a semantics dataset, an association graph, and a dataset of spelling errors and typos.
362 33 150
CUNY-CL/wikipron
Massively multilingual pronunciation mining
Language: Python 326 18 15871
oroszgy/awesome-hungarian-nlp
A curated list of NLP resources for Hungarian
229 21 1418
BLLIP/bllip-parser
BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.
Language: GAP 227 17 6153
UFAL-DSG/tgen
Statistical NLG for spoken dialogue systems
Language: Python 206 15 3462
cbaziotis/datastories-semeval2017-task4
Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".
Language: Python 198 15 1563
mannefedov/compling_nlp_hse_course
Materials for the computational linguistics course at the School of Linguistics, HSE University
Language: Jupyter Notebook 182 8 176
CoEDL/elpis
🙊 Software for creating speech recognition models.
Language: Python 154 16 17533
own-pt/openWordnet-PT
OpenWordnet-PT: an open access wordnet for Portuguese
Language: Shell 154 17 18936
dcavar/python-tutorial-notebooks
Python tutorials as Jupyter Notebooks for NLP, ML, AI
Language: Jupyter Notebook 130 12 085
proycon/colibri-core
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
Language: C++ 124 12 3720
TiesdeKok/Python_NLP_Tutorial
This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)
Language: Jupyter Notebook 122 8 166
jonathandunn/text_analytics
Basic text analytics and natural language processing in Python
Language: Python 120 14 953
nschneid/amr-tutorial
Abstract Meaning Representation (AMR) tutorial slides
Language: TeX 116 9 114
simongray/datalinguist
Stanford CoreNLP in idiomatic Clojure.
Language: Clojure 115 8 105
proycon/flat
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
Language: JavaScript 111 11 18615
SergeyShk/ruTS
A library for extracting statistics from Russian-language texts.
Language: Python 104 3 417
DmitryRyumin/EMNLP-2023-Papers
EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. :star: support NLP!
Language: Python 103 3 07
JonathanReeve/course-computational-literary-analysis
Course materials for Introduction to Computational Literary Analysis, taught at UC Berkeley in Summer 2018, 2019, and 2020, at Columbia University in Fall 2020, and again at UC Berkeley in Summer 2021 and 2022.
Language: Jupyter Notebook 91 13 394
sismetanin/word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Language: Jupyter Notebook 76 1 131
LanguageMachines/frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Language: C++ 75 17 10311
proycon/LaMachine
LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script
Language: Shell 68 16 21320
LanguageMachines/ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
Language: C++ 66 13 9313
proycon/folia
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
Language: Python 60 13 9710
