chinese-nlp

There are 185 repositories under the chinese-nlp topic.

  • pwxcoo/chinese-xinhua

    Chinese Xinhua Dictionary database, covering xiehouyu (two-part allegorical sayings), idioms, words, and characters.

    Language: Python
  • brightmart/nlp_chinese_corpus

    Large Scale Chinese Corpus for NLP

  • LianjiaTech/BELLE

    BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)

    Language: HTML
  • crownpku/Awesome-Chinese-NLP

    A curated list of resources for Chinese NLP

  • HIT-SCIR/ltp

    Language Technology Platform

    Language: Python
  • lyogavin/airllm

    AirLLM 70B inference with single 4GB GPU

    Language: Jupyter Notebook
  • IDEA-CCNL/Fengshenbang-LM

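    Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Center for Cognitive Computing and Natural Language at the IDEA research institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.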

    Language: Python
  • baidu/lac

    Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance

    Language: C++
  • esbatmop/MNBVC

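    MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus that targets the 40T of data used to train ChatGPT. The dataset covers not only mainstream culture but also niche subcultures and even Martian script (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, posts, wikis, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.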

  • fastnlp/fastNLP

    fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

    Language: Python
  • CVI-SZU/Linly

    Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

    Language: Python
  • crownpku/Information-Extraction-Chinese

    Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT

    Language: Python
  • thunlp/THULAC-Python

    An Efficient Lexical Analyzer for Chinese

    Language: Python
  • didi/ChineseNLP

    Datasets and SOTA results for every field of Chinese NLP

    Language: HTML
  • HIT-SCIR/pyltp

    pyltp: the python extension for LTP

    Language: C++
  • baidu/DDParser

    Baidu's open-source dependency parsing system

    Language: Python
  • lionsoul2014/jcseg

    Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.

    Language: Java
  • Doragd/Chinese-Chatbot-PyTorch-Implementation

    Another Chinese chatbot implemented in PyTorch, a sub-module of an intelligent work-order processing robot. 👩‍🔧

    Language: Python
  • OYE93/Chinese-NLP-Corpus

    A collection of Chinese NLP corpora

    Language: Python
  • thunlp/THULAC

    An Efficient Lexical Analyzer for Chinese

    Language: C++
  • amutu/zhparser

    zhparser is a PostgreSQL extension for full-text search of Chinese text

    Language: C
  • ECNU-ICALK/EduChat

    An open-source Chinese-English educational chat model from ICALK, East China Normal University (general-purpose base model, GPU deployment, data cleaning). With thanks to: LLaMA, MOSS, BELLE, Ziya, vLLM

    Language: Python
  • howl-anderson/Chinese_models_for_SpaCy

    Models for SpaCy that support Chinese

    Language: Jupyter Notebook
  • nonamestreet/weixin_public_corpus

    WeChat official account corpus

  • ydli-ai/CSL

    [COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset

    Language: Python
  • rime/rime-cantonese

    Rime Cantonese input schema

    Language: Python
  • crownpku/Small-Chinese-Corpus

    Some useful small Chinese corpus datasets

  • modelscope/AdaSeq

    AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

    Language: Python
  • guhhhhaa/4675-scifi

    Chinese NLP corpus of Chinese science fiction: a text corpus of about 4,675 Chinese science fiction novels

  • Walleclipse/ChineseAddress_OCR

    OCR for Chinese addresses in photographed documents, implemented using CTPN + CTC + address correction.

    Language: Python
  • thunlp/THULAC-Java

    An Efficient Lexical Analyzer for Chinese

    Language: Java
  • jayeew/Chinese-ChatBot

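    A Chinese chatbot trained on 100,000 dialogue pairs with an attention mechanism; it generates a meaningful reply to most general questions. The trained model has been uploaded and can be run directly.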

    Language: Jupyter Notebook
  • boat-group/fancy-nlp

    NLP for human. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.

    Language: Python
  • linonetwo/segmentit

    A Chinese word segmentation package usable in any JS environment, forked from leizongmin/node-segment

    Language: JavaScript
  • Kyubyong/g2pC

    g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese

    Language: Python
  • howl-anderson/WeatherBot

    A Rasa-based Chinese weather inquiry chatbot with a Web UI