Pinned Repositories
AI_Tutorial
Rocling2019 AI Tutorial file
Chinese-Word-Vectors
100+ Chinese Word Vectors (over a hundred kinds of pre-trained Chinese word vectors)
mhshih.github.io
nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
NLPCC-WordSeg-Weibo
Susing-Piauki
Input fully in Han characters and fully in romanization; after alignment, tag each word with its part of speech
mhshih's Repositories
mhshih/ArticutAPI_Taigi
Taigi CWS/POS/NER natural language processing tool with Articut as its kernel.
mhshih/mhshih.github.io
mhshih/nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
mhshih/NLPCC-WordSeg-Weibo
mhshih/Susing-Piauki
Input fully in Han characters and fully in romanization; after alignment, tag each word with its part of speech
mhshih/AI_Tutorial
Rocling2019 AI Tutorial file
mhshih/Alpaca-CoT
We extend CoT data to Alpaca to boost its reasoning ability. We are constantly expanding our collection of instruction-tuning data and integrating more LLMs into our framework for easy use.
mhshih/ArticutAPI
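API of Articut, a Chinese word segmenter (with semantic part-of-speech tagging). Word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 91% and a recall above 96% on SIGHAN 2005.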
mhshih/bert
TensorFlow code and pre-trained models for BERT
mhshih/Chinese-Vicuna
Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model. A low-resource LLaMA + LoRA solution for Chinese, with an architecture modeled on Alpaca.
mhshih/chineseQIE
mhshih/Disfactory
mhshih/engine
Corpus engine of PTT-Corpus
mhshih/fChartExamples2
Examples, organized by category, for fChart version 6.0 and later
mhshih/G0HWcrawler
mhshih/hue7jip8
A list and compilation of corpora for Taiwanese (Taigi), Formosan indigenous languages, and Hakka
mhshih/interactive-tutorials
Interactive Tutorials
mhshih/ladsbook
Linguistic Analysis and Data Science
mhshih/MALINDO_Morph
Morphological dictionary for Malay/Indonesian
mhshih/mhshih2.github.io
mhshih/moedict-data-twblg
Data files of the Dictionary of Frequently-Used Taiwan Minnan (臺灣閩南語常用詞辭典)
mhshih/NLP18
mhshih/NLP2022
mhshih/overleaf
A web-based collaborative LaTeX editor
mhshih/python2023
mhshih/readr-data
We will open up the data behind our news reports
mhshih/sketch_diff
mhshih/Susing-Kuhuat-Piautiau
Taiwanese part of speech, syntax, and tone sandhi
mhshih/Taiwanese-Corpora.github.io
mhshih/tcsl