pretrain
There are 21 repositories under the pretrain topic.
brightmart/nlp_chinese_corpus
Large-scale Chinese corpus for NLP (大规模中文自然语言处理语料)
keyu-tian/SparK
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
CLUEbenchmark/CLUECorpus2020
Large-scale pre-training corpus for Chinese, 100 GB (中文预训练语料)
yangjianxin1/Firefly-LLaMA2-Chinese
Firefly Chinese LLaMA-2 large model; supports incremental pretraining of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models
microsoft/UniVL
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
xcfcode/What-I-Have-Read
Paper lists, notes, and slides, focused on NLP. For summarization, see https://github.com/xcfcode/Summarization-Papers
THUNLP-AIPoet/BERT-CCPoem
BERT-CCPoem is a BERT-based pre-trained model, built specifically for Chinese classical poetry
thunlp/RE-Context-or-Names
BERT-based models (BERT, MTB, CP) for relation extraction
huzongxiang/MatDGL
MatDGL is a neural network package that allows researchers to train custom models for crystal modeling tasks. It aims to accelerate research and applications in materials science.
CoinCheung/MFM
Code for the paper "Masked Frequency Modeling for Self-Supervised Visual Pre-Training" (https://arxiv.org/pdf/2206.07706.pdf)
SalesforceAIResearch/pretrain-time-series-cloudops
Official code repository for the paper "Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain"
nancheng58/SSL4SR
[CCIR 2023] Self-supervised learning for Sequential Recommender Systems
bayartsogt-ya/albert-mongolian
ALBERT trained on Mongolian text corpus
yongzhuo/MacroGPT-Pretrain
Full pretraining of the macrogpt large model (1.3B parameters, 32 layers); multi-GPU DeepSpeed or single-GPU Adafactor
mrzjy/hoyo_public_wiki_parser
Parsing Hoyoverse game text corpora from public wikis
pskliff/vtb-data-fusion
This repository provides a code solution for Task 1 of the Data Fusion Contest
arrrrrmin/albert-guide
Understanding "A Lite BERT": a Transformer-based approach to learning self-supervised language models
janelu9/flash-finetuning
Run large language models easily.
tianhao-ai/Detecting-Machine-Generated-Text-COMP90051-2023S1-Project-1
This project detects text generated by different LLMs given a prompt. Instances are labeled Human or Machine, and the project uses both traditional machine learning and deep learning methods to classify them.
stoneyang/cv-arxiv-daily
🎓Automatically update CV papers daily using GitHub Actions (updated every 24 hours)