Simple PyTorch implementations of fun NLP algorithms, actively updated and maintained.
If you have any questions, please open an Issue.
If this project helps you, a Star is welcome~ (please don't just Fork without starring (´・ω・`))
Each project folder contains its own readme.md with more details.
- Text classification based on several models (BiLSTM, Transformer): go here
- Summary generation (Pointer-Generator Network): go here
- Dialogue modeling (Seq2Seq) to build your own DialogueBot~~: go here
- GNN for text classification: go here
- Transformer masked language model (MLM) pretraining: go here
- GPT for text generation and for math problems: go here (Source Repo)
- Adversarial training (FGM): go here (see the FGM sketch after this list)
- Very simple and quick use/deployment of a Seq2Seq Transformer, including several examples (denoising pretraining, medical question answering): go here
- Practical use of PyTorch Lightning: go here
- AMP and fp16 training for PyTorch: go here
- A handy visualization toolkit for attention maps (or other weighted matrices): go here
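The FGM entry above refers to a standard trick: after the normal backward pass, nudge the word embeddings along the gradient direction, run a second forward/backward on the perturbed input, then restore the embeddings. A minimal sketch follows; the class and the `emb_name` substring matching are illustrative, not this repo's exact code.

```python
import torch

class FGM:
    """Fast Gradient Method: perturb the embedding matrix along the
    gradient direction, then restore it after the adversarial step."""
    def __init__(self, model, epsilon=1.0, emb_name="embedding"):
        self.model = model
        self.epsilon = epsilon
        self.emb_name = emb_name  # substring matching the embedding parameter's name
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Typical training loop usage:
#   loss = compute_loss(batch); loss.backward()   # normal gradients
#   fgm.attack(); compute_loss(batch).backward()  # adversarial gradients accumulate
#   fgm.restore(); optimizer.step(); optimizer.zero_grad()
```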
My other open-source NLP projects:
- BERT for relation extraction: Ricardokevins/Bert-In-Relation-Extraction: relation extraction between entities with BERT (github.com)
- Text matching: Ricardokevins/Text_Matching: sentence similarity matching for the ZTE Pengyue NLP 2020 contest (github.com)
- Transformer implementation and a useful NLP toolkit: Ricardokevins/EasyTransformer: quick start with strong BERT and Transformer baselines without pretraining (github.com)
- Thanks to @rattlesnakey's issue (more discussion here), I added a feature to the pretraining project: set the attention weights of MASK tokens to zero so that MASK tokens cannot attend to each other. You can enable it in Transformer.py by setting "self.pretrain=True". PS: the new feature has not been verified yet, and its effect on pretraining is untested; I'll add tests later. A minimal sketch of the idea follows this group of entries.
- Rebuilt the code structure of Transformer to make the code easier to use and deploy.
- Added an example: denoising pretraining in Transformer (easy to use).
- Used the Seq2Seq Transformer to model a medical QA task (tuned on 550k pairs of Chinese medical QA data); see the README.md in Transformer/MedQAdemo/ for details.
- Added a new Trainer and useful tools.
- Removed the previous Transformer implementation (it had some unfixable bugs).
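A minimal sketch of the MASK-attention idea above: build an additive bias that blocks [MASK] query positions from attending to other [MASK] key positions. The function name and tensor layout are assumptions, not the repo's actual Transformer.py.

```python
import torch

def mask_token_attention_bias(is_mask: torch.Tensor) -> torch.Tensor:
    """Additive attention bias that stops [MASK] queries from attending
    to other [MASK] keys.

    is_mask: (batch, seq_len) bool tensor, True where the input token is [MASK].
    returns: (batch, seq_len, seq_len) float tensor, 0 where attention is
             allowed and -inf where it is blocked; add it to the raw
             attention scores before softmax.
    """
    # blocked[b, q, k] is True when both query q and key k are [MASK] tokens
    blocked = is_mask.unsqueeze(2) & is_mask.unsqueeze(1)
    # a MASK token may still attend to itself, otherwise its row could be all -inf
    eye = torch.eye(is_mask.size(1), dtype=torch.bool, device=is_mask.device)
    blocked = blocked & ~eye
    bias = torch.zeros(blocked.shape, device=is_mask.device)
    return bias.masked_fill(blocked, float("-inf"))

# Usage inside multi-head attention (scores: batch x heads x seq x seq):
#   scores = scores + mask_token_attention_bias(is_mask).unsqueeze(1)
```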
- Initial commit: added a sentence classification module with Transformer, BiLSTM, and BiLSTM+Attention models.
- Uploaded a basic dataset; binary sentence classification serves as the demo.
- Added and applied the adversarial learning idea.
- Reorganized and updated many things... (details omitted).
- Fixed some organizational issues in Text Classification.
- Added usage instructions for Text Classification.
- Added a hands-on MLM pretraining practice (a masking sketch follows this group of entries).
- Fixed the oversized and unnecessary word embedding in the sentence classification models (out of laziness, only the Transformer one was fixed).
- Added an option to load pretrained weights in sentence classification.
- Fixed some bugs.
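For reference, MLM pretraining as mentioned above usually follows the BERT-style masking recipe. A minimal sketch, assuming HuggingFace-style conventions such as `-100` for ignored labels; the ratios are the standard 15% / 80-10-10 split, not necessarily this repo's exact values.

```python
import torch

def mlm_mask(input_ids, mask_token_id, vocab_size, special_ids, p=0.15):
    """BERT-style masking: pick ~15% of non-special tokens as targets;
    replace 80% of them with [MASK], 10% with a random token, keep 10%."""
    input_ids = input_ids.clone()
    labels = input_ids.clone()
    probs = torch.full(input_ids.shape, p)
    for sid in special_ids:                      # never mask [CLS]/[SEP]/[PAD]
        probs[input_ids == sid] = 0.0
    targets = torch.bernoulli(probs).bool()
    labels[~targets] = -100                      # ignored by CrossEntropyLoss

    # 80% of targets -> [MASK]
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & targets
    input_ids[replace] = mask_token_id
    # half of the remaining 20% -> random token; the rest stay unchanged
    rand = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & targets & ~replace
    input_ids[rand] = torch.randint(vocab_size, input_ids.shape)[rand]
    return input_ids, labels
```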
- Added an application of GNNs in NLP.
- Implemented GNN-based text classification.
- Results are poor so far; I currently suspect a data-processing issue.
- Added the classical CHI + TF-IDF machine-learning approach to text classification (a sketch follows this group of entries).
- Implemented the algorithm and measured its performance.
- Updated the README.
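A compact sketch of the CHI + TF-IDF baseline mentioned above, using scikit-learn's `TfidfVectorizer`, chi-square feature selection, and a linear SVM. The toy data and `k` are placeholders, not the repo's setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# toy corpus; replace with the real dataset
texts = ["cheap meds online", "meeting at noon", "win money now", "lunch tomorrow again"]
labels = [1, 0, 1, 0]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),          # term weighting
    ("chi2", SelectKBest(chi2, k=3)),      # keep the k most class-correlated features
    ("svm", LinearSVC()),                  # linear classifier on the reduced features
])
clf.fit(texts, labels)
print(clf.predict(["free money meds"]))
```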
- Refactored the chatbot model into the Seq2Seq folder.
- Implemented beam search decoding (a minimal sketch follows these entries).
- Fixed a beam search bug in the PGN.
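A minimal beam search sketch in the spirit of the entry above. The `step_fn` interface is an assumption for illustration; a real decoder would batch hypotheses and cache decoder states.

```python
import torch
import torch.nn.functional as F

def beam_search(step_fn, bos_id, eos_id, beam_size=4, max_len=30):
    """step_fn(prefix) returns next-token logits for a 1-D LongTensor prefix."""
    beams = [(0.0, [bos_id])]                      # (log-prob, token sequence)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            log_probs = F.log_softmax(step_fn(torch.tensor(seq)), dim=-1)
            top_lp, top_ids = log_probs.topk(beam_size)
            for lp, tok in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((score + lp, seq + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            # hypotheses that emit EOS are done; the rest keep expanding
            (finished if seq[-1] == eos_id else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)
    # return the length-normalized best hypothesis
    return max(finished, key=lambda c: c[0] / len(c[1]))[1]
```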
- Added GPT for text continuation and for solving math problems (borrowed from karpathy/minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training (github.com); the code is very well written and really helps with understanding GPT, so I borrowed it to try it on fun things).
- Refactoring the Pointer-Generator Network: its performance has always been poor, so I decided to rebuild it and walk through the code line by line for peace of mind. Work in progress.
- Fixed misaligned mask tokens (inconsistent positions) in Pretrain.
- Added a random digit-string restoration demo to Transformer; it is super beginner-friendly for understanding Transformers and needs no external data, training on randomly constructed digit strings (a data-construction sketch follows these entries).
- Added an experimental TransformerVAE; it currently has bugs and is under construction.
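One plausible shape of the digit-string restoration data described above: corrupt a randomly generated digit sequence and train the model to recover the original, so no external data is needed. The corruption ratio and noise-token id below are illustrative, not the demo's exact settings.

```python
import torch

def make_batch(batch_size=32, seq_len=10, vocab=10, noise_id=10):
    """Synthetic denoising pairs: the source is a digit string with some
    positions replaced by a noise token; the target is the clean string."""
    tgt = torch.randint(0, vocab, (batch_size, seq_len))
    src = tgt.clone()
    corrupt = torch.rand(src.shape) < 0.3        # corrupt ~30% of positions
    src[corrupt] = noise_id
    return src, tgt

src, tgt = make_batch()
# feed (src, tgt) to any seq2seq Transformer; the vocabulary is 11 symbols
```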
- Added BM25 and TF-IDF algorithms for quick text matching (a BM25 sketch follows).
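A self-contained Okapi BM25 sketch matching the entry above; tokenization is assumed to be done already, and `k1`/`b` are the usual defaults.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc in `docs` against the tokenized `query`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))          # document frequency
    idf = {t: math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5)) for t in df}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = sum(idf.get(t, 0.0) * tf[t] * (k1 + 1) /
                (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
                for t in query if t in tf)
        scores.append(s)
    return scores

docs = [["deep", "learning"], ["text", "matching", "with", "bm25"]]
print(bm25_scores(["bm25", "matching"], docs))
```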
- Added a practical PyTorch Lightning example, using text classification as the demo: converted the plain PyTorch loop to LightningLite (a minimal sketch follows these entries). More details in LightingMain.py.
- Removed redundant code.
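A minimal LightningLite sketch of the conversion described above, using the `pytorch_lightning.lite` API as it existed around version 1.5 (newer releases renamed this to Fabric). The generic training loop is illustrative, not the repo's LightingMain.py.

```python
import torch
import torch.nn.functional as F
from pytorch_lightning.lite import LightningLite  # pytorch_lightning 1.5.x API

class Lite(LightningLite):
    def run(self, model, dataloader, epochs=1):
        optimizer = torch.optim.Adam(model.parameters())
        model, optimizer = self.setup(model, optimizer)    # moves to device, wraps for the strategy
        dataloader = self.setup_dataloaders(dataloader)
        model.train()
        for _ in range(epochs):
            for x, y in dataloader:
                optimizer.zero_grad()
                loss = F.cross_entropy(model(x), y)
                self.backward(loss)                        # replaces loss.backward()
                optimizer.step()

# Lite(accelerator="gpu", devices=1, precision=16).run(model, dataloader)
```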
- Added a practical example of AMP (Automatic Mixed Precision), implemented in VAEGenerator and tested on a local MX150; it significantly improves training time and memory usage. More details in the comments at the end of the code.
- Following the AMP guidance, changed the 1e-9 constant to inf in model.py (a generic AMP sketch follows these entries).
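A generic `torch.cuda.amp` training loop illustrating the AMP entries above; the model and dataloader are placeholders. Note that fp16 tops out around 65504, so large constants such as 1e9 overflow under mixed precision, which is presumably why the masking constant in model.py was switched to inf.

```python
import torch
import torch.nn.functional as F

def train_amp(model, dataloader, device="cuda"):
    """One mixed-precision training loop with torch.cuda.amp."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters())
    scaler = torch.cuda.amp.GradScaler()
    for x, y in dataloader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():          # forward pass in fp16 where safe
            loss = F.cross_entropy(model(x), y)
        scaler.scale(loss).backward()            # scale loss to avoid fp16 gradient underflow
        scaler.step(optimizer)                   # unscales grads; skips step on inf/nan
        scaler.update()                          # adjusts the scale factor
```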
- Added a weighted-matrix visualization toolkit (e.g., for visualizing attention maps), implemented in Visualize (a plotting sketch follows); more useful tools to come.
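A small matplotlib sketch of the kind of heatmap such a toolkit produces; the function name and toy tokens are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_attention(weights, x_tokens, y_tokens, path="attn.png"):
    """Render a weighted matrix (e.g. an attention map) as a labeled heatmap.
    weights: array of shape (len(y_tokens), len(x_tokens))."""
    fig, ax = plt.subplots()
    im = ax.imshow(weights, cmap="viridis")     # one cell per (query, key) weight
    ax.set_xticks(np.arange(len(x_tokens)))
    ax.set_xticklabels(x_tokens, rotation=45, ha="right")
    ax.set_yticks(np.arange(len(y_tokens)))
    ax.set_yticklabels(y_tokens)
    fig.colorbar(im, ax=ax)                     # color scale for the weights
    fig.tight_layout()
    fig.savefig(path, dpi=200)

plot_attention(np.random.rand(3, 4), ["I", "like", "NLP", "."], ["我", "喜欢", "NLP"])
```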
- Adopted Python comment and code standards; more formal coding practices will be followed going forward.
- Reference: https://blog.csdn.net/chaojianmo/article/details/105143657
- Reference: https://featurize.cn/notebooks/368cbc81-2b27-4036-98a1-d77589b1f0c4