/Reinforcement-learning-LLM

Fine-tunes Google's FLAN-T5 to generate non-toxic dialogue summaries, using Hugging Face Transformers, a toxicity-detection model to score outputs, and PPO-based training to detoxify the summarizer.
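
The reward side of this kind of PPO detoxification setup can be sketched in plain Python. This is only an illustration under stated assumptions, not the notebook's actual code: it assumes the toxicity detector emits two logits ordered `[not_toxic, toxic]`, that the reward is the "not toxic" probability, and that a KL penalty with a hypothetical coefficient `beta` keeps the fine-tuned policy close to the original FLAN-T5. All function names here are made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toxicity_reward(logits, not_toxic_index=0):
    """Reward for one summary: the classifier's 'not toxic' probability.

    Assumption: logits are ordered [not_toxic, toxic]; a real classifier
    may use a different label order.
    """
    return softmax(logits)[not_toxic_index]


def kl_penalized_reward(reward, policy_logprobs, ref_logprobs, beta=0.1):
    """Subtract a KL-style penalty from the reward.

    The penalty is beta times the summed per-token log-probability gap
    between the fine-tuned policy and the frozen reference model.
    beta=0.1 is an illustrative value, not one taken from the repository.
    """
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return reward - beta * kl_estimate


if __name__ == "__main__":
    # Hypothetical classifier logits for one generated summary.
    r = toxicity_reward([2.0, 0.0])
    # Hypothetical per-token log-probs under the policy and the reference.
    total = kl_penalized_reward(r, [-0.5, -1.0], [-1.0, -2.0])
    print(f"reward={r:.3f} penalized={total:.3f}")
```

In a full pipeline, a PPO trainer would consume one such scalar per generated summary. The penalty term discourages the policy from drifting far from the reference model just to satisfy the classifier.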

Primary language: Jupyter Notebook
