
Chatbot_FutFut

Team

Members

πŸ˜Žκ°•λ™μš±      πŸ¦„κ°•λ―Όμ§€      πŸ˜Ίμ‹ λŒ€κ·Ό

πŸš€ Use Tech

Ubuntu · Slack · HuggingFace · Colab


About FutFut

  • Domain : ν’‹μ‚΄ ν”Œλž«νΌμ— μΉœμ ˆν•œ μ„€λͺ…을 ν•΄μ£ΌλŠ” 챗봇을 κ΅¬μΆ•ν•˜μ˜€μŠ΅λ‹ˆλ‹€.
  • Concept : 'ν•΄μš”'체λ₯Ό μ‚¬μš©ν•˜λ©° μΉœμ ˆν•˜κ²Œ λ‹΅ν•˜λŠ” 챗봇. 말끝에 'μ–Έμ œλ“ μ§€ λ¬Όμ–΄λ³΄μ„Έμš”! ν’‹ν’‹~!'을 λΆ™μ—¬ 풋풋이 컨셉을 μœ μ§€
  • Model : Mistral 기반의 Zephyr λͺ¨λΈκ³Ό Meta의 Llama3 λͺ¨λΈμ„ λŒ€μƒμœΌλ‘œ μ§„ν–‰ν•˜μ˜€μŠ΅λ‹ˆλ‹€.
  • Dataset : 말투 ν•™μŠ΅μ„ μœ„ν•œ 데이터셋을 κ΅¬μΆ•ν•˜μ—¬ μ§„ν–‰ν•˜μ˜€μŠ΅λ‹ˆλ‹€. Dongwookss/q_a_korean_futsal, mintaeng/llm_futsaldata_yo
  • How-to? 말투 ν•™μŠ΅μ„ μœ„ν•œ Fine-tuningκ³Ό 정보 μ œκ³΅μ„ μœ„ν•œ RAGλ₯Ό μ μš©μ‹œμΌ°μŠ΅λ‹ˆλ‹€. κ΅¬ν˜„μ€ FastAPIλ₯Ό μ΄μš©ν•˜μ—¬ Back-end와 μ†Œν†΅ν•  수 μžˆλ„λ‘ μ§„ν–‰ν•˜μ˜€μŠ΅λ‹ˆλ‹€.

How to use?

FastAPI μ‹€ν–‰
uvicorn main:app --reload -p <ν¬νŠΈλ²ˆν˜Έμ§€μ •>
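Below is a minimal sketch of how main.py might expose the model over HTTP. The /chat endpoint, request schema, and generate_answer stub are assumptions for illustration, not the repository's actual code.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

def generate_answer(question: str) -> str:
    # Placeholder: the real service would call the fine-tuned model
    # and RAG chain here (see pack/make_answer.py).
    return f"Answer to '{question}'. Feel free to ask anytime! FutFut~!"

@app.post("/chat")
async def chat(query: Query):
    # The back end sends a question and receives the model's answer.
    return {"answer": generate_answer(query.question)}

Started with, e.g., uvicorn main:app --reload --port 8000, the back end can then POST {"question": "..."} to /chat.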

About Fine-tuning

  • Fine-tuned Model : Llama3-8b and Zephyr-7b were each tuned separately.

  • GPU : Colab L4

  • Method : LoRA (Low-Rank Adaptation) & QLoRA (Quantized LoRA); see the sketch after this list.

  • Trainer : SFTTrainer, DPOTrainer

  • Dataset : Dongwookss/q_a_korean_futsal, mintaeng/llm_futsaldata_yo

  • Fine-tuning result :

TrainOutput(global_step=1761, training_loss=1.1261051157399513, metrics={'train_runtime': 26645.6613, 'train_samples_per_second': 2.644, 'train_steps_per_second': 0.066, 'total_flos': 7.784199669311078e+17, 'train_loss': 1.1261051157399513, 'epoch': 3.0})
  • μΆ”ν›„ λ°©ν–₯ : SFT(Supervised Fine-Tune) Trainer 을 μ΄μš©ν•˜μ—¬ νŠœλ‹μ„ μ§„ν–‰ν•˜μ˜€κ³  λ§νˆ¬μ— μ§‘μ€‘ν•œ λ°μ΄ν„°μ…‹μœΌλ‘œ 인해 λͺ¨λΈ μ„±λŠ₯에 μ•„μ‰¬μš΄ 점이 λ§Žμ•˜μŠ΅λ‹ˆλ‹€. ν–₯ν›„ Q-A Task에 λ§žλŠ” Fine-Tuning을 진행할 μ˜ˆμ •μ΄λ©° κ°•ν™”ν•™μŠ΅μ„ 톡해 λͺ¨λΈμ„±λŠ₯을 κ°œμ„ ν•  μ˜ˆμ •μž…λ‹ˆλ‹€.

Fine-tuned Result(HuggingFaceπŸ€—):

  • Dongwookss -> final model names : big_fut_final & small_fut_final
Using the HuggingFace model without RAG:
# !pip install transformers==4.40.0 accelerate

import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import TextStreamer

model_id = 'Dongwookss/big_fut_final'  # or 'Dongwookss/small_fut_final'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

PROMPT = '''
Below is an instruction that describes a task. Write a response that appropriately completes the request.
'''
instruction = "question"

messages = [
    {"role": "system", "content": f"{PROMPT}"},
    {"role": "user", "content": f"{instruction}"}
    ]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

text_streamer = TextStreamer(tokenizer)
output = model.generate(
    input_ids,
    max_new_tokens=4096,
    eos_token_id=terminators,
    do_sample=True,
    streamer=text_streamer,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.1
)
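Note that <|eot_id|> is Llama 3's end-of-turn token; stopping on it in addition to the default eos_token_id keeps generation from running past the assistant's turn. For the Zephyr-based model this extra terminator should not be needed.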

About RAG

  • ν’‹μ‚΄ κ·œμ •, ꡬμž₯ 정보, ν’‹μ‚΄ 칼럼 λ“± λ‹€μ–‘ν•œ 정보λ₯Ό μ œκ³΅ν•˜κΈ° μœ„ν•΄ 데이터λ₯Ό μˆ˜μ§‘ν•˜κ³  RAGλ₯Ό κ΅¬μΆ•ν•˜μ—¬ μ •λ³΄μ œκ³΅μ„ ν•˜μ˜€μŠ΅λ‹ˆλ‹€.

  • Retrieval : Kiwipiepy+BM25 와 Embedding_Model + VectorDB 쑰합을 톡해 Semantic searchλ₯Ό λͺ©ν‘œλ‘œ μ§„ν–‰ν•˜μ˜€μŠ΅λ‹ˆλ‹€.
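A hedged sketch of that hybrid retriever using LangChain's EnsembleRetriever. The embedding model, sample documents, k values, and weights are assumptions; the repository's actual configuration lives in pack/retriever.py.

# Hedged sketch: Kiwi-tokenized BM25 + vector-store ensemble retrieval.
# !pip install langchain langchain-community rank_bm25 faiss-cpu kiwipiepy sentence-transformers
from kiwipiepy import Kiwi
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

kiwi = Kiwi()

def kiwi_tokenize(text: str) -> list[str]:
    # Morpheme-level tokens so BM25 can match Korean word stems.
    return [token.form for token in kiwi.tokenize(text)]

texts = [  # sample documents standing in for the files/ corpus
    "풋살 경기는 한 팀당 5명으로 진행됩니다.",
    "풋살은 4호 공을 사용합니다.",
]

bm25 = BM25Retriever.from_texts(texts, preprocess_func=kiwi_tokenize)
bm25.k = 3  # number of lexical hits to return

embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-base")  # assumed model
dense = FAISS.from_texts(texts, embeddings).as_retriever(search_kwargs={"k": 3})

# Weighted combination of lexical (BM25) and semantic (vector) results.
retriever = EnsembleRetriever(retrievers=[bm25, dense], weights=[0.5, 0.5])
docs = retriever.invoke("풋살 팀 인원은 몇 명인가요?")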

Directory Structure

.
├── backupfiles
│   └── # Directory for backup files.
├── files
│   └── # Files to be served through RAG.
├── for_nochain
│   ├── __init__.py
│   └── mt_chat.py # Built without Langchain; model response speed may degrade.
├── load_model_for_newchain.py
├── load_model_type_a.py # Loads the model with AutoModelForCausalLM.
├── load_model_type_b.py # Loads the model with Unsloth's FastLanguageModel. If adapter.config exists the model cannot be loaded, so the model was copied to a new path.
├── main.py # Serves the model with FastAPI; you can communicate with the model via requests.
├── main_new_chain.py # Runs FastAPI with the new chain.
├── pack
│   ├── __init__.py
│   ├── load_push.py # Loads, chunks, and embeds the data in files/ and stores it in the vector DB (see the sketch after this tree).
│   ├── make_answer.py # Defines the answer-generation function.
│   ├── make_chain_gguf.py # Applies ollama to gguf files.
│   ├── make_chain_model.py # Builds a chain from a Safetensors model; this demands substantial GPU resources.
│   ├── retrieve_docs.py # Finds the desired documents using the retriever.
│   └── retriever.py # Configures the retriever.
├── sft_tuning # Model fine-tuning process; values for key hyperparameters may be blank.
│   └── Unsloth_sft.ipynb
└── test.ipynb
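As referenced in the tree above, here is a hedged sketch of the load -> chunk -> embed -> store pipeline that pack/load_push.py implements. The file name, chunk sizes, and embedding model are assumptions.

# Hedged sketch: load -> chunk -> embed -> store, as pack/load_push.py does.
# !pip install langchain-community langchain-text-splitters faiss-cpu sentence-transformers
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Load a document from the files/ directory (hypothetical file name).
docs = TextLoader("files/futsal_rules.txt", encoding="utf-8").load()

# Chunk it into overlapping pieces sized for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed the chunks and persist them in a local vector DB.
embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-base")  # assumed
FAISS.from_documents(chunks, embeddings).save_local("vectordb")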

πŸ€— HuggingFace account πŸ€—