Contributors: 강동성 · 강민지 · 정동근
- Domain: A chatbot that gives friendly explanations about a futsal platform.
- Concept: A chatbot that answers in a friendly tone using the polite Korean '해요' register, ending every reply with '언제든지 물어보세요! 풋풋~!' ("Ask me anytime! Putput~!") to keep up its '풋풋이' (Putputi) persona.
- Model: We worked with the Mistral-based Zephyr model and Meta's Llama3 model.
- Dataset: We built datasets for speech-style training: Dongwookss/q_a_korean_futsal, mintaeng/llm_futsaldata_yo
- How-to? Fine-tuning handles the speech style and RAG handles information delivery. The implementation is served with FastAPI so the back end can communicate with the model.
Running FastAPI
```bash
uvicorn main:app --reload --port <PORT>
```
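main.py (see the project tree below) exposes the model through FastAPI. The following is a minimal, hypothetical sketch of that wiring; the `/chat` route, the request schema, and the `generate_answer` stub are assumptions, not the repository's actual code:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    question: str

def generate_answer(question: str) -> str:
    # Stub: the real app would call the fine-tuned model here
    # (cf. pack/make_answer.py in the project tree below).
    return f"{question} ... 언제든지 물어보세요! 풋풋~!"

@app.post("/chat")
def chat(req: ChatRequest):
    return {"answer": generate_answer(req.question)}

# Once the server is running (uvicorn main:app --port 8000), query it with:
#   import requests
#   requests.post("http://localhost:8000/chat", json={"question": "풋살 규정 알려줘"}).json()
```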
- Fine-tuned Model: Llama3-8b and Zephyr-7b, each fine-tuned separately.
- GPU: Colab L4
- Method: LoRA (Low-Rank Adaptation) & QLoRA (Quantized LoRA); see the sketch after the training log below.
- Trainer: SFTTrainer, DPOTrainer
- Dataset: Dongwookss/q_a_korean_futsal, mintaeng/llm_futsaldata_yo
```
TrainOutput(global_step=1761, training_loss=1.1261051157399513, metrics={'train_runtime': 26645.6613, 'train_samples_per_second': 2.644, 'train_steps_per_second': 0.066, 'total_flos': 7.784199669311078e+17, 'train_loss': 1.1261051157399513, 'epoch': 3.0})
```
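For reference, here is a condensed sketch of what a QLoRA + SFTTrainer setup like the one above typically looks like (trl 0.8-era API). The hyperparameters, target modules, and the dataset's 'text' column are illustrative assumptions, not the project's recorded settings:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# QLoRA: load the frozen base model in 4-bit; only the LoRA adapters train in full precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# LoRA: low-rank update matrices injected into the attention projections.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

dataset = load_dataset("Dongwookss/q_a_korean_futsal", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    max_seq_length=1024,
    dataset_text_field="text",  # assumes the dataset exposes a 'text' column
)
trainer.train()
```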
- Future work: We fine-tuned with the SFT (Supervised Fine-Tuning) Trainer, and because the dataset focused on speech style, model performance left much to be desired. We plan to run fine-tuning better suited to the Q-A task and to improve model performance with reinforcement learning.
- Dongwookss -> final model names: big_fut_final & small_fut_final
Using a Hugging Face model without RAG
```python
# !pip install transformers==4.40.0 accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_id = 'Dongwookss/big_fut_final'  # or 'Dongwookss/small_fut_final'; pick the desired final model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

PROMPT = '''
Below is an instruction that describes a task. Write a response that appropriately completes the request.
'''
instruction = "question"  # put the user question here

messages = [
    {"role": "system", "content": f"{PROMPT}"},
    {"role": "user", "content": f"{instruction}"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the model's EOS token or Llama3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

text_streamer = TextStreamer(tokenizer)
output = model.generate(
    input_ids,
    max_new_tokens=4096,
    eos_token_id=terminators,
    do_sample=True,
    streamer=text_streamer,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.1,
)
```
- To provide futsal regulations, stadium information, futsal columns, and other reference material, we collected data and built a RAG pipeline on top of it.
- Retrieval: We combined Kiwipiepy+BM25 with an embedding model + vector DB, aiming at semantic search (see the sketch below).
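A minimal sketch of such a hybrid retriever, assuming LangChain's EnsembleRetriever, FAISS as the vector DB, and a multilingual embedding model; the repository's retriever.py may differ in all of these choices:

```python
from kiwipiepy import Kiwi
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

kiwi = Kiwi()

def kiwi_tokenize(text: str) -> list[str]:
    # Morpheme-level tokens work far better than whitespace splitting for Korean BM25.
    return [token.form for token in kiwi.tokenize(text)]

# Example chunks; in the project these come from pack/load_push.py.
docs = [
    "풋살 경기는 5명이 한 팀으로 진행됩니다.",
    "풋살 구장은 실내 또는 실외에 설치할 수 있습니다.",
]

# Lexical retriever: BM25 over Kiwi morphemes.
bm25 = BM25Retriever.from_texts(docs, preprocess_func=kiwi_tokenize)
bm25.k = 4

# Semantic retriever: embeddings stored in a FAISS vector DB.
embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-base")
faiss = FAISS.from_texts(docs, embeddings).as_retriever(search_kwargs={"k": 4})

# Fuse lexical and semantic rankings.
retriever = EnsembleRetriever(retrievers=[bm25, faiss], weights=[0.5, 0.5])
results = retriever.invoke("풋살 구장 규격이 어떻게 되나요?")
```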
```
.
├── backupfiles
│   └── # Backup copies of the serving files.
├── files
│   └── # Files fed into RAG.
├── for_nochain
│   ├── __init__.py
│   └── mt_chat.py            # Built without LangChain; model response speed may drop.
├── load_model_for_newchain.py
├── load_model_type_a.py      # Loads the model with AutoModelForCausalLM.
├── load_model_type_b.py      # Loads the model with Unsloth's FastLanguageModel; it cannot load while an adapter.config exists, so the model was copied to a new path.
├── main.py                   # Serves the model with FastAPI; talk to the model via requests.
├── main_new_chain.py         # Runs FastAPI using the new chain.
├── pack
│   ├── __init__.py
│   ├── load_push.py          # Loads, chunks, and embeds the data under files/ and stores it in the vector DB (see the sketch below the tree).
│   ├── make_answer.py        # Answer-generation function.
│   ├── make_chain_gguf.py    # Applies Ollama to gguf model files.
│   ├── make_chain_model.py   # Builds a chain from a safetensors model; this demands a lot of GPU resources.
│   ├── retrieve_docs.py      # Fetches the desired documents through the retriever.
│   └── retriever.py          # Configures the retriever.
├── sft_tuning                # Model fine-tuning process; some configuration parameter values may be left blank.
│   └── Unsloth_sft.ipynb
└── test.ipynb
```
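For reference, a hedged sketch of the Load → Chunk → Embed → store pipeline that pack/load_push.py describes; the loader, chunk sizes, embedding model, and save path here are all assumptions:

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load raw documents from files/ (hypothetical file name).
docs = TextLoader("files/futsal_rules.txt", encoding="utf-8").load()

# 2. Chunk them into retrieval-sized pieces.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 3. Embed the chunks and store them in the vector DB.
embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-base")
vectordb = FAISS.from_documents(chunks, embeddings)
vectordb.save_local("vectorstore")
```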