Hello, I have a few questions about the accuracy computation during training.

Question

Closed this issue 8 months ago · 2 comments

ckc5800 commented 8 months ago

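Hello,

I enjoyed reading the blog post and wanted to try training the model myself, so I have been running the code,
but I keep running into various errors.
I built the environment with Docker, so I don't think this is a library version problem.

Are the errors caused by changes I made to the code, or is there a problem in the code itself?
I'd like to understand what is causing them. Thank you.

I am following the README, but it isn't working, so I'm asking here.
I am currently on the step shown in the image below.

This is where the errors start, and I am working through them one at a time.

1. The code shared on GitHub does not pass a task argument, but when I run it I am told the argument is required. Why is that?

I see that task can be binary, multiclass, or multilabel.
My guess is that I should use multilabel.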

File "/home/dialogue_summary_competition_master/summarizer/method/default.py", line 87, in validation_step
    accuracy = torchmetrics.functional.accuracy(logits, labels, ignore_index=self.model.config.pad_token_id)#, task="binary")

TypeError: accuracy() missing 1 required positional argument: 'task'

With task="multiclass":

ValueError: Optional arg num_classes must be type int when task is multiclass. Got <class 'NoneType'>

With task="multilabel":

ValueError: Optional arg num_labels must be type int when task is multilabel. Got <class 'NoneType'>

So neither of these runs.
For now I am trying task="binary", but then the problems below appear.
So I also tried adding task="multilabel", num_labels=len(labels.tolist()):
accuracy = torchmetrics.functional.accuracy(logits, labels, ignore_index=self.model.config.pad_token_id, task="multilabel", num_labels=len(labels.tolist()))

2. This one is only a warning, but to silence it I set num_workers to 4, which seemed like a safe arbitrary value given that I train on a single GPU.

UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 32 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.

train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=args.batch_size, num_workers=4)

3. I have not changed the batch size or the data, so why do the target and the output have different shapes? I can't tell whether I am doing something wrong or whether the code is.

RuntimeError: Predictions and targets are expected to have the same shape, but got torch.Size([65280, 4000]) and torch.Size([65280]).

labels = batch["decoder_input_ids"][:, 1:].reshape(-1)
logits = output["logits"][:, :-1].reshape([labels.shape[0], -1])
accuracy = torchmetrics.functional.accuracy(logits, labels, ignore_index=self.model.config.pad_token_id, task="binary")

print('logits.size()', logits.size())
# logits.size() torch.Size([65280, 4000])

print('labels.size()', labels.size())
# labels.size() torch.Size([65280])

Also, how should I modify logits and labels here?

logits: torch.Size([65280, 4000]) -> torch.Size([batch_size(256), 65280, 4000])
labels: torch.Size([65280]) -> torch.Size([batch_size(256), 65280])

Would that be the right change?

Thank you so much for reading this long post. I would be very grateful for any help.

Have a nice day.

Answer 1 · 2024-05-01T06:00:25.000Z

Hello!

The task error looks like a torchmetrics library version issue; it seems task is now a required argument. This metric is token prediction accuracy, so set task to "multiclass" and pass the vocab size as the number of classes (num_classes).

For num_workers, any reasonable value is fine; some people set it to the number of CPUs.
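One way to pick such a value is to cap the worker count at the number of CPUs the machine reports. This is only a sketch: pick_num_workers and toy_dataset are hypothetical names used for illustration, and the cap of 4 simply mirrors the value tried in the question.

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset


def pick_num_workers(cap: int = 4) -> int:
    """Return a worker count no larger than `cap` or the number of CPUs."""
    cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(cap, cpu_count))


# Stand-in dataset (assumption); the real script would pass train_dataset.
toy_dataset = TensorDataset(torch.arange(8).unsqueeze(1))

train_dataloader = DataLoader(
    toy_dataset,
    shuffle=True,
    batch_size=4,
    num_workers=pick_num_workers(),
)
print(train_dataloader.num_workers)
```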

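The shapes look that way because the loss function cannot handle the sequence length dimension when the loss and accuracy are computed, so the tensors are flattened first. If you record the shape before the reshape, you can restore the sequence length dimension afterwards.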
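A rough sketch of the "record the shape, then restore it" idea, using toy tensors in place of batch["decoder_input_ids"] and output["logits"]. The values, pad_token_id, and the per-sequence accuracy at the end are illustrative assumptions, not part of the repository code.

```python
import torch
import torch.nn.functional as F

pad_token_id = 0  # assumption; the real value comes from the model config

# Toy batch: 2 sequences, 4 decoder tokens each, vocabulary of 5.
decoder_input_ids = torch.tensor([[4, 1, 2, 0],
                                  [4, 3, 0, 0]])
# Fake logits whose argmax at step t is the "prediction" for token t + 1.
predicted_ids = torch.tensor([[1, 2, 0, 0],
                              [2, 0, 0, 0]])
logits = F.one_hot(predicted_ids, num_classes=5).float()

# Record the 2-D target shape before flattening.
targets = decoder_input_ids[:, 1:]  # (batch, seq_len - 1)
batch_size, seq_len = targets.shape

flat_labels = targets.reshape(-1)  # (batch * seq,)
flat_logits = logits[:, :-1].reshape(flat_labels.shape[0], -1)  # (batch * seq, vocab)

# Per-token correctness on the flat tensors, ignoring padding.
is_real_token = flat_labels != pad_token_id
is_correct = (flat_logits.argmax(dim=-1) == flat_labels) & is_real_token

# Restore the sequence dimension using the recorded shape.
is_correct = is_correct.view(batch_size, seq_len)
is_real_token = is_real_token.view(batch_size, seq_len)
per_sequence_accuracy = is_correct.sum(dim=1) / is_real_token.sum(dim=1)
print(per_sequence_accuracy)
```

The flat tensors are what the loss and the multiclass accuracy consume; the restored (batch, seq) view is only needed if you want per-sequence statistics.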

Thank you.

Answer 2 · 2024-05-02T05:04:39.000Z

Wow, thank you so much for replying!
I'll proceed as you suggested. Thank you!
Have a nice day.