/evaluate_plamo13b

Evaluation notebooks for PLaMo-13B

Primary language: Jupyter Notebook. License: MIT.

About this Repository

This repository contains the source code for reproducing the evaluation results of PLaMo-13B by Preferred Networks (PFN) [1].

Benchmark Result

Table 1. Benchmark results on JCommonsenseQA

| Model | acc_norm (1-shot; this repo) | acc_norm (reported in [1]) |
| --- | --- | --- |
| PLaMo-13B [2] | 54.8 | 53.4 |
| Japanese StableLM Alpha 7B [3] | 51.0 | 75.9 (27.7*) |

* without changing prompt
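
This README does not define acc_norm. As a rough illustration only, the sketch below assumes the common convention (used by tools such as lm-evaluation-harness) in which each answer choice's log-likelihood is divided by the choice's length in characters before the best-scoring choice is picked. The function names, the dictionary layout, and the toy numbers are all hypothetical and are not taken from the notebooks in this repository.

```python
from typing import Dict, List, Sequence


def pick_choice(loglikelihoods: Sequence[float],
                choices: Sequence[str],
                normalize: bool = True) -> int:
    """Return the index of the best-scoring choice.

    With normalize=True each log-likelihood is divided by the choice's
    character length (assumed acc_norm convention); otherwise the raw
    log-likelihood is used (plain acc).
    """
    scores = [
        ll / len(choice) if normalize else ll
        for ll, choice in zip(loglikelihoods, choices)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(examples: List[Dict], normalize: bool = True) -> float:
    """Percentage of examples whose picked choice matches the gold index."""
    correct = sum(
        pick_choice(ex["loglikelihoods"], ex["choices"], normalize) == ex["gold"]
        for ex in examples
    )
    return 100.0 * correct / len(examples)


# Toy data (made up for illustration, not benchmark output).
toy_examples = [
    # Raw scores favor the short choice; normalized scores favor the long one.
    {"choices": ["駅", "郵便局"], "loglikelihoods": [-4.0, -6.0], "gold": 1},
    {"choices": ["海", "山"], "loglikelihoods": [-1.0, -3.0], "gold": 0},
]

print("acc     :", accuracy(toy_examples, normalize=False))
print("acc_norm:", accuracy(toy_examples, normalize=True))
```

On the toy data, normalization changes the prediction for the first example because the longer choice is no longer penalized for its length, which is the motivation usually given for reporting acc_norm alongside acc.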

Run benchmark

# PLaMo-13B
./run_notebook.sh eval_plamo13b_jcommonsenseqa

# Japanese StableLM Alpha 7B
./run_notebook.sh eval_japanese-stablelm-base-alpha-7b_jcommonsenseqa
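
To run both evaluations in one go, the two commands above could be wrapped in a small driver. This is a minimal sketch, assuming that run_notebook.sh takes the notebook name as its only argument (as in the commands above) and that the driver is started from the repository root. The helper names build_commands and run_all are hypothetical and not part of this repository.

```python
import subprocess
from typing import List

# Notebook names taken from the commands in this README.
NOTEBOOKS = [
    "eval_plamo13b_jcommonsenseqa",
    "eval_japanese-stablelm-base-alpha-7b_jcommonsenseqa",
]


def build_commands(notebooks: List[str],
                   runner: str = "./run_notebook.sh") -> List[List[str]]:
    """Build one argv list per notebook."""
    return [[runner, name] for name in notebooks]


def run_all(notebooks: List[str], dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for cmd in build_commands(notebooks):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first notebook that fails.
            subprocess.run(cmd, check=True)


# Dry run by default: only prints the commands that would be executed.
run_all(NOTEBOOKS)
```

Calling run_all(NOTEBOOKS, dry_run=False) would execute the notebooks sequentially.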

Reference