likenneth/honest_llama
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Python · MIT
Issues
Clarification on where to intervene
#42 opened by CheongWoong - 4
Will you publish the Llama3 fine-tuned model?
#39 opened by aryopg - 1
Will you publish the Llama3 fine-tuned model?
#40 opened by aryopg - 1
Question about the insight behind ασθ
#37 opened by NieSYsc20 - 3
Code for CCS
#33 opened by itsmemala - 4
Clarifications on Table 5 of the paper
#34 opened by itsmemala - 4
Why does memory accumulate and ultimately cause overflow when running get_activations.py?
#26 opened by Renpf2022 - 3
Query regarding dimensions of activations
#32 opened by itsmemala - 1
Interesting work! Providing an additional convenient way of reproducing the results!
#31 opened by frankaging - 2
How to calculate the 40% gap between generation accuracy and probe accuracy mentioned in the paper?
#28 opened by DLiquor - 7
Does ITI support Qwen?
#25 opened by menghonghan - 3
False Answer in OamPatel/iti_trivia_qa_val
#22 opened by Vicent0205 - 2
Support llama2 series models?
#21 opened by skykiseki - 1
Why did you choose ‘tqa_gen_end_q’ rather than ‘tqa_mc2’ to compute std?
#19 opened by thyywr759 - 1
What is self_attn.head_out?
#7 opened by JianqiaoLu - 2
Query on results of Vicuna and Alpaca
#17 opened by jongjyh - 1
Inquiry about visualizing results from the paper
#4 opened by jongjyh - 1
Comparison with other tuning methods
#14 opened by FLLLIGHT - 3
Cannot find truthful_qa.py
#13 opened by LebronX - 3
Disagreement about TruthfulQA results
#5 opened by Vicent0205 - 1
Saving the model after shifting activations
#12 opened by A-Raafat - 1
Difference between tqag_gen_end_q and tqa_gen?
#11 opened by jongjyh - 2
How to use tqa_gen and tqa_end_end_q?
#9 opened by CaoYiqingT - 2
Potential Data Leakage in Probes Training
#6 opened by jongjyh - 3
Inquiry on an equation in the paper
#3 opened by jongjyh