evaluate_honest.py generates completions using only the target model, while evaluate_honest_spec.py generates them with speculative decoding.
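The two scripts are not reproduced here, so the following is only a minimal sketch of the general difference, not their actual code. It assumes greedy decoding and treats each "model" as a hypothetical function that maps the tokens so far to one next token. In speculative decoding, a cheaper draft model proposes several tokens. The target model then checks them and keeps the longest prefix it agrees with, plus one token of its own. The names `generate_target_only`, `generate_speculative`, `draft`, and `k` are illustrative assumptions.

```python
from typing import Callable, List

# Hypothetical greedy "model": given the token sequence so far, return the next token.
Model = Callable[[List[int]], int]


def generate_target_only(target: Model, prompt: List[int], max_new: int) -> List[int]:
    """Plain greedy decoding: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens[len(prompt):]


def generate_speculative(target: Model, draft: Model, prompt: List[int],
                         max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens and the target
    keeps the longest prefix it agrees with, plus one token of its own."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target verifies each position. A real implementation scores all
        #    k positions in one batched forward pass; a loop is used here for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if expected == tok:
                accepted.append(tok)
            else:
                # First disagreement: use the target's token instead and stop.
                accepted.append(expected)
                break
        else:
            # All k draft tokens accepted: the target contributes one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    # Each round may overshoot max_new, so trim to the requested length.
    return tokens[len(prompt):len(prompt) + max_new]


if __name__ == "__main__":
    # Toy integer "models" (hypothetical); the draft deliberately guesses wrong after token 7.
    target = lambda toks: (toks[-1] * 3 + 1) % 10
    draft = lambda toks: 5 if toks[-1] == 7 else (toks[-1] * 3 + 1) % 10
    prompt = [2]
    print(generate_target_only(target, prompt, 8))        # [7, 2, 7, 2, 7, 2, 7, 2]
    print(generate_speculative(target, draft, prompt, 8))  # [7, 2, 7, 2, 7, 2, 7, 2]
```

In this greedy sketch, every kept token is exactly the target's own prediction, so both functions return the same completion. Speculative decoding only changes how many target calls are needed. Sampling-based speculative decoding uses a probabilistic accept/reject rule instead of exact matching, so that it preserves the target's output distribution. This sketch does not establish whether the two scripts decode greedily or by sampling.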