Are the datasets all the same version?
dtch1997 opened this issue · 2 comments
The examples in preprocessed_data/generate_dataset.json seem different from those in llm_generated_data/agree_disagree_dataset.json.
For example, the first entry in generate_dataset.json starts with this:
Hello, my name is Linda Johnson and I am a researcher in the field of artificial intelligence with a specialization in natural language processing...
However, the name Linda Johnson doesn't appear at all in agree_disagree_dataset.json, which leads me to believe these might be artifacts from two different versions of the dataset.
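One quick way to confirm that a string such as "Linda Johnson" is absent from a file is to search every string value in the parsed JSON. The sketch below assumes only that each file is valid JSON. It does not depend on any particular field names, and the helper names (iter_strings, contains_text, file_contains_text) and the demo record's "text" key are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value nested anywhere inside parsed JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    # Numbers, booleans and null are skipped: they cannot contain text.


def contains_text(data: Any, needle: str) -> bool:
    """Return True if any string in the parsed JSON contains `needle`."""
    return any(needle in s for s in iter_strings(data))


def file_contains_text(path: str, needle: str) -> bool:
    """Load a JSON file from disk and search it for `needle`."""
    with open(path, encoding="utf-8") as f:
        return contains_text(json.load(f), needle)


# In-memory demo with a hypothetical record shape:
demo = [{"text": "Hello, my name is Linda Johnson and I am a researcher..."}]
print(contains_text(demo, "Linda Johnson"))  # True

# Against the real file (run from the repo root):
# file_contains_text("llm_generated_data/agree_disagree_dataset.json", "Linda Johnson")
```

Walking all string values, rather than a single known key, keeps the check valid even if the two files use different schemas.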
The sycophancy data used for generating the CAA vectors is a mixture of llm_generated_data/agree_disagree_dataset.json (which I generated myself using GPT-4) and data downloaded from Anthropic's model-written evals sycophancy dataset. See this script, which mixes them.
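The linked script is not reproduced here, so the following is only a hypothetical sketch of what mixing two sources can look like, not the repo's actual code. It assumes each file holds a JSON list of objects; the function names, the "source" field, and the second file path are all illustrative. It concatenates the datasets, tags each item with its origin, and shuffles with a fixed seed.

```python
from __future__ import annotations

import json
import random
from typing import Any


def load_json_list(path: str) -> list[dict[str, Any]]:
    """Load a file assumed to contain a JSON list of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mix_datasets(
    sources: dict[str, list[dict[str, Any]]], seed: int = 0
) -> list[dict[str, Any]]:
    """Concatenate datasets, tag each item with its origin, and shuffle.

    `sources` maps a label (e.g. a file name) to that dataset's items.
    The same inputs and seed always produce the same order.
    """
    mixed = []
    for label, items in sources.items():
        for item in items:
            # Copy each item so the caller's data is not mutated;
            # "source" is a hypothetical provenance field.
            mixed.append({**item, "source": label})
    random.Random(seed).shuffle(mixed)
    return mixed


# Usage (second path is hypothetical):
# mixed = mix_datasets({
#     "agree_disagree": load_json_list("llm_generated_data/agree_disagree_dataset.json"),
#     "anthropic": load_json_list("path/to/anthropic_sycophancy.json"),
# })
```

Recording a provenance field like this makes it easy to tell which file an example in the mixed data originated from, which is exactly the question raised in this issue.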
Please note that I am working on a new version of this repo (see branch v2), which should be ready by the end of the week. It will include more behaviors and a cleaner architecture, and it will fix some experimental flaws. I plan to update our arXiv paper accordingly.
Got it, thank you! That sounds amazing, and thanks once again for being so responsive.