langcog/metalab

update column specs in spec.yaml


There are currently discrepancies between spec.yaml and the MA template codebook. I assume the MA template has been updated more recently? If true, the spec sheet will need to be updated before new data can be added to the database.

Two issues @anjiecao came across while working on an MA for the challenge:

  1. response_mode in the template lists both "looking" and "eye-tracking" as options. I don't think we want looking as an option here?
  2. source_of_data is missing from spec.yaml. We probably want options like text/table, plot, author here.

I suspect there are other discrepancies, but these are two we came across.
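To find the remaining discrepancies systematically, something like the following could help. It is only a rough sketch: it assumes both spec.yaml and the template codebook have already been loaded into lists of dicts with a `field` key and an optional `options` list (that layout, the function name `diff_field_specs`, and the example data are assumptions for illustration, not the actual file contents).

```python
def diff_field_specs(spec_fields, codebook_fields):
    """Report fields and options that differ between two field lists.

    Each entry is assumed to be a dict with a 'field' name and an
    optional 'options' list (hypothetical layout).
    """
    spec = {f["field"]: set(f.get("options", [])) for f in spec_fields}
    book = {f["field"]: set(f.get("options", [])) for f in codebook_fields}

    report = {
        # Fields present in only one of the two sources
        "missing_from_spec": sorted(book.keys() - spec.keys()),
        "missing_from_codebook": sorted(spec.keys() - book.keys()),
        "option_mismatches": {},
    }

    # For shared fields, compare the allowed options
    for name in sorted(spec.keys() & book.keys()):
        if spec[name] != book[name]:
            report["option_mismatches"][name] = {
                "only_in_spec": sorted(spec[name] - book[name]),
                "only_in_codebook": sorted(book[name] - spec[name]),
            }
    return report


# Illustrative data modeled on the two discrepancies above;
# the spec side is an assumption, not the real spec.yaml content.
spec_fields = [
    {"field": "response_mode", "options": ["eye-tracking"]},
]
codebook_fields = [
    {"field": "response_mode", "options": ["looking", "eye-tracking"]},
    {"field": "source_of_data", "options": ["text/table", "plot", "author"]},
]

report = diff_field_specs(spec_fields, codebook_fields)
print(report)
```

With real data, the two lists would come from parsing spec.yaml and exporting the codebook sheet; the comparison itself stays the same.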

Yes, we need to discuss this in depth! "looking" is meant for hand-coded looking data and "eye-tracking" for automatically coded data. We split them because external coders kept stumbling over using eye-tracking for both types of data.

But right now, this means that some (all?) datasets are coded inconsistently. I think for some we can programmatically fix it (e.g. for most older papers). The overarching goal is to increase usability and transparency, but it should of course not come at the cost of data integrity. So with these points in mind, what do you think we should do?

I was wondering whether the Code Book could live on the MetaLab website instead of in the Google spreadsheet, basically as spec.yaml but with a more user-friendly interface, i.e. in a pretty table. This would avoid discrepancies between the two. But I wonder whether many users would struggle to find the Code Book there, compared to having it in the Google spreadsheet? We also need a way for users to add their own rows to the Code Book when they add non-obligatory columns, which might be tricky. Can discuss when we meet tomorrow, @christinabergmann

That would of course be ideal. They went out of sync, and keeping track across multiple documents is always hard. Maybe @erikriverson can point us to a simple way of displaying spec.yaml in a user-friendly format.
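As a starting point for that discussion, here is a minimal sketch of rendering field definitions as a plain table, so the spec stays the single source and the Code Book is generated from it. It assumes the spec has been parsed into a list of dicts; the keys `field`, `required`, `options`, and `description`, the function name `spec_to_table`, and the example entry are all hypothetical and may not match the real spec.yaml.

```python
def spec_to_table(fields):
    """Render field definitions as a pipe-delimited (Markdown-style) table.

    Keys used here ('field', 'required', 'options', 'description')
    are assumed for illustration.
    """
    header = ["field", "required", "options", "description"]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for f in fields:
        row = [
            f["field"],
            "yes" if f.get("required") else "no",
            ", ".join(f.get("options", [])),
            f.get("description", ""),
        ]
        # Escape pipes so cell content cannot break the table layout
        cells = [cell.replace("|", "\\|") for cell in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Illustrative entry only; the description wording is made up.
fields = [
    {
        "field": "response_mode",
        "required": True,
        "options": ["looking", "eye-tracking"],
        "description": "hand-coded vs automatic coding",
    },
]

print(spec_to_table(fields))
```

User-added, non-obligatory columns could in principle be handled by appending their definitions to the same list before rendering, though where those definitions would be stored is an open question.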

Some progress on a couple of ideas:

  1. I can generate the Google Sheets template directly from spec.yaml via R, including column options and colors. I have a demo of this working with the Data sheet.

  2. The validation app has a tab called "Fields Information" that is a searchable table and could be shown in a more prominent location on the MetaLab website. The previous version of the site had a "Field Specification" tab on this page: https://langcog.github.io/metalab2/documentation.html

Thanks @erikriverson! (Removing my assignment to help me keep track of my tasks, but if I can help again at any stage, let me know!)