blair-colrev-workflow

Blair's personal workflow for CoLRev. Not authoritative on CoLRev. Your mileage may vary.

What is this?

CoLRev (https://github.com/CoLRev-Environment/colrev) is an amazing tool, but still being rapidly developed and thus breaking changes may occur from time to time. There are also some features that I just cannot get working on my environment (could be because of my insistence on using Mac 😅).

This document outlines how I use a ~~hacky~~ creatively appropriated workflow around CoLRev on my Mac.

macOS 14.4.1 23E224 arm64 (Mac14,15 Apple M2)
Python 3.11.6
CoLRev 0.11.0

(Unfortunately I could not get CoLRev 0.12.0 working on my setup: nothing was being imported into records.bib...)

Assumptions

Much of the workflow below will still work if these assumptions are not met, but I'm just documenting what I've been using:

.bib exports from Scopus (e.g., using https://www.litbaskets.io)
MacOS with Python installed via https://brew.sh

Also tested on WSL (Ubuntu 24.04 on Windows 11), provided the following is set up:

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11
sudo apt install python3.11-dev python3.11-venv

Also tested on Powershell (Windows 11), provided the following is set up:

scoop bucket add versions
scoop install versions/python311

(I tried with Python 3.12 on Ubuntu 24.04 WSL on Windows 11; it had difficulty building dependency levenshtein.)

Part 1 - Getting CoLRev working

Go to ~/00blair/gitrepos-colrev (or equivalent on your machine).

In this folder, I like to keep all the various venvs that I could be using with CoLRev. Here's an example of me making a new venv:

# Linux/Mac
python3.11 -m venv _venv_colrev_0_11_0
source ./_venv_colrev_0_11_0/bin/activate
python -m pip install --upgrade pip wheel
python -m pip install --upgrade colrev==0.11.0

# Windows - Powershell
python3 -m venv _venv_colrev_0_11_0
source .\_venv_colrev_0_11_0\Scripts\Activate.ps1
python -m pip install --upgrade pip wheel
python -m pip install --upgrade colrev==0.11.0

Immediately after successfully installing CoLRev into a new venv, it could be a good idea to deactivate and then ZIP the whole thing: e.g., _"venv_colrev_0_11_0 (tested on Python 3.11 on macOS 14.4.1 23E224 arm64).zip". 📦

All subsequent steps need to be done with that venv activated.

Part 2 - Setting up the repo locally

mkdir a new folder for the CoLRev repository
Run colrev init --light (to avoid Docker services; although from my testing, the regular colrev init works too, at least on Mac)
Modify .gitignore (e.g., for .DS_Store) and make a commit.
If the commit gets stuck on "CoLRev ReviewManager: format............................" (or check or report), comment these out in .pre-commit-config.yaml. Then try the commit again.
Move the first of the .bib files to data/search. Then run colrev retrieve. WARNING: this step & the next step, collectively, could take quite some time!

Hit ENTER for each of these...

2024-04-11 14:42:17 [INFO] search [colrev.scopus > data/search/Q0070_Foucault.bib]
DB search (update)
- Go to https://www.scopus.com/search/form.uri?display=advanced and run the following query:



- Replace search results in /Users/blair/00blair/gitrepos-colrev/test/data/search/Q0070_Foucault.bib
Press enter to continue

If you get this error — Invalid language codes: undefined — go and modify the bib files accordingly (i.e., in this case, search for language = {undefined}, and replace it somehow). Then run colrev retrieve again.

If you had recieved messages like this, modify the files manually. Then run colrev retrieve again.

2024-04-11 15:13:05 [ERROR] De Moya2020940 not imported
2024-04-11 15:13:05 [ERROR] Ochoa Pacheco2023 not imported
2024-04-11 15:13:05 [ERROR] De Jong20051610 not imported
2024-04-11 15:13:05 [ERROR] Van Grembergen199963 not imported

HINT: for steps 9 and 10, copy-paste the entire contents of the Terminal into CotEditor or equivalent, then search for [ERROR] and Invalid.

After a few successful "colrev retrieve" runs, run a top-up "colrev retrieve" for good luck (sometimes it consolidates/cleans up some of the records, e.g., around the md_crossref.bib and md_dblp.bib files). Then ZIP the whole folder just in case. Then close the current terminal window and open a fresh one (for a clean log).
Repeat steps 7-11 for each of the bib files, one at a time.
- Take a ZIP snapshot of the whole repo after this, for good luck.
Run colrev load
- Take a ZIP snapshot of the whole repo after this, for good luck.
Run colrev dedupe
- Take a ZIP snapshot of the whole repo after this, for good luck.

Part 3 - Setting up the repo on GitHub

Go over to GitHub in your web browser
On GitHub, create a new GitHub repository but do not initialise it

In terminal:

git remote add origin https://github.com/<YOUR REMOTE DETAILS HERE>
git push -u origin main

Part 4 - Screening metadata records

NOTE: For these steps, I like to use the included blair-data folder which you can put in your CoLRev repo. Doing so is of course optional. I found it was safe to run python -m pip install -r requirements.txt using the same venv as in Part 1. So then I could just run sh ./records_to_csv_wrapper.sh within the blair-data folder without any problems. 😊

Setup exclusion criteria in settings.json and test it by setting one entry as rev_prescreen_excluded.

colrev_status                 = {rev_prescreen_excluded},
screening_criteria            = {notcisr=out},

Manually edit records.bib, save it, and run colrev status from time to time. For screening metadata records, mark as:

rev_prescreen_included
rev_prescreen_excluded

Example of colrev status output:

~/00blair/gitrepos-colrev/critical-is-research-colrev main* 6s
_venv_colrev_0_11_0 ❯ colrev status
Status
	init
	retrieve         20 retrieved     1147 to prepare [only 0 quality-curated]
	prescreen        14 included      [6 prescreen excluded]
	pdfs              0 retrieved     14 to retrieve
	screen            0 included
	data              0 synthesized

Part 5 - Screening full-texts

For each file that passed rev_prescreen_included, obtain the PDF and put it in data/pdfs (NOTE: you might want to make pdfs a symlink to storage elsewhere)

The following go into records.bib:

colrev_status                 = {pdf_imported},
file                          = {data/pdfs/AanestadHansethMonteiroEtAl2024.pdf},

Run colrev pdf-prep to get CoLRev to process each of the PDFs, and then the status will update from pdf_imported to pdf_prepared.
If updated to pdf_needs_manual_preparation then run colrev pdf-prep-man
For screening full-texts, mark as:
- rev_synthesized
- rev_included
- rev_excluded

Not working for me yet

PRISMA chart ☹️

blairw/blair-colrev-workflow