bbs_bird_mistnet_split

Fetch North American Breeding Bird Survey data with mistnet train & test split

The code in this repository uses the scripts provided with David Harris's mistnet model to process the North American Breeding Bird Survey (BBS) data into training and test datasets, as well as cross-validation folds. The mistnet paper is available here.

Please note: The dataset fetched here is proprietary. Please make sure to read the terms of use here.

The code downloads the BBS data, uses David Harris's (modified) scripts to split it into training and test sets, and saves the results as CSV files in the subfolder csv_bird_data.

Please note that the scripts have been updated to use the latest release of the BBS dataset. This meant I had to remove some checks. I will run further checks on the data in the coming months and make updates if required, but use at your own risk for now!

Requirements

To run, the code requires:

  • Python (tested under Python 2.7.14)
  • R (tested under 3.4.3) with packages geosphere, raster, caret and lubridate
  • The UNIX command line tool wget
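
Before starting the download, it can help to confirm that the command-line tools listed above are on your PATH. The following is only a minimal sketch and not part of this repository: the helper name missing_tools is hypothetical, the executable name Rscript is an assumption (the requirements only mention R), and shutil.which needs Python 3.3 or later, whereas the repository itself was tested under Python 2.7. It does not check whether the R packages are installed.

```python
import shutil

# Command-line tools named in the requirements list. "Rscript" is an
# assumption: the README only says "R", not which executable is invoked.
REQUIRED_TOOLS = ["wget", "Rscript"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing command-line tools: " + ", ".join(missing))
    else:
        print("All required command-line tools were found on PATH.")
```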

How to run

Make sure (!) to clone this repository with its submodules by using:

git clone --recurse-submodules CLONE_URL

Once cloned, you should be able to simply run:

python prepare_dataset.py

Note that this can take a while (roughly 30 minutes in total), since it has to download many files and process the results. If everything goes to plan, you should find a folder called csv_bird_data with the following contents:

├── fold.ids.csv
├── in.test.csv
├── in.train.csv
├── latlon.csv
├── route.presence.absence.csv
├── species.data.csv
└── x.csv
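
To confirm that the run produced every file in the listing above, a small check along these lines may be useful. This is a sketch rather than part of the repository: the helper name missing_outputs is hypothetical, and it only tests that the files exist, not that their contents are valid.

```python
import os

# File names taken from the folder listing in this README.
EXPECTED_FILES = [
    "fold.ids.csv",
    "in.test.csv",
    "in.train.csv",
    "latlon.csv",
    "route.presence.absence.csv",
    "species.data.csv",
    "x.csv",
]


def missing_outputs(folder="csv_bird_data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing output files: " + ", ".join(missing))
    else:
        print("All expected output files are present.")
```

Run it from the repository root after prepare_dataset.py has finished. An empty result means all seven files were written.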