
Python wrapper for the xeno-canto.org API to aid in downloading and managing recordings.


xeno-canto API Wrapper

xeno-canto-py is an API wrapper designed to help users efficiently download xeno-canto.org recordings and their associated information. Download requests are processed concurrently using the asyncio, aiohttp and aiofiles libraries to reduce retrieval time. The wrapper also provides functions for deleting recordings and generating metadata to help manage a local recording library.

It was created to aid data collection and filtering for training machine learning models.

Installation

xeno-canto-py is available on PyPI and can be installed with the package manager pip:

pip install xeno-canto

The package can then be used straight from the command-line:

xeno-canto -dl Bearded Bellbird

Or imported into an existing Python project:

import xenocanto

If you want more control over the wrapper, open a terminal window in your desired location and clone the repository with the following command:

git clone https://github.com/ntivirikin/xeno-canto-py

The only file required for operation is xenocanto.py, so feel free to remove the others or move xenocanto.py to another working directory.

WARNING: Use caution with test.py. Running the tests via unittest or any other test harness deletes any dataset/ folder in the working directory once the tests complete.

Usage

The xeno-canto-py wrapper supports retrieving metadata and audio from the xeno-canto database. It also provides library management functions: deleting recordings that match input tags, removing folders that contain too few audio recordings, and generating a single JSON metadata file for a given path containing xeno-canto audio recordings. Example usage of each command is given below.


Metadata Download

xeno-canto -m [parameters]

Downloads metadata as a series of JSON files and returns the path to the metadata folder.

Example: Metadata retrieval for Bearded Bellbird recordings of quality A

xeno-canto -m Bearded Bellbird q:A
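Judging by the page1.json files in the directory structure further down, metadata is saved as one JSON file per page of results. The sketch below illustrates such a paging loop; it is not the wrapper's own code. The endpoint URL (xeno-canto API version 2), the numPages response field and the helper names build_url and fetch_all_pages are assumptions, and the HTTP request is passed in as a hypothetical fetch function so the loop can be tried offline.

```python
import json
from pathlib import Path
from typing import Callable
from urllib.parse import urlencode

# Assumed endpoint (xeno-canto API version 2); not taken from this README.
API_URL = "https://xeno-canto.org/api/2/recordings"


def build_url(params: list[str], page: int = 1) -> str:
    """Join the search terms into one query string and add the page number."""
    return API_URL + "?" + urlencode({"query": " ".join(params), "page": page})


def fetch_all_pages(params: list[str], out_dir: Path,
                    fetch: Callable[[str], dict]) -> Path:
    """Save page1.json, page2.json, ... until the reported numPages is reached.

    `fetch` is a hypothetical callable that takes a URL and returns the decoded
    JSON response; injecting it keeps this sketch free of network access.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    page, num_pages = 1, 1
    while page <= num_pages:
        data = fetch(build_url(params, page))
        num_pages = int(data["numPages"])  # assumed field name
        (out_dir / f"page{page}.json").write_text(json.dumps(data))
        page += 1
    return out_dir
```

In a real run, fetch would wrap an HTTP GET (for example with aiohttp) and return the parsed JSON.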


Audio Recording Download

xeno-canto -dl [parameters]

Retrieves the metadata for the request and uses it to download audio recordings as MP3s from the database.

Example: Download Bearded Bellbird recordings from the country of Brazil

xeno-canto -dl Bearded Bellbird cnt:Brazil
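Each downloaded MP3 is stored as dataset/audio/<species>/<id>.mp3 (see Directory Structure). A minimal sketch of turning metadata records into (download link, target path) pairs follows. It assumes each record carries the id, en (English name) and file (download link) fields of a xeno-canto API version 2 response; plan_downloads is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def plan_downloads(recordings: list[dict],
                   root: Path = Path("dataset/audio")) -> list[tuple[str, Path]]:
    """Map each record's download link to <root>/<English name>/<id>.mp3.

    Assumes the record fields "id", "en" and "file" (hypothetical schema).
    """
    plan = []
    for rec in recordings:
        target = root / rec["en"] / f"{rec['id']}.mp3"
        plan.append((rec["file"], target))
    return plan
```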


Delete Recordings

xeno-canto -del [parameters]

Deletes recordings that match ANY of the parameters given as input.

Example: Delete ALL quality D recordings and ALL recordings from Brazil

xeno-canto -del q:D cnt:Brazil
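The ANY wording means the filters are combined with OR rather than AND. The sketch below shows that matching rule against metadata records. matches_any is a hypothetical helper; the record field names (q, cnt, en) are assumed to mirror the search tags, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
def matches_any(record: dict, params: list[str]) -> bool:
    """Return True if the record satisfies at least one tag:value filter."""
    for param in params:
        tag, _, value = param.partition(":")
        # English names are given as en:Bearded_Bellbird, so restore the spaces.
        if tag == "en":
            value = value.replace("_", " ")
        # Assumed: case-insensitive comparison of the record's field value.
        if str(record.get(tag, "")).lower() == value.lower():
            return True
    return False
```

With the filters q:D cnt:Brazil, a quality D recording from Peru and a quality A recording from Brazil are both selected, while a quality A recording from Peru is kept.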


Purge Folders

Removes any folders within the dataset/audio/ directory that have fewer recordings than the input value num.

xeno-canto -p [num]

Example: Remove recording folders with fewer than 10 recordings (folders with exactly 10 are kept)

xeno-canto -p 10
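A minimal sketch of the purge rule, assuming each species folder holds one .mp3 file per recording: count the MP3s in every folder under dataset/audio/ and delete the folder when the count is strictly less than num. purge_small_folders is a hypothetical name, not the wrapper's function; try it on a copy of your data, since shutil.rmtree deletes without confirmation.

```python
import shutil
from pathlib import Path


def purge_small_folders(num: int, root: Path = Path("dataset/audio")) -> list[str]:
    """Delete species folders under root holding fewer than num MP3 files.

    A folder with exactly num recordings is kept. Returns the removed names.
    """
    removed = []
    # Materialize the folder list before deleting anything from the directory.
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        count = sum(1 for _ in folder.glob("*.mp3"))
        if count < num:
            shutil.rmtree(folder)
            removed.append(folder.name)
    return removed
```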


Generate Metadata

Generates metadata for the xeno-canto database recordings at the input path, defaulting to dataset/audio/ within the working directory if none is given.

xeno-canto -g [path]

Example: Generate metadata for the recordings located in bird_rec/audio/ within the working directory

xeno-canto -g bird_rec/audio/
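This README does not describe how -g builds its output, so the following is only a sketch under stated assumptions. It scans the path for <species>/<id>.mp3 files, looks each ID up in the page*.json files already saved under dataset/metadata/, and writes the matching records to a single library.json. The real command may instead query the API; build_library, the recordings and id fields, and the output layout are all assumptions.

```python
import json
from pathlib import Path


def build_library(audio_dir: Path, metadata_dir: Path, out_file: Path) -> int:
    """Write library.json for the MP3s under audio_dir; return the record count."""
    # Index every recording listed in previously saved page*.json files by ID.
    known = {}
    for page in metadata_dir.rglob("page*.json"):
        for rec in json.loads(page.read_text())["recordings"]:
            known[str(rec["id"])] = rec
    # Keep only recordings whose <id>.mp3 file is actually present on disk.
    library = [
        known[mp3.stem]
        for mp3 in sorted(audio_dir.rglob("*.mp3"))
        if mp3.stem in known
    ]
    out_file.write_text(json.dumps({"recordings": library}, indent=2))
    return len(library)
```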


Parameters are given in tag:value form in accordance with the API search guidelines. For help in building search terms, consult the xeno-canto API guide and this article. The one exception is English bird names passed as arguments to the delete function: these must be prefixed with en: and have all spaces replaced with underscores (for example, en:Bearded_Bellbird).
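Because the en: rule for the delete function is easy to get wrong by hand, a small helper can build the argument list. to_delete_param and delete_command are hypothetical helpers, not part of the package; the list they return could be passed to subprocess.run once xeno-canto is installed.

```python
def to_delete_param(english_name: str) -> str:
    """Format an English name for -del: prefix with en: and use underscores."""
    # split() also collapses repeated or surrounding whitespace.
    return "en:" + "_".join(english_name.split())


def delete_command(english_names: list[str],
                   other_params: tuple[str, ...] = ()) -> list[str]:
    """Build the argv list for a xeno-canto -del call."""
    params = [to_delete_param(name) for name in english_names] + list(other_params)
    return ["xeno-canto", "-del", *params]
```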

Directory Structure

Files are saved in the working directory under the folder dataset/. Metadata and audio recordings are stored in separate metadata/ and audio/ folders, organized by request parameters and by bird species respectively. For example:

dataset/
    - audio/
        - Indigo Bunting/
            - 14325.mp3
        - Northern Cardinal/
            - 8273.mp3
    - metadata/
        - library.json
        - IndigoBuntingcnt_Canada/
            - page1.json
        - NorthernCardinalq_A/
            - page1.json
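The metadata folder names above appear to be the request parameters concatenated, with spaces removed and : replaced by _ (Indigo Bunting cnt:Canada becomes IndigoBuntingcnt_Canada). The hypothetical functions below reproduce that pattern for the two examples shown; the rule is inferred from the tree, not taken from the wrapper's source, so other characters in a query may be handled differently.

```python
from pathlib import Path


def metadata_folder_name(params: list[str]) -> str:
    """Concatenate the parameters, drop spaces and swap ':' for '_'."""
    return "".join(params).replace(" ", "").replace(":", "_")


def metadata_dir(params: list[str], root: Path = Path("dataset/metadata")) -> Path:
    """Folder where the page*.json files for this request would be stored."""
    return root / metadata_folder_name(params)
```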

Metadata is retrieved as JSON files containing information on each audio recording that matches the request parameters provided as input. The metadata also contains the download links used to retrieve the audio recordings. The library.json file is generated by running the metadata generation command -g.

Error 503

If the server returns HTTP Error 503 (Service Unavailable) during a recording download, try passing a value lower than the default of 4 as num_chunks in download(filt, num_chunks). You can either change the default value in the function definition of download(filt, num_chunks), or pass a value to download(params) in the body of main(), as shown below.

# Run with the default semaphore value of 4 (at most 4 downloads at once)
asyncio.run(download(params))

# Run with a semaphore value of 3 instead of the default
asyncio.run(download(params, 3))

If you are not seeing 503 errors, you can instead experiment with higher values of num_chunks, which may speed up downloads.
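To illustrate why lowering num_chunks may help with 503 errors, here is a self-contained sketch of the general pattern: an asyncio.Semaphore created with num_chunks permits caps how many downloads run at once, so fewer simultaneous requests reach the server. The network and file I/O (aiohttp, aiofiles) are replaced by asyncio.sleep, and fetch_one and download_all are hypothetical names rather than the wrapper's functions.

```python
import asyncio


async def fetch_one(sem: asyncio.Semaphore, rec_id: int, state: dict) -> int:
    # Waits here while num_chunks downloads are already in progress.
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for the HTTP request and file write
        state["active"] -= 1
    return rec_id


async def download_all(rec_ids: list[int], num_chunks: int = 4) -> dict:
    sem = asyncio.Semaphore(num_chunks)
    state = {"active": 0, "peak": 0}
    # gather schedules every task at once; the semaphore limits concurrency.
    done = await asyncio.gather(*(fetch_one(sem, i, state) for i in rec_ids))
    return {"done": done, "peak": state["peak"]}
```

Running asyncio.run(download_all(list(range(10)), num_chunks=3)) completes all ten tasks, and the peak number of simultaneous downloads never exceeds 3.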

Contributing

All pull requests are welcome! If any issues are found, please do not hesitate to bring them to my attention.

Acknowledgements

Thank you to the team at xeno-canto.org and all its contributors for putting together such an amazing database.

License

MIT