pyencodetools

Python wrapper for the ENCODE (Encyclopedia of DNA Elements) API and (eventually) tools for working with the data.

Installation

Clone this repository and change into the repo directory. To install the package, run

python setup.py install

or, if you want to work on the source code, run

python setup.py develop

Usage

Starting with an ENCODE identifier

If you have an ENCODE identifier in one of the following forms:

long URL: https://www.encodeproject.org/biosample/ENCBS000AAA/
short URL: /biosample/ENCBS000AAA/
accession: ENCBS000AAA
UUID: 56e94f2b-25ac-4c58-9828-f63b66220999

you can query the ENCODE API as follows:

import pyencodetools as pet
res = pet.fetch('ENCBS000AAA')

fetch accepts any of the identifier forms listed above and queries the ENCODE API with it. It returns an ENCODERecord object whose attributes correspond to the keys of the JSON response dict from ENCODE. You can also instantiate an ENCODERecord directly:

res = pet.ENCODERecord('56e94f2b-25ac-4c58-9828-f63b66220999')
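
As a rough illustration of how the four identifier forms could be reduced to a single request URL, here is a minimal sketch. It is not the package's actual implementation: the helper names identifier_kind and to_encode_url are hypothetical, and the sketch assumes that the ENCODE server resolves a bare accession or UUID placed directly after the host name.

```python
import re
from urllib.parse import urljoin

# Host taken from the long-URL example above.
BASE_URL = "https://www.encodeproject.org/"

# Canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def identifier_kind(identifier):
    """Classify an identifier as one of the four forms (hypothetical helper)."""
    identifier = identifier.strip()
    if identifier.startswith(("http://", "https://")):
        return "long URL"
    if identifier.startswith("/"):
        return "short URL"
    if UUID_RE.fullmatch(identifier):
        return "UUID"
    return "accession"


def to_encode_url(identifier):
    """Turn any of the four forms into a full URL (hypothetical helper).

    Assumption: a bare accession or UUID can be appended directly
    to the host name and the server will resolve it.
    """
    identifier = identifier.strip()
    if identifier_kind(identifier) == "long URL":
        return identifier
    # Normalize slashes so short URLs, accessions and UUIDs share one path.
    return urljoin(BASE_URL, identifier.strip("/") + "/")


if __name__ == "__main__":
    for example in (
        "https://www.encodeproject.org/biosample/ENCBS000AAA/",
        "/biosample/ENCBS000AAA/",
        "ENCBS000AAA",
        "56e94f2b-25ac-4c58-9828-f63b66220999",
    ):
        print(identifier_kind(example), "->", to_encode_url(example))
```

Keeping the classification separate from the URL construction makes it easy to reject malformed input before any request is sent.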

Search

You can search the ENCODE API as follows:

search_res = pet.search('bone chip')

search_res will be a list of ENCODERecord objects that match the search term. There is a default limit on the number of results returned, which can be changed with the limit argument:

search_res = pet.search('bone chip', limit=10)

Advanced Search

If you want to search using ENCODE's search parameters, begin your search term with '?'. For instance, the following query returns all human DNase-seq experiments (up to a limit of 1,000 experiments):

search_res = pet.search('?type=experiment&assay_term_name=DNase-seq&replicates.library.biosample.donor.organism.scientific_name=Homo%20sapiens', limit=1000)

You can see how these types of queries are constructed by looking at the ENCODE API documentation or by going to one of the ENCODE data pages (such as the Assays page) and applying some of the filters on the left side of the page. As you apply the filters, the URL will update with the appropriate search term.
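
Writing long search terms by hand is error-prone, so it can help to assemble them from a dict of filters. The following is a minimal sketch using only the standard library; build_search_term is a hypothetical helper and not part of pyencodetools.

```python
from urllib.parse import quote, urlencode


def build_search_term(filters):
    """Build an advanced search term from a dict of ENCODE search parameters.

    Hypothetical helper: it percent-encodes the values (a space becomes
    %20) and prepends the '?' that marks an advanced search.
    """
    return "?" + urlencode(filters, quote_via=quote)


# The same filters as in the DNase-seq example above.
term = build_search_term({
    "type": "experiment",
    "assay_term_name": "DNase-seq",
    "replicates.library.biosample.donor.organism.scientific_name": "Homo sapiens",
})
print(term)
```

The resulting string can then be passed to the search function, for example pet.search(term, limit=1000).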

Development

If you want to work on the source code, start by initializing and updating the ENCODE submodule:

git submodule init
git submodule update

Strategy

My overall goal for this package is to provide a nice Python interface for the ENCODE API. I have created a general class, ENCODERecord, that wraps results from GET requests to the API. My plan is to add parsers for various types of data returned from ENCODE to make the information more useful. For instance, rather than just storing "culture_harvest_date" as a string, I can convert it to a datetime object that is more useful within Python.
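
The parser idea described above could look roughly like the following sketch. It is only an illustration under stated assumptions: RecordSketch, FIELD_PARSERS and parse_date are hypothetical names rather than the package's code, the date is assumed to arrive in YYYY-MM-DD form, and the sample values are made up.

```python
from datetime import datetime


def parse_date(value):
    """Convert a 'YYYY-MM-DD' string to a datetime (the format is an assumption)."""
    return datetime.strptime(value, "%Y-%m-%d")


# Maps a JSON key to the parser applied to its value (hypothetical registry).
FIELD_PARSERS = {
    "culture_harvest_date": parse_date,
}


class RecordSketch:
    """Expose the keys of a JSON response dict as attributes, parsing known fields."""

    def __init__(self, json_dict):
        for key, value in json_dict.items():
            parser = FIELD_PARSERS.get(key)
            if parser is not None:
                try:
                    value = parser(value)
                except (TypeError, ValueError):
                    # Keep the raw value if it is not in the expected format.
                    pass
            setattr(self, key, value)


# Made-up response fragment, for illustration only.
record = RecordSketch({
    "accession": "ENCBS000AAA",
    "culture_harvest_date": "2013-05-01",
})
print(record.accession, record.culture_harvest_date.year)
```

Falling back to the raw value keeps a record usable even when a field does not match the format its parser expects.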

cdeboever3/pyencodetools