The Quilt Python connector uses the Quilt REST API and, if installed, SQLAlchemy (http://docs.sqlalchemy.org/) to access and update data sets in Quilt. Quilt tables are available as dictionaries or Pandas (http://pandas.pydata.org/) DataFrames.
The Quilt Python connector is available via PyPI: https://pypi.python.org/pypi/quilt
pip install quilt
To use the Quilt Python connector, add this repository to your PYTHONPATH and import quilt.
Connect to Quilt by creating a Connection object:
import quilt
connection = quilt.Connection(username)
Password: *enter your password*
The connection will contain a list of your Quilt tables:
connection.tables
You can also find tables by searching your own tables and Quilt’s public data sets:
connection.search('term')
Get a table by its table ID using get_table:
t = connection.get_table(1234)
Using the connection, you can create new tables in Quilt. To create an empty table:
t = connection.create_table(name, description)
To create a table from an input file:
t = connection.create_table(name, description, inputfile=path_to_input_file)
Or, to create a new table from a DataFrame:
t = connection.save_df(df, name, description="table description")
Each Table object has a list of columns:
mytable.columns
After the columns have been fetched, they are also available as table attributes:
mytable.column1
Tables are iterable. To access table data:
for row in mytable:
    print(row)
Search for matching rows in a table by calling search:
for row in mytable.search('foo'):
    print(row)
Sort the table by any column or set of columns. To set the ordering, pass the column’s field (its name in the database) as a string:
mytable.order_by('column1')
You can find column field names with their “.field” attribute:
mytable.order_by(mytable.column1.field)
To sort by multiple columns, pass a list of fields:
mytable.order_by(['column2', 'column1'])
To sort in descending order, add a “-” in front of the column field name:
mytable.order_by('-column1')
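To make the ordering convention concrete, here is a minimal sketch of how a field name or list of field names with an optional “-” prefix could be applied to plain dictionary rows. The helper order_rows and the sample rows are hypothetical and not part of the Quilt connector, which may implement ordering differently (for example, on the server).

```python
def order_rows(rows, fields):
    """Sort dict rows by one field name or a list of field names.

    A leading "-" on a field name requests descending order for that field.
    Hypothetical helper for illustration; not part of the Quilt connector.
    """
    if isinstance(fields, str):
        fields = [fields]
    result = list(rows)
    # Apply the keys from last to first. Python's sort is stable, so the
    # first field in the list ends up as the primary sort key.
    for field in reversed(fields):
        descending = field.startswith("-")
        name = field[1:] if descending else field
        result.sort(key=lambda row: row[name], reverse=descending)
    return result


# Made-up sample rows, for illustration only.
rows = [
    {"name": "Sally", "time": 31.5},
    {"name": "Bob", "time": 29.0},
    {"name": "Sally", "time": 28.2},
]
by_time_desc = order_rows(rows, "-time")
by_name_then_time = order_rows(rows, ["name", "time"])
```

In this sketch the first field in the list is the primary key and later fields break ties, which is the usual reading of a multi-column sort.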
Limit the number of rows returned by calling limit(number_of_rows).
Search, order_by and limit can be combined to return just the data you want to see. For example, to return the top 2 finishers with the name Sally from a table of race results (race_results: [name_000, time_001]), you could write:
for result in race_results.search('Sally').order_by('-time_001').limit(2):
    print(result)
Access a table’s data as a Pandas DataFrame by calling mytable.df().
You can also combine the querying methods above to access particular rows:
race_results.search('Sally').order_by('-time_001').limit(2).df()
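The chaining pattern used by search, order_by and limit can be sketched with a small in-memory class. Everything below is an assumption made for illustration: the class RowQuery, its substring matching and the sample race data are hypothetical, and the real connector may evaluate queries quite differently.

```python
class RowQuery(object):
    """Hypothetical chainable query over a list of dict rows."""

    def __init__(self, rows, term=None, ordering=None, max_rows=None):
        self._rows = rows
        self._term = term
        self._ordering = ordering
        self._max_rows = max_rows

    # Each method returns a new query, so calls can be chained freely.
    def search(self, term):
        return RowQuery(self._rows, term, self._ordering, self._max_rows)

    def order_by(self, field):
        return RowQuery(self._rows, self._term, field, self._max_rows)

    def limit(self, number_of_rows):
        return RowQuery(self._rows, self._term, self._ordering, number_of_rows)

    def __iter__(self):
        rows = self._rows
        if self._term is not None:
            # Keep rows in which any value contains the search term.
            rows = [r for r in rows
                    if any(self._term in str(v) for v in r.values())]
        if self._ordering is not None:
            descending = self._ordering.startswith("-")
            name = self._ordering[1:] if descending else self._ordering
            rows = sorted(rows, key=lambda r: r[name], reverse=descending)
        if self._max_rows is not None:
            rows = rows[:self._max_rows]
        return iter(rows)


# Made-up race data using the field names from the example above.
race_results = RowQuery([
    {"name_000": "Sally", "time_001": 31.5},
    {"name_000": "Bob", "time_001": 29.0},
    {"name_000": "Sally", "time_001": 28.2},
    {"name_000": "Sally", "time_001": 30.1},
])
top = list(race_results.search("Sally").order_by("-time_001").limit(2))
```

Because each call only records its argument and the work happens on iteration, this sketch always filters, then sorts, then truncates, regardless of the order in which the methods were chained.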
Quilt supports intersect and subtract for tables that store genomic regions. Those operations assume that the tables have columns storing the chromosome, start and end of each region. The function get_bed_cols tries to infer those columns based on column names.
If the guessing fails, or to override the guess, set the chromosome, start and end columns explicitly with set_bed_cols:
mytable.set_bed_cols(mytable.chr_001, mytable.start_002, mytable.end_003)
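As a rough illustration of name-based inference, the sketch below picks the first column whose name starts with a likely prefix for each role. The function guess_bed_cols and its prefix lists are hypothetical; the source does not say which rules get_bed_cols actually applies.

```python
def guess_bed_cols(column_names):
    """Guess (chromosome, start, end) columns from a list of column names.

    Hypothetical heuristic: for each role, take the first column whose
    lowercased name begins with one of the listed prefixes. A role with
    no matching column is reported as None.
    """
    prefixes = {
        "chromosome": ("chr",),
        "start": ("start",),
        "end": ("end", "stop"),
    }
    guess = {}
    for role, options in prefixes.items():
        guess[role] = next(
            (name for name in column_names if name.lower().startswith(options)),
            None,
        )
    return guess["chromosome"], guess["start"], guess["end"]


cols = guess_bed_cols(["gene_000", "chr_001", "start_002", "end_003"])
missing = guess_bed_cols(["gene_000", "position_001"])
```

A None in the result corresponds to the case where guessing fails and the columns would need to be set explicitly.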
Once the bed columns are set for both tables, they can be intersected and subtracted.
result = tableA.intersect(tableB)
result = tableA.intersect_wao(tableB)
result = tableA.subtract(tableB)
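For readers unfamiliar with these operations, here is a minimal sketch of intersect and subtract on plain (chromosome, start, end) tuples. It assumes half-open coordinates as in the BED format, does not model intersect_wao, and is not Quilt's implementation; the function names and sample regions are chosen only for illustration.

```python
def intersect(regions_a, regions_b):
    """Return the overlapping parts of regions in A with regions in B."""
    result = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                result.append((chrom_a, start, end))
    return result


def subtract(regions_a, regions_b):
    """Return the parts of regions in A that no region in B covers."""
    result = []
    for chrom, start, end in regions_a:
        pieces = [(start, end)]
        for chrom_b, start_b, end_b in regions_b:
            if chrom_b != chrom:
                continue
            remaining = []
            for piece_start, piece_end in pieces:
                # Part of the piece to the left of the B region, if any.
                if start_b > piece_start:
                    remaining.append((piece_start, min(piece_end, start_b)))
                # Part of the piece to the right of the B region, if any.
                if end_b < piece_end:
                    remaining.append((max(piece_start, end_b), piece_end))
            pieces = remaining
        result.extend((chrom, s, e) for s, e in pieces)
    return result


# Made-up regions, for illustration only.
table_a = [("chr1", 100, 200), ("chr2", 50, 80)]
table_b = [("chr1", 150, 250), ("chr1", 120, 130)]
overlaps = intersect(table_a, table_b)
leftovers = subtract(table_a, table_b)
```

The nested loops keep the sketch short; for large tables a sorted sweep or an interval tree would be the more usual choice.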
Tests for Python 2.7 are in progress. Run the tests with:
pip install -r requirements.txt
pip install pytest
pytest tests