SeqsLab cluster and data lakehouse connector for Python

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

seqslab-connector

The SeqsLab Connector for Python, based on PyHive, lets you create a Python DB-API connection to Atgenomix SeqsLab interactive jobs (clusters) and develop Python-based workflow applications. It is a Hive-Thrift-based client with no dependency on ODBC or JDBC. It also provides a SQLAlchemy dialect and an Apache Superset database engine spec, so tools built on either can execute data query language (DQL) statements against a job.

You are welcome to file an issue for general use cases. You can also contact Atgenomix Support here.

Requirements

Python 3.7 or above is required.

Installation

Install using pip.

pip install seqslab-connector

For Apache Superset integration, install the superset extra (the quotes keep shells such as zsh from expanding the brackets):

pip install "seqslab-connector[superset]"

Usage

DB-API

from seqslab import hive

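# Placeholders: replace with your run name, job run ID, credentials, and cluster host.
conn = hive.connect(database='run_name', http_path='job_run_id', username='user', password='pass', host='job_cluster_host')
cursor = conn.cursor()
# List the tables available in the run's database.
cursor.execute('SHOW TABLES')
print(cursor.fetchall())
# Preview the first rows of a workflow table.
cursor.execute('SELECT * FROM my_workflow_table_name LIMIT 10')
print(cursor.fetchall())
# Release the cursor, then the connection.
cursor.close()
conn.close()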
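
Because the connector exposes the standard Python DB-API (PEP 249), cursor cleanup can be wrapped in a small helper. The following is a minimal sketch, not part of the connector: `fetch_rows` is a hypothetical name, and an in-memory `sqlite3` connection stands in for `hive.connect(...)` so the snippet runs without a cluster.

```python
import sqlite3
from contextlib import closing


def fetch_rows(conn, sql):
    """Run one statement on a DB-API connection and return all rows.

    The cursor is closed even if execute() or fetchall() raises.
    """
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql)
        return cursor.fetchall()


# Stand-in connection so the sketch runs offline. With a SeqsLab job you
# would pass the object returned by hive.connect(...) instead (assumption:
# it follows PEP 249, as the DB-API example above suggests).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE my_workflow_table_name (id INTEGER, name TEXT)")
demo_conn.execute("INSERT INTO my_workflow_table_name VALUES (1, 'sample_a')")

rows = fetch_rows(demo_conn, "SELECT * FROM my_workflow_table_name LIMIT 10")
print(rows)
demo_conn.close()
```

The helper only relies on `cursor()`, `execute()`, `fetchall()`, and `close()`, which every DB-API driver is expected to provide.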

SQLAlchemy

from sqlalchemy.engine import create_engine

engine = create_engine('seqslab+hive://user:pass@job_cluster_host/run_name?http_path=job_run_id')
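
SQLAlchemy URLs require special characters in the username or password (such as `@` or `/`) to be percent-encoded. The sketch below assembles a URL in the format shown above using only the standard library; `build_seqslab_url` is a hypothetical helper, not something the connector provides, and all argument values are placeholders.

```python
from urllib.parse import quote_plus, urlencode


def build_seqslab_url(user, password, host, run_name, job_run_id):
    """Assemble a seqslab+hive URL, percent-encoding the credentials."""
    # http_path carries the job run ID, as in the example URL above.
    query = urlencode({"http_path": job_run_id})
    return (
        f"seqslab+hive://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{run_name}?{query}"
    )


url = build_seqslab_url(
    "user", "p@ss/word", "job_cluster_host", "run_name", "job_run_id"
)
print(url)
```

The resulting string can be passed to `create_engine` in place of the literal URL.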

Apache Superset

See Connecting to Databases in the Apache Superset documentation.

Documentation

For the latest documentation, see SeqsLab.