/aliyun-odps-python-sdk

ODPS Python SDK and data analysis framework






-----------------

An elegant way to access the ODPS API. See the project documentation for details.

Installation

The quick way:

pip install "pyodps[full]"

If you don't need Jupyter support, install the base package:

pip install pyodps

The dependencies will be installed automatically.

Or from source code (not recommended for production use):

$ virtualenv pyodps_env
$ source pyodps_env/bin/activate
$ pip install git+https://github.com/aliyun/aliyun-odps-python-sdk.git

Dependencies

  • Python (>=2.7), including Python 3+ and PyPy; Python 3.7 is recommended
  • setuptools (>=3.0)

Run Tests

  • install pytest
  • copy conf/test.conf.template to odps/tests/test.conf, and fill in your account details
  • run pytest odps
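
The copy step above can be scripted. The following is a minimal sketch, assuming it is run from a checkout of the repository; `prepare_test_conf` is a hypothetical helper name and is not part of PyODPS. It copies the template only when the target file is absent, so an existing configuration is never overwritten.

```python
import shutil
from pathlib import Path


def prepare_test_conf(repo_root="."):
    """Copy conf/test.conf.template to odps/tests/test.conf if it is absent.

    Hypothetical helper for illustration; the two paths are the ones
    named in the list above.
    """
    root = Path(repo_root)
    template = root / "conf" / "test.conf.template"
    target = root / "odps" / "tests" / "test.conf"
    if not target.exists():
        # Create the directory if needed, then copy the template as-is.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(str(template), str(target))
    return target
```

After calling `prepare_test_conf()`, the account details still have to be filled in by hand before running `pytest odps`.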

Usage

>>> import os
>>> from odps import ODPS
>>> # Make sure the environment variable ALIBABA_CLOUD_ACCESS_KEY_ID is set to the user's Access Key ID
>>> # and ALIBABA_CLOUD_ACCESS_KEY_SECRET is set to the user's Access Key Secret.
>>> # Hardcoding the Access Key ID or Access Key Secret in your code is not recommended.
>>> o = ODPS(
...     os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
...     os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
...     project='**your-project**',
...     endpoint='**your-endpoint**',
... )
>>> dual = o.get_table('dual')
>>> dual.name
'dual'
>>> dual.table_schema
odps.Schema {
  c_int_a                 bigint
  c_int_b                 bigint
  c_double_a              double
  c_double_b              double
  c_string_a              string
  c_string_b              string
  c_bool_a                boolean
  c_bool_b                boolean
  c_datetime_a            datetime
  c_datetime_b            datetime
}
>>> dual.creation_time
datetime.datetime(2014, 6, 6, 13, 28, 24)
>>> dual.is_virtual_view
False
>>> dual.size
448
>>> dual.table_schema.columns
[<column c_int_a, type bigint>,
 <column c_int_b, type bigint>,
 <column c_double_a, type double>,
 <column c_double_b, type double>,
 <column c_string_a, type string>,
 <column c_string_b, type string>,
 <column c_bool_a, type boolean>,
 <column c_bool_b, type boolean>,
 <column c_datetime_a, type datetime>,
 <column c_datetime_b, type datetime>]
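
The usage example expects both credentials to be present in the environment; `os.getenv` silently returns `None` when a variable is unset. As a minimal sketch, the check can be made explicit so that a missing variable fails with a clear message. The helper name `load_credentials` is hypothetical and is not part of PyODPS; only the two variable names come from the example above.

```python
import os

# Environment variable names taken from the usage example above.
REQUIRED_VARS = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def load_credentials(env=None):
    """Return (access_id, access_key), or raise if either variable is unset.

    Hypothetical helper for illustration, not a PyODPS function.
    """
    env = os.environ if env is None else env
    # Treat both unset and empty values as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)
```

The returned pair could then be passed as the first two positional arguments of `ODPS(...)`, in place of the two `os.getenv` calls.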

DataFrame API

>>> from odps.df import DataFrame
>>> df = DataFrame(o.get_table('pyodps_iris'))
>>> df.dtypes
odps.Schema {
  sepallength           float64
  sepalwidth            float64
  petallength           float64
  petalwidth            float64
  name                  string
}
>>> df.head(5)
|==========================================|   1 /  1  (100.00%)         0s
   sepallength  sepalwidth  petallength  petalwidth         name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa
>>> df[df.sepalwidth > 3]['name', 'sepalwidth'].head(5)
|==========================================|   1 /  1  (100.00%)        12s
          name  sepalwidth
0  Iris-setosa         3.5
1  Iris-setosa         3.2
2  Iris-setosa         3.1
3  Iris-setosa         3.6
4  Iris-setosa         3.9
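
To make the last expression concrete: `df[df.sepalwidth > 3]` keeps the rows that satisfy the predicate, and `['name', 'sepalwidth']` then keeps only those two columns. The plain-Python sketch below mimics that filter-then-project order on the five rows printed by `df.head(5)`. It does not use PyODPS and says nothing about how the DataFrame API executes the query.

```python
# The five rows shown by df.head(5) above, as plain dictionaries.
rows = [
    {"sepallength": 5.1, "sepalwidth": 3.5, "petallength": 1.4, "petalwidth": 0.2, "name": "Iris-setosa"},
    {"sepallength": 4.9, "sepalwidth": 3.0, "petallength": 1.4, "petalwidth": 0.2, "name": "Iris-setosa"},
    {"sepallength": 4.7, "sepalwidth": 3.2, "petallength": 1.3, "petalwidth": 0.2, "name": "Iris-setosa"},
    {"sepallength": 4.6, "sepalwidth": 3.1, "petallength": 1.5, "petalwidth": 0.2, "name": "Iris-setosa"},
    {"sepallength": 5.0, "sepalwidth": 3.6, "petallength": 1.4, "petalwidth": 0.2, "name": "Iris-setosa"},
]

# Filter: keep rows whose sepalwidth exceeds 3, like df[df.sepalwidth > 3].
filtered = [row for row in rows if row["sepalwidth"] > 3]

# Project: keep only the requested columns, like [...]['name', 'sepalwidth'].
projected = [{"name": row["name"], "sepalwidth": row["sepalwidth"]} for row in filtered]

for row in projected:
    print(row["name"], row["sepalwidth"])
```

Only four of these five rows pass the filter (the row with `sepalwidth` 3.0 is dropped), which is why the fifth row of the PyODPS result, with `sepalwidth` 3.9, has to come from further down the table.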

Command-line and IPython enhancement

In [1]: %load_ext odps

In [2]: %enter
Out[2]: <odps.inter.Room at 0x10fe0e450>

In [3]: %sql select * from pyodps_iris limit 5
|==========================================|   1 /  1  (100.00%)         2s
Out[3]:
   sepallength  sepalwidth  petallength  petalwidth         name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

Python UDF Debugging Tool

#file: plus.py
from odps.udf import annotate

@annotate('bigint,bigint->bigint')
class Plus(object):
    def evaluate(self, a, b):
        return a + b

$ cat plus.input
1,1
3,2
$ pyou plus.Plus < plus.input
2
5
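
Judging by the transcript above, `pyou` passes each comma-separated input line to `evaluate` and prints the result. A rough plain-Python imitation of that flow, covering only the `bigint,bigint->bigint` case, might look like the sketch below. The `run_udf` helper is hypothetical and is not how `pyou` is implemented; the `@annotate` decorator is left out so the snippet runs without PyODPS installed.

```python
class Plus(object):
    # Same logic as plus.py above, minus the @annotate decorator.
    def evaluate(self, a, b):
        return a + b


def run_udf(udf_cls, lines):
    """Call evaluate() once per non-empty line, parsing each field as an int.

    Hypothetical helper for illustration; real type handling is pyou's job.
    """
    udf = udf_cls()
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        args = [int(field) for field in line.split(",")]
        results.append(udf.evaluate(*args))
    return results


# Same input as plus.input above.
for value in run_udf(Plus, ["1,1", "3,2"]):
    print(value)
```

A harness like this is handy for quick unit tests of the `evaluate` logic, while `pyou` remains the tool for checking the annotated class against its declared signature.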

Contributing

For a development install, clone the repository and then install from source:

git clone https://github.com/aliyun/aliyun-odps-python-sdk.git
cd aliyun-odps-python-sdk
pip install -r requirements.txt -e .

To modify the frontend code, first install Node.js and npm. Then build and install the frontend with:

python setup.py build_js
python setup.py install_js

License

Licensed under the Apache License 2.0