git:	$Id$

pandalone: wrapping pandas in trees

Release:	0.0.1-dev.1
Documentation:	https://pandalone.readthedocs.org/
Source:	https://github.com/pandalone/pandalone
PyPI repo:	https://pypi.python.org/pypi/pandalone
Keywords:	utility, library, data, tree, processing, calculation, dependencies, resolution, pandas, dictionaries, maps, lists, scientific, engineering
Copyright:	2015 European Commission (JRC-IET)
License:	EUPL 1.1+

pandalone is a python library for processing hierarchical data (json, hdf5, pandas), for scientific and engineering exploration.

Introduction

Overview

An "execution" or a "run" of a calculation is depicted in the following diagram:

    .---------------------.     _____________       .----------------------------.
   ;       DataTree      ;     |             |      ;          DataTree          ;
  ;---------------------;  ==> | <some code> | ==> ;----------------------------;
 ;                     ;       |_____________|    ;                            ;
'---------------------'                         '----------------------------'

The Input & Output Data are instances of :dfn:`data-tree`, trees of strings and numbers, assembled with:

sequences,
dictionaries,
:class:`pandas.DataFrame`,
:class:`pandas.Series`, and
URI-references to other data-trees/paths.
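
As a rough illustration of the diagram above, the sketch below treats a "run" as a plain function from an input tree to an output tree. It uses only built-in dictionaries and lists; the function name ``run`` and the keys ``params``, ``measurements`` and ``results`` are hypothetical and not part of pandalone's API.

```python
def run(in_tree):
    """Hypothetical '<some code>' box: consume one data-tree, produce another."""
    scale = in_tree["params"]["scale"]
    values = in_tree["measurements"]
    scaled = [v * scale for v in values]
    # The output is again a tree of strings and numbers.
    return {
        "results": {
            "scaled": scaled,
            "total": sum(scaled),
        },
        "status": "ok",
    }


# Input tree assembled from a dictionary and a sequence.
in_tree = {
    "params": {"scale": 2},
    "measurements": [1, 2, 3],
}
out_tree = run(in_tree)
print(out_tree["results"]["total"])
```

In a real calculation the leaves could equally be :class:`pandas.DataFrame` or :class:`pandas.Series` instances.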

The program runs on Python-2.7+ and Python-3.3+ (preferred) and requires the numpy/scipy, pandas and win32 libraries, along with their native backends, to be installed. If you do not already have such an environment, read the :doc:`install` section below for suitable distributions such as WinPython or Anaconda.

Assuming that you have a working Python environment, open a command shell (on Windows use :program:`cmd.exe`, but ensure that :program:`python.exe` is in its :envvar:`PATH`) and try the following commands:

Tip

The commands below that begin with $ assume a Unix-like operating system with a POSIX shell (Linux, OS X). Although the commands are simple and easy to translate to their Windows cmd.exe counterparts, it would be worthwhile to install Cygwin to get the same environment on Windows. If you choose to do that, also include the following packages in Cygwin's installation wizard:

* git, git-completion
* make, zip, unzip, bzip2, dos2unix
* openssh, curl, wget

But do not install/rely on cygwin's outdated python environment.

Install:

$ pip install pandalone                 ## Use `--pre` if version-string has a build-suffix.

Or, in case you need the very latest from the master branch:

$ pip install git+https://github.com/pandalone/pandalone.git

See: :doc:`install`

Run:

$ pandalone --version

Install

The current version (|version|) runs on Python-2.7+ and Python-3.3+ and requires the numpy/scipy, pandas and win32 libraries, along with their native backends, to be installed.

It has been tested under Windows and Linux. Python-3.3+ is the preferred interpreter, i.e., the Excel interface and the desktop UI run only with it.

It is distributed as wheels.

Python installation

Warning

On Windows it is strongly suggested NOT to install the standard CPython distribution, unless:

you have administrative privileges,
you are an experienced Python programmer, so that
you know how to hunt dependencies from the PyPI repository and/or the Unofficial Windows Binaries for Python Extension Packages.

As explained above, this project depends on packages with native-backends that require the use of C and Fortran compilers to build from sources. To avoid this hassle, you should choose one of the user-friendly distributions suggested below.

Below is a matrix of the two suggested self-contained Python distributions for running this program (the default Python included in Linux is excluded here). Both distributions:

are free (as in freedom),
do not require admin-rights for installation in Windows, and
have been tested to run this program successfully (it has also been tested on default Linux distros).

.. list-table::
   :header-rows: 1

   * - Distributions
     - WinPython
     - Anaconda
   * - Platform
     - Windows
     - Windows, Mac OS, Linux
   * - Ease of Installation
     - Fair (requires fiddling with the :envvar:`PATH` and the Registry after install)
     - Anaconda: Easy; MiniConda: Moderate
   * - Ease of Use
     - Easy
     - Moderate (should use :command:`conda` and/or :command:`pip`, depending on whether a package contains native libraries)
   * - # of Packages
     - Only what's included in the downloaded-archive
     - Many 3rd-party packages uploaded by users
   * - Notes
     - After installation, see :doc:`faq` for registering the WinPython installation and adding your installation to :envvar:`PATH`.
     - Check also the lighter miniconda. For installing native-dependencies with :command:`conda`, see the files :file:`requirements/miniconda.txt` and :file:`.travis.yaml`.

Check also the installation instructions from the pandas site.

Package installation

Before installing it, make sure that there are no older versions left over on the python installation you are using. To cleanly uninstall it, run this command until you cannot find any project installed:

$ pip uninstall pandalone                   ## Use `pip3` if both python-2 & 3 are in PATH.

You can install the project directly from the PyPI repo the "standard" way, by typing the :command:`pip` command in the console:

$ pip install pandalone

If you want to install a pre-release version (the version-string is not plain numbers, but ends with alpha, beta.2 or something else), additionally use :option:`--pre`:

$ pip install pandalone --pre
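
The "not plain numbers" rule of thumb for pre-release versions can be sketched in a few lines of Python. This is only a simplification of the description above and does not reproduce pip's exact version-parsing rules; the helper name ``needs_pre_flag`` is hypothetical.

```python
import re

# "Plain numbers" here means dot-separated integers only, e.g. "0.0.1".
_PLAIN_VERSION = re.compile(r"\d+(\.\d+)*")


def needs_pre_flag(version):
    """Return True when the version-string carries any non-numeric suffix."""
    return _PLAIN_VERSION.fullmatch(version) is None


for v in ["0.0.1", "0.0.1-dev.1", "0.0.9-alpha.3.1"]:
    print(v, needs_pre_flag(v))
```

Under this heuristic, only the first of the three version-strings counts as a plain release.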

You can also install the very latest version straight from the sources:

$ pip install git+https://github.com/pandalone/pandalone.git --pre

If you want to upgrade an existing installation along with all its dependencies, also add :option:`--upgrade` (or, equivalently, :option:`-U`), but the build might then take considerable time to finish. The upgraded libraries might also break existing programs(!), so use it with caution, or from within a virtualenv (isolated Python environment).

To install it for different Python environments, repeat the procedure using the appropriate :program:`python.exe` interpreter for each environment.

Tip

To debug installation problems, you can export a non-empty :envvar:`DISTUTILS_DEBUG` and distutils will print detailed information about what it is doing and/or print the whole command line when an external program (like a C compiler) fails.

After installation, it is important that you check which version is visible in your :envvar:`PATH`:

$ pandalone --version
0.0.1-dev.1
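
The same check can be made from within Python. The sketch below relies on the standard-library ``importlib.metadata`` module, which only exists on Python 3.8 and later, so it is an assumption that goes beyond the interpreters listed above; the helper name ``installed_version`` is hypothetical.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version-string, or None when pandalone is not installed
# in the interpreter that runs this snippet.
print(installed_version("pandalone"))
```

Running it with each interpreter in turn shows which environments actually have the package.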

To install for different Python versions, repeat the procedure for every required version.

Older versions

To install an older released version issue the console command:

$ pip install pandalone==0.0.1                   ## Use `--pre` if version-string has a build-suffix.

or alternatively straight from the sources:

$ pip install git+https://github.com/pandalone/pandalone.git@v0.0.9-alpha.3.1  --pre

Of course, you can substitute v0.0.9-alpha.3.1 with any slug from "commits", "branches" or "releases" that you will find on the project's github-repo.

Note

If you have another version already installed, you have to use :option:`--ignore-installed` (or :option:`-I`). For using the specific version, check this (untested) stackoverflow question.

You can install each version in a separate virtualenv (isolated Python environment) and shy away from all this. Check

Installing sources

If you download the sources you have more options for installation. There are various methods to get hold of them:

Download the source distribution from the PyPI repo.
Download a release-snapshot from github.

Clone the git-repository at github.

Assuming you have a working installation of git you can fetch and install the latest version of the project with the following series of commands:

$ git clone "https://github.com/pandalone/pandalone.git" pandalone.git
$ cd pandalone.git
$ python setup.py install                                 ## Use `python3` if both python-2 & 3 installed.

When working with sources, you need to have installed all libraries that the project depends on:

$ pip install -r requirements/execution.txt .

The previous command installs a "snapshot" of the project as it is found in the sources. If you wish to link the project's sources with your python environment, install the project in development mode:

$ python setup.py develop

Note

This last command installs any missing dependencies inside the project-folder.

Project files and folders

The files and folders of the project are listed below:

+--pandalone/       ## (package) The python-code of the calculator
+--tests/           ## (package) Test-cases
+--docs/            ## Documentation folder
+--setup.py         ## (script) The entry point for `setuptools`, installing, testing, etc
+--requirements/    ## (txt-files) Various pip-dependencies for tools.
+--README.rst
+--CHANGES.rst
+--LICENSE.txt

Usage

Cmd-line usage

Warning

Not implemented yet.

The command-line usage below requires the Python environment to be installed, and provides for executing an experiment directly from the OS's shell (i.e. :program:`cmd` in windows or :program:`bash` in POSIX), and in a single command.

[TBD]

GUI usage

Attention!

Desktop UI requires Python 3!

For a quick-'n-dirty method to explore the structure of the data-tree and run an experiment, just run:

$ pandalone gui

Excel usage

Attention!

Excel-integration requires Python-3 and Windows or OS X!

In Windows and OS X you may utilize the excellent xlwings library to use Excel files for providing input and output to the experiment.

To create the necessary template-files in your current-directory you should enter:

$ pandalone excel

You could type instead :samp:`pandalone excel {file_path}` to specify a different destination path.

[TBD]

Python usage

Example Python :abbr:`REPL (Read-Eval-Print Loop)` commands that set up and run an experiment are given below.

First run :command:`python` or :command:`ipython` and try to import the project to check its version:

>>> import pandalone

>>> pandalone.__version__           ## Check version once more.
'0.0.1-dev.1'

>>> pandalone.__file__              ## To check where it was installed.         # doctest: +SKIP
/usr/local/lib/site-package/pandalone-...

If everything works, create the :term:`data-tree` to hold the input-data (strings and numbers). You assemble a data-tree using:

sequences,
dictionaries,
:class:`pandas.DataFrame`,
:class:`pandas.Series`, and
URI-references to other data-trees.
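
A minimal sketch of such an input tree follows, built only from plain Python containers and pandas objects. All keys and values are hypothetical, and the ``{"$ref": ...}`` entry merely borrows the JSON convention for references; whether pandalone resolves references written this way is an assumption.

```python
import pandas as pd

# Hypothetical input data-tree; none of these keys are defined by pandalone.
in_tree = {
    "vehicle": {                                  # a dictionary
        "name": "demo-car",
        "mass_kg": 1200,
        "gear_ratios": [3.5, 2.1, 1.4, 1.0],      # a sequence
    },
    "cycle": pd.DataFrame({                       # a tabular branch
        "time_s": [0, 1, 2],
        "speed_kmh": [0.0, 5.0, 12.5],
    }),
    "limits": pd.Series({"v_max": 120.0, "a_max": 2.5}),
    "defaults": {"$ref": "#/vehicle"},            # a reference (assumed syntax)
}

# Branches are reached with ordinary indexing.
print(in_tree["vehicle"]["gear_ratios"][0])
print(in_tree["cycle"]["speed_kmh"].max())
print(in_tree["limits"]["v_max"])
```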

[TBD]

Getting Involved

This project is hosted on GitHub. To provide feedback about bugs and errors, or to post questions and requests for enhancements, use GitHub's issue tracker.

Sources & Dependencies

To get involved with development, you need a POSIX environment to fully build it (Linux, OSX or Cygwin on Windows).

First you need to download the latest sources:

$ git clone https://github.com/pandalone/pandalone.git pandalone.git
$ cd pandalone.git

Virtualenv

You may choose to work in a virtualenv (isolated Python environment), to install dependency libraries isolated from system's ones, and/or without admin-rights (this is recommended for Linux/Mac OS).

Attention!

If you decide to reuse system-installed packages using :option:`--system-site-packages` with virtualenv <= 1.11.6 (to avoid, for instance, having to reinstall numpy and pandas, which require native libraries), you may be bitten by bug #461, which prevents you from upgrading any of the pre-installed packages with :command:`pip`.
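
An isolated environment can also be created from Python itself. The sketch below uses the standard-library ``venv`` module (available since Python 3.3), which is a different tool from the virtualenv package named in this section; the directory name ``pandalone-env`` is hypothetical.

```python
import tempfile
import venv
from pathlib import Path

# Create the environment in a throw-away directory (hypothetical location).
env_dir = Path(tempfile.mkdtemp()) / "pandalone-env"

# system_site_packages=False keeps the environment fully isolated;
# with_pip=False skips bootstrapping pip, which keeps this sketch fast.
venv.create(env_dir, system_site_packages=False, with_pip=False)

# The generated configuration file records the isolation setting.
cfg_text = (env_dir / "pyvenv.cfg").read_text()
print(cfg_text)
```

Passing ``system_site_packages=True`` instead corresponds to the :option:`--system-site-packages` choice discussed above.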

LiClipse IDE

Within the sources there are two sample files for the comprehensive LiClipse IDE:

:file:`eclipse.project`
:file:`eclipse.pydevproject`

Remove the eclipse prefix (but leave the dot) from each file name and import the project as an "existing project" from Eclipse's File menu.

Another issue is that LiClipse contains its own implementation of Git, EGit, which interacts badly with Unix symbolic links, such as :file:`docs/docs`, and detects working-directory changes even after a fresh checkout. To work around this, right-click on the above file and select :menuselection:`Properties --> Team --> Advanced --> Assume Unchanged`.

Then you can install all of the project's dependencies in development mode using the :file:`setup.py` script:

$ python setup.py --help                           ## Get help for this script.
Common commands: (see '--help-commands' for more)

  setup.py build      will build the package underneath 'build/'
  setup.py install    will install the package

Global options:
  --verbose (-v)      run verbosely (default)
  --quiet (-q)        run quietly (turns verbosity off)
  --dry-run (-n)      don't actually do anything
...

$ python setup.py develop                           ## Also installs dependencies into project's folder.
$ python setup.py build                             ## Check that the project indeed builds ok.

You should now run the test-cases (see :doc:`metrics`) to check that the sources are in good shape:

$ python setup.py test

Note

The above commands installed the dependencies inside the project folder and for the virtual-environment. That is why all build and testing actions have to go through :samp:`python setup.py {some_cmd}`.

If you are dealing with installation problems and/or you want to permanently install dependent packages, you have to deactivate the virtual-environment and start installing them into your base Python environment:

$ deactivate
$ python setup.py develop

or even try the more permanent installation-mode:

$ python setup.py install                # May require admin-rights

These are the known related Python projects:

OpenMDAO:

It has influenced pandalone's design. Interoperation is planned, by converting to and from its data-types. But it works on python-2 only, and its architecture needs attention from programmers (no setup.py, no official test-cases).

PyDSTool:

It does not overlap, since it does not cover IO and dependencies of data. Interoperation with it is also planned (as soon as we have a better grasp of it :-). It has some issues with the documentation, but they are working on it.

xray:

pandas for higher dimensions; data-trees should in principle work with "xray".

netCDF4:

Hierarchical file-data-format similar to hdf5.

hdf5:

Hierarchical file-data-format, supported natively by pandas.

Glossary

.. glossary::

    data-tree
        The *container* of data that the gear-shift calculator consumes and produces.
        It is implemented by :class:`pandalone.pandata.Pandel` as a mergeable stack of
        :term:`JSON-schema` abiding trees of strings and numbers, formed with sequences, dictionaries,
        :mod:`pandas`-instances and URI-references.

    JSON-schema
        The `JSON schema <http://json-schema.org/>`_ is an `IETF draft <http://tools.ietf.org/html/draft-zyp-json-schema-03>`_
        that provides a *contract* for what JSON-data is required for a given application and how to interact
        with it.  JSON Schema is intended to define validation, documentation, hyperlink navigation, and
        interaction control of JSON data.
        You can learn more about it from this `excellent guide <http://spacetelescope.github.io/understanding-json-schema/>`_,
        and experiment with this `on-line validator <http://www.jsonschema.net/>`_.

    JSON-pointer
        JSON Pointer (:rfc:`6901`) defines a string syntax for identifying a specific value within
        a JavaScript Object Notation (JSON) document. It aims to serve the same purpose as *XPath* from the XML world,
        but it is much simpler.
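
To make the JSON-pointer syntax concrete, here is a minimal resolver for nested dictionaries and lists, written against the rules of :rfc:`6901`. It is a sketch only: it is not pandalone's implementation, the name ``resolve_pointer`` is hypothetical, and it neither validates array indices strictly nor handles the special ``-`` token.

```python
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against nested dicts and lists."""
    if pointer == "":
        return doc                      # the empty pointer is the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = doc
    for token in pointer[1:].split("/"):
        # Unescape in this order: '~1' -> '/', then '~0' -> '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]     # array index
        else:
            node = node[token]          # object member
    return node


tree = {"vehicle": {"gears": [3.5, 2.1], "a/b": "slash", "m~n": "tilde"}}
print(resolve_pointer(tree, "/vehicle/gears/1"))   # 2.1
print(resolve_pointer(tree, "/vehicle/a~1b"))      # slash
print(resolve_pointer(tree, "/vehicle/m~0n"))      # tilde
```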
