Release: | 0.0.1-dev.1 |
---|---|
Documentation: | https://pandalone.readthedocs.org/ |
Source: | https://github.com/pandalone/pandalone |
PyPI repo: | https://pypi.python.org/pypi/pandalone |
Keywords: | utility, library, data, tree, processing, calculation, dependencies, resolution, pandas, dictionaries, maps, lists, scientific, engineering |
Copyright: | 2015 European Commission (JRC-IET) |
License: | EUPL 1.1+ |
pandalone is a python library for processing hierarchical data (json, hdf5, pandas), for scientific and engineering exploration.
An "execution" or a "run" of a calculation is depicted in the following diagram:
    .---------------------.      _____________      .----------------------------.
    ;      DataTree       ;     |             |     ;          DataTree          ;
    ;---------------------; ==> | <some code> | ==> ;----------------------------;
    ;                     ;     |_____________|     ;                            ;
    '---------------------'                         '----------------------------'
The Input & Output Data are instances of :dfn:`data-tree`, trees of strings and numbers, assembled with:
- sequences,
- dictionaries,
- :class:`pandas.DataFrame`,
- :class:`pandas.Series`, and
- URI-references to other data-trees/paths.
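As a rough illustration of the structure described above, a data-tree can be pictured as nested dictionaries and sequences with strings and numbers at the leaves. The sketch below uses only plain Python containers (pandas objects and URI-references are left out). The names `tree` and `iter_leaves` and the sample values are hypothetical and not part of pandalone:

```python
def iter_leaves(node, path=""):
    """Yield (path, value) pairs for every leaf of a nested dict/list tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_leaves(child, "%s/%s" % (path, key))
    elif isinstance(node, (list, tuple)):
        for index, child in enumerate(node):
            yield from iter_leaves(child, "%s/%d" % (path, index))
    else:
        yield path, node


# Hypothetical input data: strings and numbers held in dicts and lists.
tree = {
    "vehicle": {"mass": 1300, "name": "demo"},
    "gears": [3.5, 2.1, 1.4],
}

leaves = dict(iter_leaves(tree))
for leaf_path, value in sorted(leaves.items()):
    print(leaf_path, "=", value)
```

Each leaf is addressed by a slash-separated path, which is the same idea that path-like references into a tree rely on.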
Note
The program runs on Python-2.7+ and Python-3.3+ (preferred) and requires numpy/scipy, pandas and win32 libraries, along with their native backends, to be installed. If you do not have such an environment already installed, please read the :doc:`install` section below for suitable distributions such as WinPython or Anaconda.
Assuming that you have a working python-environment, open a command-shell (in Windows use :program:`cmd.exe`, but ensure :program:`python.exe` is in its :envvar:`PATH`) and try the following commands:
Tip
The commands beginning with $ below imply a Unix-like operating system with a POSIX shell (Linux, OS X). Although the commands are simple and easy to translate to their Windows cmd.exe counterparts, it would be worthwhile to install Cygwin to get the same environment on Windows. If you choose to do that, also include the following packages in Cygwin's installation wizard:
- git, git-completion
- make, zip, unzip, bzip2, dos2unix
- openssh, curl, wget
But do not install/rely on Cygwin's outdated python environment.
Install:
$ pip install pandalone    ## Use `--pre` if version-string has a build-suffix.
Or, in case you need the very latest from the master branch:
$ pip install git+https://github.com/pandalone/pandalone.git
See: :doc:`install`
Run:
$ pandalone --version
The current version (|version|) runs on Python-2.7+ and Python-3.3+ and requires numpy/scipy, pandas and win32 libraries, along with their native backends, to be installed.
It has been tested under Windows and Linux. Python-3.3+ is the preferred interpreter, since the Excel interface and the desktop-UI run only with it.
It is distributed as Wheels.
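The requirements above can be checked before attempting an installation. The following is a minimal sketch, assuming Python 3.4+ for `importlib.util.find_spec`. The helper names `missing_modules` and `python_is_supported` are hypothetical, and the win32 libraries are left out because their import name is not stated here:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """True for Python-2.7+ and Python-3.3+, the versions named above."""
    major, minor = version_info[0], version_info[1]
    return (major == 2 and minor >= 7) or (major == 3 and minor >= 3) or major > 3


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Missing libraries:", missing_modules(["numpy", "scipy", "pandas"]))
```

An empty "Missing libraries" list suggests the scientific stack is already in place. Otherwise one of the distributions described below may be the easier route.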
Warning
On Windows it is strongly suggested NOT to install the standard CPython distribution, unless:
- you have administrative privileges,
- you are an experienced python programmer, so that
- you know how to hunt dependencies from the PyPI repository and/or the Unofficial Windows Binaries for Python Extension Packages.
As explained above, this project depends on packages with native-backends that require the use of C and Fortran compilers to build from sources. To avoid this hassle, you should choose one of the user-friendly distributions suggested below.
Below is a matrix of the two suggested self-contained python distributions for running this program (the default python included in linux is excluded here). Both distributions:
- are free (as in freedom),
- do not require admin-rights for installation in Windows, and
- have been tested to run this program successfully (it has also been tested on default linux distros).
Distributions | WinPython | Anaconda |
---|---|---|
Platform | Windows | Windows, Mac OS, Linux |
Ease of Installation | Fair (requires fiddling with the :envvar:`PATH` and the Registry after install) | |
Ease of Use | Easy | Moderate (should use :command:`conda` and/or :command:`pip` depending on whether a package contains native libraries) |
# of Packages | Only what's included in the downloaded-archive | Many 3rd-party packages uploaded by users |
Notes | After installation, see :doc:`faq`. | |
Check also installation instructions from the pandas site.
Before installing it, make sure that there are no older versions left over on the python installation you are using. To cleanly uninstall it, run this command until you cannot find any project installed:
$ pip uninstall pandalone ## Use `pip3` if both python-2 & 3 are in PATH.
You can install the project directly from the PyPI repo the "standard" way, by typing the :command:`pip` command in the console:
$ pip install pandalone
- If you want to install a pre-release version (the version-string is not plain numbers, but ends with alpha, beta.2 or something else), additionally use :option:`--pre`:
$ pip install pandalone --pre
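The rule of thumb above (a version-string that is "not plain numbers" is a pre-release) can be written down as a small check. This is only a sketch of that sentence, not pip's actual version-parsing logic, and the helper name `needs_pre_flag` is hypothetical:

```python
import re

# One or more groups of digits separated by dots, e.g. "0.0.1".
_PLAIN_NUMBERS = re.compile(r"\d+(\.\d+)*")


def needs_pre_flag(version):
    """True when the version-string is not made of plain dot-separated numbers."""
    return _PLAIN_NUMBERS.fullmatch(version) is None


for candidate in ("0.0.1", "0.0.1-dev.1", "0.0.9-alpha.3.1"):
    print(candidate, "-> use --pre" if needs_pre_flag(candidate) else "-> plain release")
```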
Also you can install the very latest version straight from the sources:
$ pip install git+https://github.com/pandalone/pandalone.git --pre
If you want to upgrade an existing installation along with all its dependencies, also add :option:`--upgrade` (or, equivalently, :option:`-U`), but then the build might take considerable time to finish. There is also the possibility that the upgraded libraries might break existing programs(!), so use it with caution, or from within a virtualenv (isolated Python environment).
To install it for different Python environments, repeat the procedure using the appropriate :program:`python.exe` interpreter for each environment.
Tip
To debug installation problems, you can export a non-empty :envvar:`DISTUTILS_DEBUG` and distutils will print detailed information about what it is doing and/or print the whole command line when an external program (like a C compiler) fails.
After installation, it is important that you check which version is visible in your :envvar:`PATH`:
$ pandalone --version
0.0.1-dev.1
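The same check can be made from within Python, which helps when several interpreters are on the :envvar:`PATH`. The sketch below assumes Python 3.8+ (for `importlib.metadata`), and the helper name `installed_version` is hypothetical:

```python
from importlib import metadata  # Available since Python 3.8.


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version seen by *this* interpreter, or None if not installed.
print(installed_version("pandalone"))
```

Running it with each interpreter in turn shows which environments actually contain the project.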
To install for different Python versions, repeat the procedure for every required version.
To install an older released version issue the console command:
$ pip install pandalone==0.0.1 ## Use `--pre` if version-string has a build-suffix.
or alternatively straight from the sources:
$ pip install git+https://github.com/pandalone/pandalone.git@v0.0.9-alpha.3.1 --pre
Of course you can substitute v0.0.9-alpha.3.1 with any slug from "commits", "branches" or "releases" that you will find on the project's github-repo.
Note
If you have another version already installed, you have to use :option:`--ignore-installed` (or :option:`-I`). For using the specific version, check this (untested) stackoverflow question.
You can install each version in a separate virtualenv (isolated Python environment) and avoid all this.
If you download the sources you have more options for installation. There are various methods to get hold of them:
- Download the source distribution from the PyPI repo.
- Download a release-snapshot from github.
- Clone the git-repository at github.
Assuming you have a working installation of git you can fetch and install the latest version of the project with the following series of commands:
$ git clone "https://github.com/pandalone/pandalone.git" pandalone.git
$ cd pandalone.git
$ python setup.py install ## Use `python3` if both python-2 & 3 installed.
When working with sources, you need to have installed all libraries that the project depends on:
$ pip install -r requirements/execution.txt .
The previous command installs a "snapshot" of the project as it is found in the sources. If you wish to link the project's sources with your python environment, install the project in development mode:
$ python setup.py develop
Note
This last command installs any missing dependencies inside the project-folder.
The files and folders of the project are listed below:
+--pandalone/       ## (package) The python-code of the calculator
+--tests/           ## (package) Test-cases
+--docs/            ## Documentation folder
+--setup.py         ## (script) The entry point for `setuptools`, installing, testing, etc
+--requirements/    ## (txt-files) Various pip-dependencies for tools.
+--README.rst
+--CHANGES.rst
+--LICENSE.txt
Warning
Not implemented yet.
The command-line usage below requires the Python environment to be installed, and provides for executing an experiment directly from the OS's shell (i.e. :program:`cmd` in windows or :program:`bash` in POSIX), and in a single command.
[TBD]
Attention!
Desktop UI requires Python 3!
For a quick-'n-dirty method to explore the structure of the data-tree and run an experiment, just run:
$ pandalone gui
Attention!
Excel-integration requires Python-3 and Windows or OS X!
In Windows and OS X you may utilize the excellent xlwings library to use Excel files for providing input and output to the experiment.
To create the necessary template-files in your current-directory you should enter:
$ pandalone excel
You could type instead :samp:`pandalone excel {file_path}` to specify a different destination path.
[TBD]
Example python :abbr:`REPL (Read-Eval-Print Loop)` commands that set up and run an experiment are given below.
First run :command:`python` or :command:`ipython` and try to import the project to check its version:
>>> import pandalone
>>> pandalone.__version__ ## Check version once more.
'0.0.1-dev.1'
>>> pandalone.__file__ ## To check where it was installed. # doctest: +SKIP
/usr/local/lib/site-package/pandalone-...
If everything works, create the :term:`data-tree` to hold the input-data (strings and numbers). You assemble a data-tree from:
- sequences,
- dictionaries,
- :class:`pandas.DataFrame`,
- :class:`pandas.Series`, and
- URI-references to other data-trees.
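The glossary describes a data-tree as a "mergeable stack" of trees. One way to picture assembling the input-data is to overlay user values on top of defaults. The sketch below does this with plain dictionaries and lists only. It is an assumption about what "mergeable" could mean, not the behaviour of :class:`pandalone.pandata.Pandel`, and `merge_trees` and the sample values are hypothetical:

```python
def merge_trees(base, overlay):
    """Return a new tree where values from `overlay` override those in `base`.

    Dictionaries are merged key by key; any other value (number, string,
    list) in `overlay` replaces the one in `base` wholesale.
    """
    if isinstance(base, dict) and isinstance(overlay, dict):
        merged = dict(base)  # Shallow copy, so `base` itself is not modified.
        for key, value in overlay.items():
            merged[key] = merge_trees(base[key], value) if key in base else value
        return merged
    return overlay


# Hypothetical defaults and user-supplied values.
defaults = {"vehicle": {"mass": 1000, "fuel": "petrol"}, "gears": [3.5, 2.1]}
user_input = {"vehicle": {"mass": 1300}, "gears": [3.5, 2.1, 1.4]}

inputs = merge_trees(defaults, user_input)
print(inputs)
```

Replacing lists wholesale instead of merging them element by element is a deliberate simplification here; a real implementation may choose differently.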
[TBD]
This project is hosted in github. To provide feedback about bugs and errors or questions and requests for enhancements, use github's Issue-tracker.
To get involved with development, you need a POSIX environment to fully build it (Linux, OSX or Cygwin on Windows).
First you need to download the latest sources:
$ git clone https://github.com/pandalone/pandalone.git pandalone.git
$ cd pandalone.git
Virtualenv
You may choose to work in a virtualenv (isolated Python environment), to install dependency libraries isolated from system's ones, and/or without admin-rights (this is recommended for Linux/Mac OS).
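For Python-3 an isolated environment can also be created without the virtualenv tool, using the standard-library `venv` module. That is a swapped-in alternative, not the tool named above. The sketch below is a minimal example with a hypothetical helper `create_env` and a throw-away directory name:

```python
import os
import tempfile
import venv


def create_env(env_dir, system_site_packages=False):
    """Create an isolated environment in `env_dir` and return its config path."""
    # with_pip=False keeps the sketch fast; pip is then NOT installed inside.
    venv.create(env_dir, system_site_packages=system_site_packages, with_pip=False)
    return os.path.join(env_dir, "pyvenv.cfg")


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = create_env(os.path.join(tmp, "pandalone-env"))
    cfg_exists = os.path.exists(cfg_path)
    print("created:", cfg_exists)
```

For real work you would pick a permanent directory and pass `with_pip=True` to `venv.create`, so that :command:`pip` is available inside the environment.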
Attention!
If you decide to reuse system-installed packages using :option:`--system-site-packages` with virtualenv <= 1.11.6 (to avoid, for instance, having to reinstall numpy and pandas, which require native libraries), you may be bitten by bug #461, which prevents you from upgrading any of the pre-installed packages with :command:`pip`.
Liclipse IDE
Within the sources there are two sample files for the comprehensive LiClipse IDE:
Remove the eclipse prefix (but leave the dot) and import it as an "existing project" from Eclipse's File menu.
Another issue is caused by the fact that LiClipse contains its own implementation of Git, EGit, which interacts badly with unix symbolic-links, such as :file:`docs/docs`, and detects working-directory changes even after a fresh checkout. To work around this, right-click on the above file and select :menuselection:`Properties --> Team --> Advanced --> Assume Unchanged`.
Then you can install all of the project's dependencies in development mode using the :file:`setup.py` script:
$ python setup.py --help ## Get help for this script.
Common commands: (see '--help-commands' for more)
setup.py build will build the package underneath 'build/'
setup.py install will install the package
Global options:
--verbose (-v) run verbosely (default)
--quiet (-q) run quietly (turns verbosity off)
--dry-run (-n) don't actually do anything
...
$ python setup.py develop ## Also installs dependencies into project's folder.
$ python setup.py build ## Check that the project indeed builds ok.
You should now run the test-cases (see :doc:`metrics`) to check that the sources are in good shape:
$ python setup.py test
Note
The above commands installed the dependencies inside the project folder and for the virtual-environment. That is why all build and testing actions have to go through :samp:`python setup.py {some_cmd}`.
If you are dealing with installation problems and/or you want to permanently install dependent packages, you have to deactivate the virtual-environment and start installing them into your base python environment:
$ deactivate
$ python setup.py develop
or even try the more permanent installation-mode:
$ python setup.py install # May require admin-rights
See architecture live-document.
These are the known related python projects:
- OpenMDAO:
It has influenced pandalone's design. It is planned to interoperate by converting to and from its data-types. But it works on python-2 only and its architecture needs attention from programmers (no setup.py, no official test-cases).
- PyDSTool:
It does not overlap, since it does not cover IO and dependencies of data. Also planned to interoperate with it (as soon as we have a better grasp of it :-). It has some issues with the documentation, but they are working on it.
- xray:
pandas for higher dimensions; should in principle work with "xray" data-trees.
- netCDF4:
Hierarchical file-data-format similar to hdf5.
- hdf5:
Hierarchical file-data-format, supported natively by pandas.
.. glossary::

    data-tree
        The *container* of data that the gear-shift calculator consumes and produces.
        It is implemented by :class:`pandalone.pandata.Pandel` as a mergeable stack of
        :term:`JSON-schema` abiding trees of strings and numbers, formed with sequences,
        dictionaries, :mod:`pandas`-instances and URI-references.

    JSON-schema
        The `JSON schema <http://json-schema.org/>`_ is an `IETF draft <http://tools.ietf.org/html/draft-zyp-json-schema-03>`_
        that provides a *contract* for what JSON-data is required for a given application and how to interact with it.
        JSON Schema is intended to define validation, documentation, hyperlink navigation, and interaction control of JSON data.
        You can learn more about it from this `excellent guide <http://spacetelescope.github.io/understanding-json-schema/>`_,
        and experiment with this `on-line validator <http://www.jsonschema.net/>`_.

    JSON-pointer
        JSON Pointer (:rfc:`6901`) defines a string syntax for identifying a specific value within
        a JavaScript Object Notation (JSON) document. It aims to serve the same purpose as *XPath* from the XML world,
        but it is much simpler.
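To make the JSON-pointer entry concrete, here is a minimal sketch of resolving a pointer against nested dictionaries and lists, following the escaping rules of RFC 6901. It is a simplification: it does not enforce the RFC's array-index syntax or handle the special "-" token. The name `resolve_pointer` and the sample document are hypothetical and not pandalone's API:

```python
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against nested dicts and lists."""
    if pointer == "":
        return doc  # The empty pointer refers to the whole document.
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = doc
    for raw_token in pointer[1:].split("/"):
        # Unescape '~1' before '~0', as the RFC requires.
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]  # Sequences are addressed by index.
        else:
            node = node[token]  # Mappings are addressed by key.
    return node


# Hypothetical document with keys that need escaping.
doc = {
    "vehicle": {"mass": 1300},
    "gears": [3.5, 2.1, 1.4],
    "a/b": "slash",
    "m~n": "tilde",
}

print(resolve_pointer(doc, "/vehicle/mass"))
print(resolve_pointer(doc, "/gears/1"))
```

A key containing "/" is written as "~1" in the pointer, and one containing "~" as "~0", so "/a~1b" addresses the key "a/b".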