Implementation: | Python 2.7 |
---|---|
Status: | Alpha (contract may change) |
Download: | http://pypi.python.org/pypi/bigtempo/ |
Source: | http://github.com/rhlobo/bigtempo/ |
Keywords: | bigdata, time series, temporal processing, temporal analysis, data processing, data analysis, scalable, distributed, data exploration, python |
This is a Python package created to help you build complex hierarchies of processing steps, each referred to as a datasource. The package was originally conceived to handle temporal data and is typically used alongside pandas, dealing with time series and dataframes, but it is flexible and can easily be extended to support other data models. It handles dependency resolution, provides a tagging system that enables query operations over sets of datasources, and much more.
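To make the ideas of dependency resolution and tag-based querying concrete, here is a minimal, self-contained sketch in plain Python. It does not use bigtempo: the `Registry` class and its `register`, `select`, and `evaluate` methods are hypothetical names invented for this illustration, and the real package may work quite differently.

```python
# Hypothetical sketch: none of these names come from bigtempo's API.


class Registry(object):
    """Keeps named datasources together with their dependencies and tags."""

    def __init__(self):
        self._entries = {}

    def register(self, name, func, dependencies=(), tags=()):
        # func receives a dict mapping each dependency name to its result.
        self._entries[name] = (func, tuple(dependencies), frozenset(tags))

    def select(self, *tags):
        """Return the sorted names of datasources carrying all given tags."""
        wanted = set(tags)
        return sorted(name
                      for name, (_, _, entry_tags) in self._entries.items()
                      if wanted <= entry_tags)

    def evaluate(self, name, _active=None):
        """Evaluate the dependencies first, then the datasource itself."""
        active = _active if _active is not None else set()
        if name in active:
            raise ValueError("circular dependency at %r" % name)
        active.add(name)
        func, dependencies, _ = self._entries[name]
        inputs = dict((dep, self.evaluate(dep, active)) for dep in dependencies)
        active.discard(name)
        return func(inputs)


registry = Registry()
registry.register("RAW", lambda inputs: [1.0, 2.0, 3.0, 4.0], tags=["raw"])
registry.register("DOUBLED", lambda inputs: [2 * x for x in inputs["RAW"]],
                  dependencies=["RAW"], tags=["derived"])
registry.register("TOTAL", lambda inputs: sum(inputs["DOUBLED"]),
                  dependencies=["DOUBLED"], tags=["derived", "summary"])

print(registry.select("derived"))  # ['DOUBLED', 'TOTAL']
print(registry.evaluate("TOTAL"))  # 20.0
```

Asking for `TOTAL` pulls `DOUBLED`, which in turn pulls `RAW`; the caller never has to order the steps by hand.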
There are other software packages that focus on lower-level aspects of data processing, such as pandas, numpy, sympy, and theano. This is not a framework to replace them. Instead, it aims to support many of these tools, helping you stitch many processing steps together. It provides a decoupled programming model built with scalability at its heart, and it takes care of much of the workflow management so that you can focus on the data itself.
Bigtempo aims to support a wide range of applications, including artificial intelligence systems, working in a data-pull fashion. Its philosophy is to lazy-load as much as possible: analyses are retrieved from the cache if available and processed otherwise. A datasource serves data through processors that can be used by other datasources (or by you directly), and processors are designed to be executed in a distributed fashion, if desired.
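The "retrieve from cache if available, process otherwise" idea can be sketched in a few lines of plain Python. This is only an illustration of the pull-style pattern, not bigtempo's implementation: `CachedSource`, `fetch`, and `closing_prices` are hypothetical names, and the data is a placeholder.

```python
# Hypothetical sketch: none of these names come from bigtempo's API.


class CachedSource(object):
    """Pull-style source: compute on the first request, reuse afterwards."""

    def __init__(self, compute):
        self._compute = compute
        self._cache = {}
        self.computations = 0  # how many times the real work was done

    def fetch(self, symbol, start, end):
        key = (symbol, start, end)
        if key not in self._cache:
            # Cache miss: do the work now and remember the result.
            self.computations += 1
            self._cache[key] = self._compute(symbol, start, end)
        return self._cache[key]


def closing_prices(symbol, start, end):
    # Placeholder data; a real source would load or derive a time series.
    return [(day, 100 + day) for day in range(start, end)]


source = CachedSource(closing_prices)
first = source.fetch("ABC", 0, 3)   # computed on demand
second = source.fetch("ABC", 0, 3)  # served from the cache
print(source.computations)          # 1
```

Nothing is computed until a consumer asks for it, and a repeated request for the same symbol and range costs only a dictionary lookup.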
Keep in mind that the package, although performant, is in alpha stage and, as such, most of its caching and distributed processing capabilities are still in the oven.
You can get started by reading an IPython notebook, and for a better understanding of what can be done, take a look at the pandas introduction.
If you need more examples, or just feel like checking out how bigtempo can be used in a project, please refer to stockExperiments.
To install, simply:
$ pip install bigtempo
Or, if you absolutely must:
$ easy_install bigtempo
Both installation methods above should take care of dependencies automatically.
The pandas library is the only direct dependency the package needs in order to run. Visit its page to find out what it depends on. For best results, we recommend installing optional packages as well.
If you want to run the package tests, or enjoy its testing facilities, you'll need:
- mockito >= 0.5.1
In order to run the tests using the command contained in the bin directory, also install:
- nose >= 1.3.0
- coverage >= 3.6
- pep8 >= 1.4.5
In bin/docker_base you can find a script named setup-ubuntu_14.04_x64.sh that can prepare an Ubuntu 14.04 (Trusty Tahr) machine with pandas and all of its dependencies. It was originally meant to be used to build the project's Docker image, but it should work on real machines as well.
To install bigtempo from source:
Clone the git repository:
$ git clone https://github.com/rhlobo/bigtempo.git
Get into the project directory:
$ cd bigtempo
Install dependencies (if you are not using virtualenv, it may need super user privileges):
$ pip install -r requirements.txt
Install it:
$ python setup.py install
Alternatively, you can use pip if you want all the dependencies pulled in automatically (the optional -e option installs it in development mode):
$ pip install -e .
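After installing by either route, a quick way to confirm that the environment is usable is to try importing the packages. The helper below is a small sketch using only the standard library; `is_importable` is a name chosen for this example, not something bigtempo provides.

```python
import importlib


def is_importable(module_name):
    """Return True if the named module can be imported in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# pandas is the direct dependency named above; bigtempo is the package itself.
for name in ("pandas", "bigtempo"):
    print("%s: %s" % (name, "ok" if is_importable(name) else "missing"))
```

If either line reports "missing", the installation did not land in the interpreter you are running, which often means a different virtualenv is active.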
Distributed processing
- Built-in process pools
- Integration with celery
- Integration with Apache ZooKeeper and ZeroMQ
Caching
- Smart temporal data caching
Compatibility
- Python 2.7+
If you have any suggestions, bug reports, or annoyances, please report them to our issue tracker.
- On the tracker, check for open issues or open a new one to start a discussion around an idea or bug.
- Fork the repository on GitHub to start making your changes.
- Write a test which shows that the bug was fixed or that the feature works as expected.
- Send a pull request and wait until it gets merged and published. Make sure to add yourself to AUTHORS.