http://www.youtube.com/watch?v=rXj5nayS7Yg
http://tuulos.github.io/sf-python-meetup-sep-2013
http://www.meetup.com/sfpython/events/137674842/
The mainstream paradigms for processing large amounts of data, such as MapReduce and NoSQL, are based on distributed computing and massive horizontal scalability. Since Google published the original MapReduce paper in 2004, however, the performance of a single high-end server has grown by a factor of 50.
In this talk, we show how AdRoll uses Python to squeeze the last bit of performance out of a single high-end server for interactive analysis of terabyte-scale datasets. This feat is made possible by Numba, a new NumPy-aware dynamic Python compiler based on LLVM. Thanks to Python, the system can provide a very expressive and developer-friendly API while keeping implementation complexity in check. The talk should be relevant to anyone interested in Big Data and High-Performance Computing using Python.
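To give a flavor of what a NumPy-aware compiler means in practice, here is a minimal sketch of a Numba-compiled loop over a NumPy array. It is not AdRoll's code: the function name `count_in_range` and the sample data are hypothetical, chosen only for illustration. The fallback decorator is likewise an assumption, added so the snippet still runs, uncompiled, where Numba is not installed.

```python
import numpy as np

try:
    # Numba's decorator for compiling a function to machine code via LLVM.
    from numba import njit
except ImportError:
    # Assumption for illustration: run as plain Python if Numba is missing.
    def njit(func):
        return func


@njit
def count_in_range(values, lo, hi):
    """Count elements v with lo <= v < hi using an explicit loop."""
    n = 0
    for i in range(values.shape[0]):
        if values[i] >= lo and values[i] < hi:
            n += 1
    return n


# Hypothetical sample data: one million consecutive integers.
values = np.arange(1_000_000, dtype=np.int64)
result = count_in_range(values, 10, 20)
```

An explicit loop like this is usually slow in plain Python. With Numba, it is compiled on the first call for the argument types it receives, so the loop can stay readable without being rewritten in C.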