/sitta

So It's Time To Analyse

Primary language: Jupyter Notebook. License: MIT.

Uses the packages suggested in:

https://blog.dominodatalab.com/video-huge-debate-r-vs-python-data-science/

Collecting Data

  • feather - fast reading/writing
  • ibis - dataframe-style queries written in a database-neutral way
  • paratext - fast csv reading
  • bcolz - compressed columnar data storage, similar to SFrames

Visualisation

  • altair - matplotlib replacement, uses grammar of graphics
  • bokeh - interactive visualizations
  • geoplotlib - maps

Advanced and fast data

  • blaze - NumPy/pandas-like syntax on top of any backend; abstracts storage and compute, e.g. Spark
  • xarray - high-end data manipulation on labelled n-dimensional arrays
  • dask - parallel computation: a dynamic task scheduler (like celery, but optimised for interactivity), plus parallel arrays, dataframes and lists.
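
The dynamic task scheduling mentioned for dask can be sketched with `dask.delayed`, which records function calls in a task graph and only runs them when `compute()` is called. This is a minimal example, assuming dask is installed; the functions `square` and `total` are hypothetical stand-ins for real work.

```python
import dask


@dask.delayed
def square(x):
    # Stand-in for an expensive, independent piece of work.
    return x * x


@dask.delayed
def total(values):
    # Combines the results of the independent tasks.
    return sum(values)


# Nothing runs yet: this only builds a graph of five square tasks feeding total.
graph = total([square(i) for i in range(5)])

# compute() hands the graph to the scheduler, which may run independent tasks in parallel.
result = graph.compute()
print(result)  # 30
```

The same lazy pattern is behind dask's parallel arrays and dataframes, which build such graphs behind a NumPy-like or pandas-like interface.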

Modelling

  • keras - deep learning
  • pymc3 - probabilistic programming for Bayesian statistical modelling (MCMC and variational inference)