This course is intended to get you up to speed with modern programming techniques and statistical concepts related to Data Science and Big Data.
While these are fashionable buzzwords nowadays, there are some very interesting and applicable concepts behind them.
While we use bioinformatics as an example here, the content is readily applicable to most other data science fields.
This course is mostly Python-based, but it uses JavaScript (in the browser) for visualization. We also discuss interfacing with R and speed optimizations using Cython.
You can look at this course from three different perspectives:
- As a hands-on example from which you can take very practical ideas to apply directly to your work
- As an advanced course in the Python ecosystem of libraries and frameworks for data science, with a little bit of the JavaScript/browser side
- As a presentation of the concepts behind advanced programming techniques for data science (data management, asynchronous programming, map-reduce frameworks, high-performance computing, ...)
There is a site for the course, which you are encouraged to visit.
The code is made available under the GNU Affero General Public License. This includes the notebooks.
The documentation in the docs directory has another license: the GNU Free Documentation License.