Collection of bash commands and code snippets for Data Science and more.
This repository is not intended as a learning base; it is just a collection of commands and snippets that can be used to refresh your memory.
Basic commands and templates for
- Hadoop FS
- Hive
- Pig
- Spark
- Sqoop
Collection of sample Bash commands
- Simple and general commands
- Data manipulation commands
- Creating isolated environment using a Makefile
- Common Python requirements for data science, to be installed with pip
- Using Python libraries
- Pandas
- NumPy
- scikit-learn
A collection of Jupyter notebooks
- Pandas
- Preconditions
- Read-or-persist (template for reading from a remote source and persisting the data to a local folder)
- Template (a template notebook to jump-start with common requirements)
- Uncertainties (use of the uncertainties library to account for error in calculations)
- Setup of environment
- Install components
- Docker Machine
- Docker Swarm
- Docker Compose
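
As a quick reminder of what the read-or-persist notebook listed above is about, here is a minimal sketch of the pattern using only the standard library. It is not the notebook's actual code: the function name `read_or_persist`, the `cache_file` argument, and the `fetch` callable are hypothetical names chosen for illustration, and the remote read is left to whatever callable you pass in.

```python
from pathlib import Path
from typing import Callable


def read_or_persist(cache_file: Path, fetch: Callable[[], str]) -> str:
    """Return the locally cached text if present; otherwise fetch and cache it.

    `fetch` is a hypothetical caller-supplied function that performs the
    remote read (for example an HTTP download) and returns the data as text.
    """
    if cache_file.exists():
        # Cache hit: skip the remote call entirely.
        return cache_file.read_text(encoding="utf-8")

    # Cache miss: read from the remote source, then persist locally.
    data = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(data, encoding="utf-8")
    return data
```

The first call pays the cost of the remote read and writes the file; later calls with the same `cache_file` only touch the local folder. Deleting the file forces a fresh download.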