data_analysis_pipeline_eg: A Python repository from ttimbers

Building a Data Analysis pipeline tutorial

This example data analysis project counts the occurrences of every word in 4 novels and presents the 10 most frequent words of each book in a report.
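
As a rough illustration of the counting step described above, here is a minimal sketch in Python. It is not the repository's actual script: the function name `top_words`, the tokenization rule (lowercase runs of letters and apostrophes), and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs.

    Hypothetical helper: the tokenization rule below is an assumption,
    not necessarily what the repository's scripts do.
    """
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter.most_common sorts by descending count.
    return Counter(words).most_common(n)


# Made-up sample text, standing in for the contents of a novel.
sample = "the cat and the hat and the bat"
print(top_words(sample, n=2))
```

For a real book, the text would be read from a file and passed to the function with the default `n=10`.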

Usage:

There are two suggested ways to run this analysis:

1. Using Docker

Note: the instructions in this section also depend on running the commands in a Unix shell (e.g., terminal or Git Bash). If you are using Windows Command Prompt, replace /$(pwd) with PATH_ON_YOUR_COMPUTER.

- Install Docker
- Download/clone this repository
- Use the command line to navigate to the root of this downloaded/cloned repo
- Type the following:

docker-compose run --rm analysis-env make -C /home/rstudio/data_analysis_eg all

2. After installing all dependencies (does not depend on Docker)

Clone this repo, and using the command line, navigate to the root of this project.
To run the analysis, type the following command:

make all

To reset/undo the analysis, type the following command:

make clean

Dependencies

R & R libraries:
- rmarkdown==2.0
- knitr==1.26
- here==0.1
- cowsay==0.7.0
Python & Python libraries:
- matplotlib==3.1.1
- pandas==0.25.1
- numpy==1.17.2
GNU make 4.2.1
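
Before running `make all` without Docker, it can help to confirm that the installed Python libraries match the versions listed above. The following is a minimal sketch, not part of the repository: the `PINNED` dictionary and the `check_pins` function are hypothetical names, the numpy pin is taken to be 1.17.2, and `importlib.metadata` assumes Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Python dependency list above
# (assumption: the numpy pin is read as 1.17.2).
PINNED = {"matplotlib": "3.1.1", "pandas": "0.25.1", "numpy": "1.17.2"}


def check_pins(pins):
    """Return {package: (expected, installed)} for every mismatch.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

If the script prints nothing, the installed Python libraries match the pins. It does not check the R libraries or GNU make.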