- Create an Example: Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset. (25 points)
- Extend an Existing Example: Using one of your classmate’s examples (as created above), extend his or her example with additional annotated code. (15 points)
You should fork the provided repository and then clone it locally if you wish. Once you have code to submit, you should make a pull request on the shared repository. Minimally, you should submit .Rmd files. Ideally, you should also submit an .md file and update the README.md file with your example.
If you are going to use RStudio as your version control software, make sure to add *.Rproj and .gitignore to your .gitignore file before you make any commits. Otherwise you run the risk of trying to push those files to the master repository in a pull request.
After you’ve read each part of the assignment, please submit your GitHub handle name in the submission link provided in the Major Assignments folder! This will let your instructor know that your work is ready to be graded.
You should complete both parts of the assignment and make your submissions on the schedule specified in our course syllabus.
- GitHub repository: https://github.com/acatlin/SPRING2020TIDYVERSE
- FiveThirtyEight.com datasets: https://data.fivethirtyeight.com/
- Kaggle datasets: https://www.kaggle.com/datasets
- https://acatlin.github.io/SPRING2020TIDYVERSE/forcats_makes_plots_better.html - how to use capabilities of the forcats package to improve your plots! Andy Catlin and Ait Elmouden Abdellah
- http://rpubs.com/christianthieme/589762 (RMD file: PurrrMap-LoseTheForLoop.rmd) - Using purrr::map() and purrr::pmap() instead of for loops in R - By Christian Thieme
dbplyr is a great, simple-to-use backend for working with data that is stored on a SQL server. dbplyr follows the same grammar and formatting as dplyr, but translates the R code into a SQL query that runs directly against the SQL database.
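The translation step can be seen without a real server. The sketch below is only an illustration: it assumes the dplyr and dbplyr packages are installed, uses a simulated SQLite dialect instead of a live connection, and the column names `carrier` and `dep_delay` are hypothetical.

```r
library(dplyr)
library(dbplyr)

# lazy_frame() describes a table without connecting to a database;
# simulate_sqlite() only tells dbplyr which SQL dialect to generate.
# The columns here are hypothetical and exist only for this sketch.
flights <- lazy_frame(carrier = "AA", dep_delay = 12, con = simulate_sqlite())

# Ordinary dplyr verbs, written exactly as for a local data frame
delayed <- flights %>%
  filter(dep_delay > 10) %>%
  select(carrier, dep_delay)

# Render the SQL that dbplyr would send to the database
sql_text <- as.character(sql_render(delayed))
cat(sql_text, "\n")
```

With a real database, the same pipeline would start from `tbl(con, "table_name")` on a DBI connection, and `collect()` would pull the result into R.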
From the forcats package, this function allows you to quickly group levels of a categorical variable if they are above/below a certain count or proportion.
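The description above does not name the function, so the sketch below assumes it refers to the `fct_lump_*()` family in forcats, which lumps levels by count or by proportion. The factor `x` is made up for illustration.

```r
library(forcats)

# A toy factor: "a" appears 5 times, "b" 3 times, "c" and "d" once each
x <- factor(c(rep("a", 5), rep("b", 3), "c", "d"))

# Lump every level that appears fewer than 3 times into "Other"
lumped_min <- fct_lump_min(x, min = 3)

# Lump every level that makes up 20% or less of the values into "Other"
lumped_prop <- fct_lump_prop(x, prop = 0.2)

table(lumped_min)
table(lumped_prop)
```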
Author: Ken Popkin
This vignette loads the lubridate package and performs a few date transformations.
Ken Popkin Lubridate Extension
Author: Christian Thieme - RPubs Link: https://rpubs.com/christianthieme/597573
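For readers new to lubridate, a few typical date transformations might look like the sketch below. The date is arbitrary and the calls are not taken from either vignette.

```r
library(lubridate)

# Parse a year-month-day string into a Date
d <- ymd("2020-03-15")

# Pull out individual components
yr <- year(d)
mo <- month(d)
weekday <- wday(d, label = TRUE)  # label text depends on the locale

# Round down to the start of the month
month_start <- floor_date(d, "month")

# Add one calendar month (%m+% avoids invalid dates such as Feb 30)
next_month <- d %m+% months(1)
```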
Author: Devin Teran
Functions added for extension:
- str_sub()
- str_subset()
- str_detect()
- str_count()
- str_extract() & str_extract_all()
- str_remove()
- str_replace()
RPubs Link: https://rpubs.com/MsQCSNYC/592259
filename = TidyVerse_Devin_Teran.Rmd
added: str_view(), str_wrap()
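As a quick orientation to the stringr functions listed above, here is a minimal sketch on a made-up character vector. It is not code from the vignette.

```r
library(stringr)

fruits <- c("apple", "banana", "cherry")

first_three <- str_sub(fruits, 1, 3)         # characters 1 to 3 of each string
with_an     <- str_subset(fruits, "an")      # keep only strings matching "an"
has_e       <- str_detect(fruits, "e")       # TRUE/FALSE per string
count_a     <- str_count(fruits, "a")        # number of "a" in each string
first_vowel <- str_extract(fruits, "[aeiou]")  # first match per string
no_vowel    <- str_remove(fruits, "[aeiou]")   # drop the first match
capital_a   <- str_replace(fruits, "a", "A")   # replace the first match
```

`str_extract_all()` returns every match instead of only the first, as a list with one element per input string.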
Author: Laylah Quinones Extended By: Angel Claudio
ggplot2 is a ubiquitous library that allows you to easily visualize data and explain patterns in data to people who are not necessarily familiar with the technical aspects of data analysis. Visualizations made with ggplot2 are easy to understand and construct thanks to an API that allows visualizations to be "built" by layering graphics and other visual elements.
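The layering idea can be sketched with the built-in `mtcars` data set, which is an assumption made here for convenience and is not the data used in the vignette. Each `+` adds one layer or visual element to the plot.

```r
library(ggplot2)

# Start with the data and the aesthetic mapping, then add layers one by one
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                          # layer 1: raw points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # layer 2: linear trend
  labs(title = "Fuel economy by weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon")

# print(p) draws the plot
```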
This file should help you understand how to export a table from a website, gather the data into a tall format, and plot the variables of interest into several plots for easy comparison
Author: Angel Claudio
This vignette is meant to teach you how to use the unite and pivot_wider functions from the tidyr package.
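A minimal sketch of both functions, assuming a small made-up table in which every name and value is hypothetical:

```r
library(tidyr)

# Long data: one row per person and measurement (all values are made up)
long <- tibble::tibble(
  first   = c("Ann", "Ann", "Bob", "Bob"),
  last    = c("Lee", "Lee", "Ray", "Ray"),
  measure = c("height", "weight", "height", "weight"),
  value   = c(165, 60, 178, 72)
)

wide <- long %>%
  unite("name", first, last, sep = " ") %>%                  # first + last -> name
  pivot_wider(names_from = measure, values_from = value)     # one column per measure
```

`unite()` pastes several columns into one, and `pivot_wider()` spreads the values of one column across new columns named by another.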
A vignette describing the progressive use of as_tibble(), the dplyr functions select, arrange, and filter, and a ggplot to visualize data. Author: Neil Shah
Added a section about purrr and how it adds to R's functional programming capabilities. Extend: Justin Hsi
A vignette describing the preparation of an online table for ggplot and facet_wrap. Author: Thomas Hill
Instructions:
Please follow along with the steps I took to prepare data for ggplot exploration. I based this off of a previous project and added some annotations about the functions I called.
Using the titanic dataset from Kaggle, clean and manipulate data using various functions of the tidyverse package. (Subhalaxmi Rout)
An example using TidyVerse packages (ggplot2 and dplyr) with the tesla-stock-data-from-2010-to-2020 data set from Kaggle
Author: Vinayak Kamath
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
ggplot2 geom_line to show the stock price movement over the years for Tesla shares.
dplyr filter helps in filtering of data based on one or more conditions.
dplyr filter to show the days when the stock price for Tesla moved by over 15% (profit or loss) in one day.
dplyr group by and summarise helps in getting aggregated data from the given data set for one or more columns.
dplyr group by and summarise to show the yearly minimum and maximum stock price close and arranging it in descending order of movement in a year.
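The filter and group-by/summarise steps described above might look roughly like the sketch below. The `prices` table is a tiny made-up stand-in, not the Kaggle Tesla data, and the column names `date`, `open`, and `close` are assumptions.

```r
library(dplyr)

# Hypothetical daily prices; these are NOT real Tesla figures
prices <- tibble(
  date  = as.Date(c("2019-01-02", "2019-01-03", "2019-06-03",
                    "2020-01-02", "2020-02-03", "2020-02-04")),
  open  = c(100, 102, 90, 200, 300, 400),
  close = c(102, 120, 91, 205, 390, 330)
)

# Days on which the price moved by more than 15% in either direction
big_moves <- prices %>%
  mutate(pct_change = (close - open) / open * 100) %>%
  filter(abs(pct_change) > 15)

# Yearly minimum and maximum close, ordered by the size of the yearly range
yearly <- prices %>%
  group_by(year = as.integer(format(date, "%Y"))) %>%
  summarise(min_close = min(close), max_close = max(close)) %>%
  mutate(movement = max_close - min_close) %>%
  arrange(desc(movement))
```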
Author: Ken Popkin Extended this vignette to also include how to create a barplot using ggplot2
Gabriel Abreu
Abdellah Ait Elmouden
Author: Samuel Bellows
Covers calling basic geometries, calling aes (and when arguments should go inside or outside aes), facet grid, and basic plot customization. Also includes some use of dplyr and lubridate.
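The inside/outside `aes()` distinction is easy to get wrong, so here is a small sketch using the built-in `mtcars` data (an assumption, not the vignette's data). Arguments inside `aes()` map a column to an aesthetic. Arguments outside `aes()` set a fixed value for the whole layer.

```r
library(ggplot2)

# Inside aes(): colour varies with the data, and a legend is generated
mapped <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(colour = factor(cyl)))

# Outside aes(): every point gets the same fixed colour, no legend
fixed <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(colour = "blue")
```

Writing `aes(colour = "blue")` by mistake would treat "blue" as a data value with a single category, so the points would be drawn in the default palette colour with a legend entry labelled "blue".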
Nilsa Bermudez
Vinayak Kamath
Showcase a few more styles and customizations in GGPlot. Themes, guides, manual color fills, and labels.
Author: Adam Gersowitz
This vignette looks at various ways to display visual data using ggplot2, specifically geom_polygon.
This can be very useful when trying to convey information that has a physical correlation to people who aren't data scientists. Examples include population data on a map broken down by county, or a blueprint of a building that shows where certain problems are occurring with certain tenants.
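As a rough illustration of the blueprint idea, the sketch below draws two rectangular rooms with `geom_polygon()` and shades them by a count. The coordinates and the `complaints` values are invented for this example.

```r
library(ggplot2)

# Each room is a polygon described by its four corner points (made-up data)
rooms <- data.frame(
  room       = rep(c("A", "B"), each = 4),
  x          = c(0, 2, 2, 0,  2, 4, 4, 2),
  y          = c(0, 0, 2, 2,  0, 0, 2, 2),
  complaints = rep(c(3, 7), each = 4)
)

# group tells ggplot2 which points belong to the same polygon
floor_plan <- ggplot(rooms, aes(x, y, group = room, fill = complaints)) +
  geom_polygon(colour = "black") +
  coord_equal()

# print(floor_plan) draws the two shaded rooms
```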
In the original vignette, Philip demonstrated 8 string functions: Detect, Count, Subset, Locate, Extract, Match, Replace, Split. In my extension I used the same format that Philip was using and provided examples for 5 more string functions: Length, Upper, Trim, Truncate, Sub.
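The five added functions are named only by their short labels, so the sketch below assumes they correspond to `str_length()`, `str_to_upper()`, `str_trim()`, `str_trunc()`, and `str_sub()`. The sample string is made up.

```r
library(stringr)

s <- "  Tidy data is happy data  "

trimmed <- str_trim(s)               # remove leading and trailing whitespace
n_chars <- str_length(trimmed)       # number of characters
shouted <- str_to_upper(trimmed)     # convert to upper case
short   <- str_trunc(trimmed, 12)    # cut to 12 characters, including "..."
prefix  <- str_sub(trimmed, 1, 4)    # characters 1 to 4
```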
Extended this assignment to include more tidyverse-style data loading and manipulation. Also added more ggplot2 plot building/function writing.
Michael Munguia
Author: Michael Munguia
This vignette showcases some of the text manipulation capacities of the stringr library using the context of restructuring the names of America's mayors.
Author: Neil Shah Extended this vignette to also include how to deal with missing values
Link: https://github.com/acatlin/SPRING2020TIDYVERSE/blob/master/tidyverse-SubhalaxmiRoutNeilShah.Rmd
Nilsa Bermudez - Tidyverse recipe
Author: Bonnie Cooper This vignette uses global shark attack data to demonstrate several purrr functions for filtering lists
Author: David Blumenstiel
Group_by provides an intuitive way to look at one's data. As the name implies, this function will interpret data based on "groups". See the vignette for more information.
Author: Amit Kapoor Used the dataset https://fivethirtyeight.com/features/which-state-has-the-worst-drivers/
Author: Manolis Manoli looking through hotel booking data using some dplyr functions and a bit of lubridate
Author: Philip Tanofsky
Vignette for the popular stringr functions from the Tidyverse packages. The stringr library provides a suite of commonly used string manipulation functions to assist in data cleaning and data preparation tasks on a vector of 10 tweets.
- https://rpubs.com/dpong8988/591906 by Dennis Pong
- explain the nuances between rename and select
- explain what's top_frac and how is it different from top_n
- explain how to use an alternative to summarise/summarize's n() -- tally
- explain an even simpler way to do tally by introducing count
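The four points in the list above can be sketched with the built-in `mtcars` data. Using `mtcars` is an assumption made here and the vignette may use a different data set.

```r
library(dplyr)

# rename() keeps every column; select() keeps only the ones you name
renamed  <- mtcars %>% rename(miles_per_gallon = mpg)
selected <- mtcars %>% select(miles_per_gallon = mpg)

# top_n() takes a number of rows, top_frac() takes a fraction of rows.
# Both keep ties, so they can return more rows than requested.
top_three   <- mtcars %>% top_n(3, mpg)
top_ten_pct <- mtcars %>% top_frac(0.1, mpg)

# Three ways to count rows per group, from longest to shortest
by_summarise <- mtcars %>% group_by(cyl) %>% summarise(n = n())
by_tally     <- mtcars %>% group_by(cyl) %>% tally()
by_count     <- mtcars %>% count(cyl)
```

In current dplyr releases `top_n()` and `top_frac()` are superseded by `slice_max()` and `slice_min()`, although they still work.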
- Extended purrr (RMD filename: same as above.) https://rpubs.com/dpong8988/602410 - by Dennis Pong
- 3 tutorials and 1 important concept on anonymous functions:
- purrr::map_df()
- purrr::keep()
- purrr::discard()
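A minimal sketch of the three purrr functions and the anonymous-function idea, on a made-up list of numbers. It assumes purrr is installed, plus dplyr, which `map_df()` needs for row binding.

```r
library(purrr)

nums <- list(1, 5, 10, 15)

# Anonymous function written out in full
kept <- keep(nums, function(x) x > 5)

# The same idea with purrr's formula shorthand, where .x is the element
dropped <- discard(nums, ~ .x > 5)

# map_df() calls the function on each element and row-binds the results
squares <- map_df(1:3, ~ data.frame(n = .x, square = .x^2))
```

`keep()` retains the elements for which the predicate is TRUE, and `discard()` removes them, so the two results together cover the original list.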
Bonnie Cooper extended Nilsa Bermudez's TidyverseRecipe, adding some text and a parallel reshaping of the Bob Ross dataset with the pivot_longer() function.
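To show what reshaping with `pivot_longer()` does, here is a sketch on a tiny made-up table shaped like an episode-by-element grid. The column names and values are hypothetical and are not taken from the Bob Ross dataset.

```r
library(tidyr)

# Wide data: one row per episode, one 0/1 column per painted element (made up)
wide <- tibble::tibble(
  episode = c("S01E01", "S01E02"),
  tree    = c(1, 1),
  cabin   = c(0, 1)
)

# Long data: one row per episode and element
long <- wide %>%
  pivot_longer(cols = -episode, names_to = "element", values_to = "present")
```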
Create
This notebook goes through the use of map, map2, and pmap in the tidyverse purrr package. We will start with the use of tibble, which comes from the tidyverse's tibble package. We will use this function to create a list of the numbers 1-26 to test the use of the map functions.
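The difference between the three functions is the number of inputs they iterate over: one, two, or any number. The sketch below assumes a tibble holding the numbers 1-26 alongside the letters of the alphabet. The letter columns are an addition made here so that `map2` and `pmap` have something to pair with.

```r
library(purrr)
library(tibble)

df <- tibble(n = 1:26, letter = letters, upper = LETTERS)

# map: one input vector
squares <- map_dbl(df$n, ~ .x^2)

# map2: two input vectors, walked in parallel
labels <- map2_chr(df$n, df$letter, ~ paste0(.x, .y))

# pmap: any number of inputs; a data frame is treated row by row,
# with columns matched to the function's argument names
combined <- pmap_chr(df, function(n, letter, upper) paste(n, letter, upper))
```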
Extend
I created an extension of the ggplot2 geom function assignment by adding the functionality of plotly, a common open source package for easy-to-use interactivity with plots.
Author: Patrick Maloney Extended by Philip Tanofsky
The purpose of this vignette is to demonstrate how some of the tidyverse packages can be used to explore and manipulate a dataset in R. A dataset was selected from the fivethirtyeight package. The R Markdown file can be accessed from here
Using dplyr: slice chooses rows by their ordinal position in the tbl. Grouped tbls use the ordinal position within the group.
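A short sketch of both behaviours, using the built-in `mtcars` data as a stand-in for the fivethirtyeight dataset used in the vignette:

```r
library(dplyr)

# Ungrouped: rows 1 to 3 of the whole table
first_three <- mtcars %>% slice(1:3)

# Grouped: row 1 within each cyl group, so one row per group
first_per_group <- mtcars %>%
  group_by(cyl) %>%
  slice(1)
```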
Author: Subhalaxmi Rout Extended tidyverse assignment from Manolis Manoli’s tidyverse assignment. Assignment shows examples of stringr, tidyr, tibble and ggplot.